Check if a source is written in Python.
This function uses multiple indicators, prioritizing syntactic validity
via the Abstract Syntax Tree (AST) parser for maximum confidence.
Indicators used:
- File extension check for path inputs (e.g., .py, .pyi, .pyx, .pyw).
- Shebang line detection (e.g., "#!/usr/bin/python").
- Definitive syntax check using Python's `ast.parse()`.
- Fallback heuristic via Pygments lexer guessing.
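The two cheapest indicators in the list above, the extension check and the shebang check, can be sketched with the standard library alone. This is a minimal illustration, not the library's code: `has_python_extension` and `has_python_shebang` are hypothetical helper names, while the suffix set and the regular expression are copied from the source listing on this page.

```python
import re
from pathlib import Path
from typing import Union

# Suffixes taken from the indicator list above.
PYTHON_SUFFIXES = {".py", ".pyi", ".pyx", ".pyw"}

# Same pattern as the documented source: only interpreters under /usr/bin match.
SHEBANG_RE = re.compile(r"#!/usr/bin/(env\s+)?python")


def has_python_extension(path: Union[str, Path]) -> bool:
    """Hypothetical helper: True if the path ends in a known Python suffix."""
    return Path(path).suffix.lower() in PYTHON_SUFFIXES


def has_python_shebang(text: str) -> bool:
    """Hypothetical helper: True if the text starts with a Python shebang."""
    return SHEBANG_RE.match(text.strip()) is not None
```

Because the pattern is anchored to `#!/usr/bin/`, a shebang such as `#!/usr/local/bin/python3` does not match and would be left to the later checks.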
Note:
The function is definitive for complete, syntactically correct code blocks.
It falls back to a Pygments heuristic only for short, incomplete, or
ambiguous code snippets that fail AST parsing.
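The behaviour described in the note can be seen by calling `ast.parse()` directly. The sketch below assumes nothing beyond the standard library; `parses_as_python` is a hypothetical name used only for illustration and is not part of chunklet.

```python
import ast


def parses_as_python(code: str) -> bool:
    """Hypothetical helper: True if ast.parse() accepts the text."""
    try:
        ast.parse(code)
        return True
    except (SyntaxError, ValueError):
        # SyntaxError covers invalid or incomplete code (IndentationError is a
        # subclass); ValueError can be raised for input containing null bytes.
        return False


complete = "def f(x):\n    return x + 1\n"
incomplete = "def f(x):"  # header without a body
javascript = "function f() { return 1; }"

print(parses_as_python(complete))    # True
print(parses_as_python(incomplete))  # False, so a fallback heuristic is needed
print(parses_as_python(javascript))  # False
```

One caveat worth keeping in mind: a successful parse shows that the text is syntactically valid Python, and very short inputs such as a single bare word (`hello`) or an empty string also parse, so the check is most informative on complete, multi-statement code blocks.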
Parameters:
- source (str | Path) – Raw code string or Path to the source file to check.
Returns:
- bool – True if the source is written in Python.
Source code in src/chunklet/code_chunker/utils.py
@validate_input
def is_python_code(source: str | Path) -> bool:
    """
    Check if a source is written in Python.

    This function uses multiple indicators, prioritizing syntactic validity
    via the Abstract Syntax Tree (AST) parser for maximum confidence.

    Indicators used:
    - File extension check for path inputs (e.g., .py, .pyi, .pyx, .pyw).
    - Shebang line detection (e.g., "#!/usr/bin/python").
    - Definitive syntax check using Python's `ast.parse()`.
    - Fallback heuristic via Pygments lexer guessing.

    Note:
        The function is definitive for complete, syntactically correct code blocks.
        It falls back to a Pygments heuristic only for short, incomplete, or
        ambiguous code snippets that fail AST parsing.

    Args:
        source (str | Path): raw code string or Path to source file to check.

    Returns:
        bool: True if the source is written in Python.
    """
    # Path-based check
    if isinstance(source, Path) or (isinstance(source, str) and is_path_like(source)):
        path = Path(source)
        return path.suffix.lower() in {".py", ".pyi", ".pyx", ".pyw"}

    if isinstance(source, str):
        # Shebang line check
        if re.match(r"#!/usr/bin/(env\s+)?python", source.strip()):
            return True

        # Definitive syntactic check (Highest confidence)
        try:
            ast.parse(source)
            # If parsing succeeds, it's definitely Python code
            return True
        except Exception:  # noqa: S110
            # If fails, it might still be Python code (e.g., incomplete snippet), so continue with heuristics
            pass

    # Pygments heuristic (Lowest confidence, last resort)
    try:
        lexer = guess_lexer(source)
        return lexer.name.lower() == "python"
    except ClassNotFound:
        return False