chunklet.code_chunker.helpers
Functions:
- is_binary_file – Determine whether a file is binary or text.
- is_python_code – Check if a source is written in Python.
is_binary_file
Determine whether a file is binary or text.
First tries to guess the file type based on its MIME type derived from
the file extension. If MIME type is unavailable or ambiguous, reads the
first 1024 bytes of the file and checks for null bytes (b'\x00'), which
indicate binary content.
Parameters:
- file_path (str | Path) – Path to the file.
Returns:
- bool – True if the file is likely binary, False if text.
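The two-stage check described above (MIME type first, null-byte sniffing second) can be sketched roughly as follows. This is an illustrative approximation, not the library's source: the name `looks_binary`, the constant `SNIFF_SIZE`, and the rule for what counts as an "ambiguous" MIME type (anything that is not clearly text or media) are assumptions made for the example.

```python
import mimetypes
from pathlib import Path
from typing import Union

SNIFF_SIZE = 1024  # number of leading bytes to inspect, per the description above


def looks_binary(file_path: Union[str, Path]) -> bool:
    """Hypothetical stand-in for is_binary_file; not the library's code."""
    path = Path(file_path)

    # Stage 1: guess the MIME type from the file extension alone.
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is not None:
        if mime_type.startswith("text/"):
            return False
        # Assumption: media types are treated as unambiguous binary.
        if mime_type.startswith(("image/", "audio/", "video/")):
            return True

    # Stage 2: MIME type missing or ambiguous, so sniff for null bytes.
    with path.open("rb") as handle:
        head = handle.read(SNIFF_SIZE)
    return b"\x00" in head
```

With this sketch, `looks_binary("notes.txt")` and `looks_binary("photo.png")` are decided from the extension without opening the file, while a file with an unrecognized extension is opened and its first 1024 bytes are scanned.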
Source code in src/chunklet/code_chunker/helpers.py
is_python_code
Check if a source is written in Python.
This function uses multiple indicators, prioritizing syntactic validity via the Abstract Syntax Tree (AST) parser for maximum confidence.
Indicators used
- File extension check for path inputs (e.g., .py, .pyi, .pyx, .pyw).
- Shebang line detection (e.g., "#!/usr/bin/python").
- Definitive syntax check using Python's ast.parse().
- Fallback heuristic via Pygments lexer guessing.
Note
The function is definitive for complete, syntactically correct code blocks. It falls back to a Pygments heuristic only for short, incomplete, or ambiguous code snippets that fail AST parsing.
Parameters:
- source (str | Path) – Raw code string or Path to the source file to check.
Returns:
- bool – True if the source is written in Python.
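The first three indicators listed for is_python_code (extension, shebang, AST parse) can be sketched as below. This is a simplified illustration under stated assumptions, not the library's implementation: the name `looks_like_python` and the constant `PYTHON_SUFFIXES` are hypothetical, the order of the checks is a guess, and the Pygments fallback is left out so the example runs with the standard library only.

```python
import ast
from pathlib import Path
from typing import Union

# Extensions named in the indicator list above.
PYTHON_SUFFIXES = {".py", ".pyi", ".pyx", ".pyw"}


def looks_like_python(source: Union[str, Path]) -> bool:
    """Hypothetical stand-in for is_python_code, without the Pygments fallback."""
    if isinstance(source, Path):
        # Indicator 1: file extension check for path inputs.
        if source.suffix.lower() in PYTHON_SUFFIXES:
            return True
        text = source.read_text(encoding="utf-8", errors="replace")
    else:
        text = source

    if not text.strip():
        return False  # assumption: empty input is not treated as Python

    # Indicator 2: a shebang line that mentions python.
    first_line = text.lstrip().split("\n", 1)[0]
    if first_line.startswith("#!") and "python" in first_line:
        return True

    # Indicator 3: syntactic validity via the AST parser.
    try:
        ast.parse(text)
    except (SyntaxError, ValueError):
        # The documented function would try a Pygments heuristic here.
        return False
    return True


print(looks_like_python("def f(x):\n    return x + 1\n"))  # True
print(looks_like_python("function f() { return 1; }"))     # False
```

Note that a bare word such as `hello` is itself a valid Python expression and parses cleanly, so a sketch this small will accept it; the real function may guard against such cases in ways this example does not show.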