chunklet.code_chunker.patterns
regex_patterns.py
Written by: Speedyk-005
Copyright 2025
License: MIT
This module contains regular expressions for chunking and parsing source code across multiple programming languages. The patterns are designed to match:
- Single-line comments (Python, C/C++, Java, JavaScript, Lisp, etc.)
- Multi-line comments / docstrings (Python, C-style, Ruby, Lisp, etc.)
- Function or method definitions across various languages
- Namespaces, classes, modules, and interfaces
- Annotations / decorators (Python, C#, Java)
- Block-ending indicators ('}' or 'end')
These regexes can be imported into a chunker or parser to identify logical sections of code for semantic analysis, tokenization, or processing.
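As a rough illustration of how patterns of this kind might be used by a chunker, here is a minimal sketch. The names `SINGLE_LINE_COMMENT`, `FUNCTION_DEF`, `BLOCK_END`, and `find_boundaries` are hypothetical and are not taken from this module; the real patterns are likely broader and more careful about each language's syntax.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# [ \t]* is used instead of \s* so a match never starts on a preceding blank line.
SINGLE_LINE_COMMENT = re.compile(r"^[ \t]*(?:#|//|;).*$", re.M)
FUNCTION_DEF = re.compile(r"^[ \t]*(?:def|function|fn|func)\s+(\w+)", re.M)
BLOCK_END = re.compile(r"^[ \t]*(?:\}|end)[ \t]*$", re.M)


def find_boundaries(source: str) -> list[tuple[int, str]]:
    """Return (line_number, name) for each function-like definition found."""
    results = []
    for match in FUNCTION_DEF.finditer(source):
        # Count newlines before the match to get a 1-based line number.
        line_no = source.count("\n", 0, match.start()) + 1
        results.append((line_no, match.group(1)))
    return results


sample = (
    "# helper\n"
    "def add(a, b):\n"
    "    return a + b\n"
    "\n"
    "function sub(a, b) {\n"
    "  return a - b;\n"
    "}\n"
)

boundaries = find_boundaries(sample)
comments = SINGLE_LINE_COMMENT.findall(sample)
block_ends = BLOCK_END.findall(sample)
```

A chunker could then split the source at the reported line numbers and use the block-end matches to decide where each chunk closes.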
Note:
- re.M = re.MULTILINE (^ and $ match at the start and end of each line, not only of the whole string)
- re.S = re.DOTALL (. also matches newline characters)
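The effect of the two flags can be seen in a short sketch using only the standard `re` module. The sample text and the docstring pattern here are made up for the demonstration and are not the module's own patterns.

```python
import re

text = 'first\n"""doc line 1\ndoc line 2"""\nlast'

# Without re.M, ^ anchors only at the very start of the string.
line_starts_default = re.findall(r"^\w+", text)
# With re.M, ^ also anchors immediately after every newline.
line_starts_multiline = re.findall(r"^\w+", text, re.M)

# A simplified triple-quoted docstring pattern (illustrative only).
docstring = r'"""(.*?)"""'
# Without re.S, . stops at a newline, so a multi-line docstring is not matched.
no_dotall = re.search(docstring, text)
# With re.S, . crosses newlines and the whole docstring body is captured.
with_dotall = re.search(docstring, text, re.S)
```

Patterns that must span several lines, such as multi-line comments, generally need re.S, while patterns anchored to line starts, such as function definitions, generally need re.M.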