chunklet.code_chunker.patterns

regex_patterns.py

Written by: Speedyk-005. Copyright 2025. License: MIT.

This module contains regular expressions for chunking and parsing source code across multiple programming languages. The patterns are designed to match:

  • Single-line comments (Python, C/C++, Java, JavaScript, Lisp, etc.)
  • Multi-line comments / docstrings (Python, C-style, Ruby, Lisp, etc.)
  • Function or method definitions across various languages
  • Namespaces, classes, modules, and interfaces
  • Annotations / decorators (Python, C#, Java)
  • Block-ending indicators ('}' or 'end')

These regexes can be imported into a chunker or parser to identify logical sections of code for semantic analysis, tokenization, or processing.
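As a rough illustration of how such patterns might be used, the sketch below defines three stand-in regexes and runs them over a small, deliberately mixed-language sample. The names `SINGLE_LINE_COMMENT`, `PY_FUNCTION_DEF`, and `BLOCK_END` and their pattern bodies are assumptions made for this example; they are not the patterns defined in regex_patterns.py.

```python
import re

# Hypothetical patterns for illustration only; the real module's names
# and expressions may differ.
SINGLE_LINE_COMMENT = re.compile(r"^[ \t]*(?:#|//).*$", re.M)
PY_FUNCTION_DEF = re.compile(r"^[ \t]*(?:async\s+)?def\s+(\w+)\s*\(", re.M)
BLOCK_END = re.compile(r"^[ \t]*(?:\}|end)[ \t]*$", re.M)

# A mixed-language sample, so each pattern has something to match.
source = """\
# helper
def add(a, b):
    return a + b

// C-style comment
async def fetch(url):
    pass
}
"""

# Full text of each single-line comment.
comments = [m.group(0).strip() for m in SINGLE_LINE_COMMENT.finditer(source)]

# Names of the functions whose definitions were found.
functions = PY_FUNCTION_DEF.findall(source)

# 1-based line numbers of block-ending lines.
block_ends = [source.count("\n", 0, m.start()) + 1 for m in BLOCK_END.finditer(source)]

print(comments)
print(functions)
print(block_ends)
```

A chunker could use match positions like these as candidate boundaries between logical sections. The sketch uses `[ \t]*` rather than `\s*` for leading indentation so that a match cannot start on a blank line and run into the next one.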

Note
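  • re.M (re.MULTILINE): ^ and $ match at the start and end of each line, not only at the start and end of the whole string
  • re.S (re.DOTALL): . matches any character, including a newline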
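
The effect of the two flags can be checked with a short sketch. The sample strings and patterns below are made up for this demonstration and are not taken from the module.

```python
import re

text = "# first\nx = 1\n# second"

# Without re.M, ^ and $ anchor only to the whole string, so a pattern
# meant to match one comment line finds nothing here.
no_flag = re.findall(r"^#.*$", text)

# With re.M, ^ and $ anchor to every line.
with_m = re.findall(r"^#.*$", text, re.M)

doc = '"""line one\nline two"""'

# Without re.S, . stops at a newline, so a docstring spanning
# several lines is not matched.
no_s = re.search(r'""".*?"""', doc)

# With re.S, . also matches the newline and the whole docstring is found.
with_s = re.search(r'""".*?"""', doc, re.S)

print(no_flag, with_m)
print(no_s, with_s.group(0) == doc)
```

Line-oriented patterns such as single-line comments typically need re.M, while patterns that span lines, such as docstrings and block comments, typically need re.S.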