chunklet
Chunklet: The v2.0.0 Evolution - Multi-strategy, Context-aware, Multilingual Text & Code Chunker
This package provides a robust and flexible solution for splitting large texts and code into smaller, manageable chunks. It is designed for applications such as Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) pipelines, and other context-aware Natural Language Processing (NLP) tasks.
Version 2.0.0 introduces a revamped architecture with:
- Dedicated chunkers: PlainTextChunker (formerly Chunklet), DocumentChunker, and CodeChunker.
- Expanded language support (50+ languages) and improved error handling.
- Flexible batch processing with an on_errors parameter and memory-optimized generators.
- Enhanced modularity, extensibility, and performance.
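The batch-processing bullet above combines two ideas: results are produced lazily by a generator, and an on_errors setting decides what happens when one input fails. The sketch below illustrates that pattern in plain Python. It is not chunklet's implementation: the function names `batch_process` and `split_on_periods`, and the accepted values "raise", "skip", and "break", are assumptions made for illustration only.

```python
from typing import Callable, Iterable, Iterator


def batch_process(
    items: Iterable[str],
    func: Callable[[str], list[str]],
    on_errors: str = "raise",
) -> Iterator[list[str]]:
    """Lazily apply func to each item (hypothetical helper, not chunklet's API).

    on_errors controls what a failing item does:
      "raise" re-raises the exception, "skip" moves on to the next item,
      "break" stops the whole batch.
    """
    if on_errors not in {"raise", "skip", "break"}:
        raise ValueError(f"unknown on_errors value: {on_errors!r}")
    for item in items:
        try:
            result = func(item)
        except Exception:
            if on_errors == "raise":
                raise
            if on_errors == "break":
                return
            continue  # "skip": ignore this item and keep going
        # Yield one result at a time so the full batch never sits in memory.
        yield result


def split_on_periods(text: str) -> list[str]:
    """Toy stand-in for a chunker: split on periods, reject empty input."""
    if not text.strip():
        raise ValueError("empty input")
    return [part.strip() for part in text.split(".") if part.strip()]


texts = ["One. Two.", "", "Three."]
skipped = list(batch_process(texts, split_on_periods, on_errors="skip"))
stopped = list(batch_process(texts, split_on_periods, on_errors="break"))
```

With "skip" the empty input is dropped and the remaining texts are still processed; with "break" the batch ends at the first failure. Because the function is a generator, nothing runs until the caller starts iterating.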
Modules:
- cli
- code_chunker
- common
- document_chunker
- exceptions
- plain_text_chunker
- sentence_splitter
Classes:
- CallbackError: Raised when a callback function provided to chunker or splitter fails during execution.
- ChunkletError: Base exception for chunking and splitting operations.
- FileProcessingError: Raised when a file cannot be loaded, opened, or accessed.
- InvalidInputError: Raised when one or multiple invalid input(s) are encountered.
- MissingTokenCounterError: Raised when a token_counter is required but not provided.
- TokenLimitError: Raised when max_tokens constraint is exceeded.
- UnsupportedFileTypeError: Raised when a file type is not supported for a given operation.
CallbackError
Bases: ChunkletError
Raised when a callback function provided to chunker or splitter fails during execution.
ChunkletError
Bases: Exception
Base exception for chunking and splitting operations.
FileProcessingError
Bases: ChunkletError
Raised when a file cannot be loaded, opened, or accessed.
InvalidInputError
Bases: ChunkletError
Raised when one or multiple invalid input(s) are encountered.
MissingTokenCounterError
Bases: InvalidInputError
Raised when a token_counter is required but not provided.
Source code in src/chunklet/exceptions.py
TokenLimitError
Bases: ChunkletError
Raised when max_tokens constraint is exceeded.
UnsupportedFileTypeError
Bases: FileProcessingError
Raised when a file type is not supported for a given operation.
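The "Bases" entries above define a small inheritance tree, which determines which `except` clauses catch which errors. The sketch below re-declares the classes locally with the same names and bases so that it runs without the package installed; it mirrors the documented hierarchy but is not the library's source. The `classify` helper is hypothetical. With chunklet installed, the same names would presumably be imported from the `exceptions` module listed above.

```python
# Local stand-ins that mirror the documented bases (not the library's source).
class ChunkletError(Exception):
    """Base exception for chunking and splitting operations."""


class CallbackError(ChunkletError):
    pass


class FileProcessingError(ChunkletError):
    pass


class InvalidInputError(ChunkletError):
    pass


class MissingTokenCounterError(InvalidInputError):
    pass


class TokenLimitError(ChunkletError):
    pass


class UnsupportedFileTypeError(FileProcessingError):
    pass


def classify(exc: Exception) -> str:
    """Hypothetical helper: report which handler would catch exc."""
    try:
        raise exc
    except InvalidInputError:
        # Also catches MissingTokenCounterError, its subclass.
        return "invalid input"
    except FileProcessingError:
        # Also catches UnsupportedFileTypeError, its subclass.
        return "file problem"
    except ChunkletError:
        # Catch-all for the remaining chunklet errors.
        return "other chunklet error"


labels = [
    classify(MissingTokenCounterError("token_counter is required")),
    classify(UnsupportedFileTypeError("unsupported extension")),
    classify(TokenLimitError("max_tokens exceeded")),
    classify(CallbackError("callback failed")),
]
```

Ordering the handlers from most specific to most general matters here: placing `except ChunkletError` first would swallow every case, since all the other classes derive from it.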