chunklet
Chunklet: Advanced Text, Code, and Document Chunking for LLM Applications
A comprehensive library for semantic text segmentation, interactive chunk visualization, and multi-format document processing. Split content intelligently across 50+ languages, visualize chunks in real time, and handle various file types with flexible, context-aware chunking strategies.
Key Features:
- Sentence splitting: Multilingual text segmentation across 50+ languages
- Semantic chunking: PlainTextChunker, DocumentChunker, and CodeChunker
- Interactive visualization: Web-based chunk exploration and parameter tuning
- Multi-format support: Text, code, PDF, DOCX, EPUB, and more
- Batch processing: Memory-optimized generators with flexible error handling
Modules:
- base_chunker – Base Chunker Abstract Class
- cli
- code_chunker
- common
- document_chunker
- exceptions
- plain_text_chunker
- sentence_splitter
- visualizer
Classes:
- CallbackError – Raised when a callback function provided to chunker or splitter fails during execution.
- ChunkletError – Base exception for chunking and splitting operations.
- FileProcessingError – Raised when a file cannot be loaded, opened, or accessed.
- InvalidInputError – Raised when one or multiple invalid input(s) are encountered.
- MissingTokenCounterError – Raised when a token_counter is required but not provided.
- TokenLimitError – Raised when max_tokens constraint is exceeded.
- UnsupportedFileTypeError – Raised when a file type is not supported for a given operation.
CallbackError
Bases: ChunkletError
Raised when a callback function provided to chunker or splitter fails during execution.
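As a rough illustration of where this exception fits, the sketch below wraps a user-supplied callback so that any failure surfaces as CallbackError with the original exception chained. The `run_callback` helper and the locally defined classes are assumptions made for this example; they mirror the documented base classes and are not chunklet's own code.

```python
# Local stand-ins mirroring the documented bases (assumption: real code would
# import these from the library's exceptions module instead of redefining them).
class ChunkletError(Exception):
    pass


class CallbackError(ChunkletError):
    pass


def run_callback(callback, value):
    """Hypothetical helper: call `callback` and re-raise any failure as CallbackError."""
    try:
        return callback(value)
    except Exception as exc:
        # `from exc` keeps the original exception reachable via __cause__.
        raise CallbackError(f"callback failed on {value!r}") from exc


def broken_counter(text):
    # Deliberately faulty callback used to trigger the wrapper.
    raise ValueError("boom")


try:
    run_callback(broken_counter, "hello")
except CallbackError as err:
    cause = err.__cause__  # the ValueError raised inside the callback
```

Chaining with `raise ... from` is one reasonable design here: callers can catch a single library-level type while still inspecting the underlying error.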
ChunkletError
Bases: Exception
Base exception for chunking and splitting operations.
FileProcessingError
Bases: ChunkletError
Raised when a file cannot be loaded, opened, or accessed.
InvalidInputError
Bases: ChunkletError
Raised when one or multiple invalid input(s) are encountered.
MissingTokenCounterError
Bases: InvalidInputError
Raised when a token_counter is required but not provided.
Source code in src/chunklet/exceptions.py
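Because MissingTokenCounterError derives from InvalidInputError, a handler written for the parent also catches it. The following minimal sketch shows that relationship; the `count_tokens` function is a hypothetical helper, and the classes are local stand-ins that only reproduce the documented bases.

```python
# Local stand-ins for the documented hierarchy (assumption: not the library's code).
class ChunkletError(Exception):
    pass


class InvalidInputError(ChunkletError):
    pass


class MissingTokenCounterError(InvalidInputError):
    pass


def count_tokens(text, token_counter=None):
    """Hypothetical helper that needs a token_counter callable to do its work."""
    if token_counter is None:
        raise MissingTokenCounterError("a token_counter is required but was not provided")
    return token_counter(text)


def whitespace_counter(text):
    # Simple illustrative counter: one token per whitespace-separated word.
    return len(text.split())


try:
    count_tokens("no counter supplied")
    caught_as_invalid_input = False
except InvalidInputError:
    # The more general handler still matches the subclass.
    caught_as_invalid_input = True
```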
TokenLimitError
Bases: ChunkletError
Raised when max_tokens constraint is exceeded.
UnsupportedFileTypeError
Bases: FileProcessingError
Raised when a file type is not supported for a given operation.
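The base classes listed above imply an ordering for `except` clauses: subclasses have to be handled before their parents, or the parent clause will match first. The sketch below re-declares the hierarchy locally to show this; the `describe` helper and its category strings are hypothetical and exist only for illustration.

```python
# Local stand-ins that reproduce the documented bases (assumption: in real code
# these names would be imported from the library rather than redefined).
class ChunkletError(Exception):
    pass


class CallbackError(ChunkletError):
    pass


class FileProcessingError(ChunkletError):
    pass


class InvalidInputError(ChunkletError):
    pass


class MissingTokenCounterError(InvalidInputError):
    pass


class TokenLimitError(ChunkletError):
    pass


class UnsupportedFileTypeError(FileProcessingError):
    pass


def describe(exc):
    """Hypothetical helper: map an exception to a coarse category, most specific first."""
    try:
        raise exc
    except UnsupportedFileTypeError:
        return "unsupported file type"
    except FileProcessingError:
        return "file could not be processed"
    except MissingTokenCounterError:
        return "token counter missing"
    except InvalidInputError:
        return "invalid input"
    except TokenLimitError:
        return "token limit exceeded"
    except ChunkletError:
        # Catch-all for the rest of the hierarchy, e.g. CallbackError.
        return "other chunklet error"
```

Swapping the first two clauses would make `describe(UnsupportedFileTypeError(...))` fall into the FileProcessingError branch, which is why the more specific class comes first.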