chunklet

Chunklet: Advanced Text, Code, and Document Chunking for LLM Applications

A comprehensive library for semantic text segmentation, interactive chunk visualization, and multi-format document processing. Split content intelligently across 50+ languages, visualize chunks in real time, and handle various file types with flexible, context-aware chunking strategies.

Key Features:

- Sentence splitting: Multilingual text segmentation across 50+ languages
- Semantic chunking: PlainTextChunker, DocumentChunker, and CodeChunker
- Interactive visualization: Web-based chunk exploration and parameter tuning
- Multi-format support: Text, code, PDF, DOCX, EPUB, and more
- Batch processing: Memory-optimized generators with flexible error handling

Classes:

CallbackError

Bases: ChunkletError

Raised when a callback function provided to a chunker or splitter fails during execution.
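
As a rough illustration of the situation this exception describes, the sketch below wraps a user-supplied callback and re-raises any failure as a CallbackError. The classes are local stand-ins that only mirror the documented names and bases, and `safe_count` and `broken_counter` are hypothetical helpers; how chunklet wraps callbacks internally is not shown on this page.

```python
from typing import Callable


class ChunkletError(Exception):
    """Local stand-in mirroring chunklet's documented base exception."""


class CallbackError(ChunkletError):
    """Local stand-in mirroring chunklet's documented CallbackError."""


def safe_count(token_counter: Callable[[str], int], text: str) -> int:
    """Hypothetical helper: run a callback and convert any failure to CallbackError."""
    try:
        return token_counter(text)
    except Exception as exc:
        # Chain the original exception so the root cause stays visible.
        raise CallbackError(f"token_counter failed: {exc}") from exc


def broken_counter(text: str) -> int:
    """Hypothetical callback that always fails."""
    raise ValueError("boom")


caught = None
try:
    safe_count(broken_counter, "hello world")
except CallbackError as err:
    caught = err
```

Chaining with `from exc` keeps the original error available on `__cause__`, which tends to make callback bugs easier to trace.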

ChunkletError

Bases: Exception

Base exception for chunking and splitting operations.

FileProcessingError

Bases: ChunkletError

Raised when a file cannot be loaded, opened, or accessed.

InvalidInputError

Bases: ChunkletError

Raised when one or more invalid inputs are encountered.

MissingTokenCounterError

MissingTokenCounterError(msg: str = '')

Bases: InvalidInputError

Raised when a token_counter is required but not provided.

Source code in src/chunklet/exceptions.py
def __init__(self, msg: str = ""):
    self.msg = msg or (
        "A token_counter is required for token-based chunking.\n"
        "💡 Hint: Pass a token counting function to the `chunk` method, like `chunker.chunk(..., token_counter=tk)`\n"
        "or configure it in the class initialization: `.*Chunker(token_counter=tk)`"
    )
    super().__init__(self.msg)
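
To make the `msg or (...)` fallback concrete, here is a small self-contained sketch. The classes are local stand-ins that copy the constructor shown above (with the hint text shortened), so it runs without chunklet installed; it is an illustration, not the library's own module.

```python
class ChunkletError(Exception):
    """Local stand-in for the documented base exception."""


class InvalidInputError(ChunkletError):
    """Local stand-in for the documented InvalidInputError."""


class MissingTokenCounterError(InvalidInputError):
    """Local stand-in reproducing the constructor logic shown above."""

    def __init__(self, msg: str = ""):
        # An empty (falsy) msg falls back to the default hint text.
        self.msg = msg or (
            "A token_counter is required for token-based chunking.\n"
            "(hint text shortened in this sketch)"
        )
        super().__init__(self.msg)


default_err = MissingTokenCounterError()
custom_err = MissingTokenCounterError("No counter configured")

print(str(default_err).splitlines()[0])
print(custom_err.msg)
```

Because the fallback relies on truthiness, passing an empty string explicitly behaves the same as passing nothing.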

TokenLimitError

Bases: ChunkletError

Raised when the max_tokens constraint is exceeded.

UnsupportedFileTypeError

Bases: FileProcessingError

Raised when a file type is not supported for a given operation.
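
The base classes listed above form a small hierarchy, so handlers can be ordered from most specific to most general. The sketch below rebuilds that hierarchy with local stand-in classes (same names and bases as documented, no library import) and uses a hypothetical `handle` function with illustrative return strings to show which `except` clause matches.

```python
class ChunkletError(Exception):
    """Stand-in: base exception for chunking and splitting operations."""


class CallbackError(ChunkletError):
    """Stand-in: a user-supplied callback failed."""


class InvalidInputError(ChunkletError):
    """Stand-in: one or more invalid inputs."""


class MissingTokenCounterError(InvalidInputError):
    """Stand-in: token_counter required but not provided."""


class TokenLimitError(ChunkletError):
    """Stand-in: max_tokens constraint exceeded."""


class FileProcessingError(ChunkletError):
    """Stand-in: file cannot be loaded, opened, or accessed."""


class UnsupportedFileTypeError(FileProcessingError):
    """Stand-in: file type not supported for the operation."""


def handle(exc: ChunkletError) -> str:
    """Hypothetical dispatcher: most specific handlers come first."""
    try:
        raise exc
    except UnsupportedFileTypeError:
        return "skip file"
    except FileProcessingError:
        return "report file problem"
    except InvalidInputError:
        return "fix input"
    except ChunkletError:
        return "generic chunklet failure"


print(handle(UnsupportedFileTypeError("archive.xyz")))
print(handle(MissingTokenCounterError("no counter")))
print(handle(TokenLimitError("too long")))
```

Placing `except ChunkletError` last acts as a catch-all for the library's errors while leaving unrelated exceptions untouched.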