chunklet.base_chunker

Base Chunker Abstract Class

Defines the interface for chunkers.

Classes:

BaseChunker

BaseChunker(verbose: bool = False)

Bases: ABC

Abstract base class for chunkers.

Defines the standard interface for chunking content into units.

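Methods:

  • chunk_file: Read and chunk a file.
  • chunk_files: Process multiple files.
  • chunk_text: Extract chunks from text.
  • chunk_texts: Process multiple texts.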

Source code in src/chunklet/base_chunker.py
def __init__(self, verbose: bool = False):
    self.verbose = verbose
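
Because every method on the interface is abstract, the class cannot be instantiated directly; a subclass has to implement all four methods first. The following is a minimal sketch of that contract, not the library's own code. To stay self-contained it declares stand-ins for `BaseChunker` and `DotDict` that mirror the signatures on this page. `ParagraphChunker`, the `content` and `metadata` keys, and the `chunk_num` field are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Generator


class DotDict(dict):
    """Stand-in for the library's DotDict (assumption): a dict with attribute access."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


class BaseChunker(ABC):
    """Stand-in mirroring the interface documented on this page."""

    def __init__(self, verbose: bool = False):
        self.verbose = verbose

    @abstractmethod
    def chunk_text(self, *args, **kwargs) -> list[DotDict]: ...

    @abstractmethod
    def chunk_texts(self, *args, **kwargs) -> list[list[DotDict]]: ...

    @abstractmethod
    def chunk_file(self, *args, **kwargs) -> list[DotDict]: ...

    @abstractmethod
    def chunk_files(self, *args, **kwargs) -> Generator[DotDict, None, None]: ...


class ParagraphChunker(BaseChunker):
    """Hypothetical subclass: one chunk per blank-line-separated paragraph."""

    def chunk_text(self, text: str) -> list[DotDict]:
        parts = [p.strip() for p in text.split("\n\n") if p.strip()]
        return [
            DotDict(content=p, metadata=DotDict(chunk_num=i))
            for i, p in enumerate(parts, 1)
        ]

    def chunk_texts(self, texts: list[str]) -> list[list[DotDict]]:
        return [self.chunk_text(t) for t in texts]

    def chunk_file(self, path: str) -> list[DotDict]:
        with open(path, encoding="utf-8") as f:
            return self.chunk_text(f.read())

    def chunk_files(self, paths) -> Generator[DotDict, None, None]:
        for path in paths:
            yield from self.chunk_file(path)


# The abstract base refuses to instantiate while methods are unimplemented.
try:
    BaseChunker()
except TypeError as exc:
    instantiation_error = exc

chunker = ParagraphChunker(verbose=True)
chunks = chunker.chunk_text("First paragraph.\n\nSecond paragraph.")
```

In real code the stand-in classes would presumably be replaced by an import from `chunklet.base_chunker`; only the subclass would remain.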

chunk_file abstractmethod

chunk_file(*args, **kwargs) -> list[DotDict]

Read and chunk a file.

Returns:

  • list[DotDict]

    List of chunks with content and metadata.

Source code in src/chunklet/base_chunker.py
@abstractmethod
def chunk_file(self, *args, **kwargs) -> list[DotDict]:
    """
    Read and chunk a file.

    Returns:
        List of chunks with content and metadata.
    """
    pass

chunk_files abstractmethod

chunk_files(
    *args, **kwargs
) -> Generator[DotDict, None, None]

Process multiple files, yielding chunks one at a time.

Yields:

  • DotDict

    A DotDict representing one chunk, with its content and metadata.

Source code in src/chunklet/base_chunker.py
@abstractmethod
def chunk_files(self, *args, **kwargs) -> Generator[DotDict, None, None]:
    """
    Process multiple files.

    Yields:
        `DotDict` object, representing a chunk with its content and metadata.
    """
    pass
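
Unlike the list-returning methods, `chunk_files` is declared as a generator, so a caller can stop early without paying for files it never reaches. The sketch below illustrates that consumption pattern under stated assumptions: the free function `chunk_files` is a hypothetical stand-in (one chunk per non-empty line), plain dicts stand in for `DotDict`, and the `source` metadata key is invented for the example.

```python
import tempfile
from itertools import islice
from pathlib import Path
from typing import Generator


def chunk_files(paths) -> Generator[dict, None, None]:
    """Hypothetical stand-in: yields one chunk per non-empty line, file by file."""
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    yield {"content": line.strip(), "metadata": {"source": str(path)}}


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, body in [("a.txt", "alpha\nbeta\n"), ("b.txt", "gamma\n")]:
        p = Path(tmp) / name
        p.write_text(body, encoding="utf-8")
        paths.append(p)

    # Pull only the first two chunks; the generator never advances to b.txt.
    gen = chunk_files(paths)
    first_two = list(islice(gen, 2))
    gen.close()  # release the file handle held by the suspended generator

    # Or drain everything in a single pass.
    all_chunks = list(chunk_files(paths))
```

Closing the generator explicitly is a defensive choice here: a suspended generator keeps its open file alive until it is closed or garbage collected.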

chunk_text abstractmethod

chunk_text(*args, **kwargs) -> list[DotDict]

Extract chunks from text.

Returns:

  • list[DotDict]

    List of chunks with content and metadata.

Source code in src/chunklet/base_chunker.py
@abstractmethod
def chunk_text(self, *args, **kwargs) -> list[DotDict]:
    """
    Extract chunks from text.

    Returns:
        List of chunks with content and metadata.
    """
    pass

chunk_texts abstractmethod

chunk_texts(*args, **kwargs) -> list[list[DotDict]]

Process multiple texts, returning one list of chunks per input text.

Returns:

  • list[list[DotDict]]

    List of chunks for each input text.

Source code in src/chunklet/base_chunker.py
@abstractmethod
def chunk_texts(self, *args, **kwargs) -> list[list[DotDict]]:
    """
    Process multiple texts.

    Returns:
        List of chunks for each input text.
    """
    pass
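
The nested return type of `chunk_texts` means results are grouped per input rather than flattened. A minimal sketch of working with that shape follows. The two functions are hypothetical stand-ins, plain dicts stand in for `DotDict`, and two behaviors are assumptions of this sketch rather than documented guarantees: inner lists follow input order, and an empty text produces an empty inner list.

```python
from itertools import chain


def chunk_text(text: str) -> list[dict]:
    """Hypothetical stand-in: one chunk per period-delimited fragment."""
    parts = [s.strip() for s in text.split(".") if s.strip()]
    return [{"content": s, "metadata": {"chunk_num": i}} for i, s in enumerate(parts, 1)]


def chunk_texts(texts: list[str]) -> list[list[dict]]:
    """One inner list per input text, in input order (assumed)."""
    return [chunk_text(t) for t in texts]


texts = ["One. Two.", "", "Three."]
results = chunk_texts(texts)

# Pair each input with its own chunks to count chunks per text.
counts = [len(chunks) for _, chunks in zip(texts, results)]

# Flatten when the per-text grouping is not needed.
flat = [c["content"] for c in chain.from_iterable(results)]
```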