chunklet.document_chunker.processors.base_processor

Classes:

  • BaseProcessor

    Abstract base class for document processors, providing a unified interface for extracting text and metadata from documents.

BaseProcessor

BaseProcessor(file_path: str)

Bases: ABC

Abstract base class for document processors, providing a unified interface for extracting text and metadata from documents.

Initializes the processor with the path to the document.

Parameters:

  • file_path

    (str) –

    Path to the document file.

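Methods:

  • extract_metadata

    Extracts metadata from the document.

  • extract_text

    Yields text content from the document.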

Source code in src/chunklet/document_chunker/processors/base_processor.py
def __init__(self, file_path: str):
    """
    Initializes the processor with the path to the document.

    Args:
        file_path (str): Path to the document file.
    """
    self.file_path = file_path
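
Because the class derives from ABC and both of its methods are abstract, Python refuses to instantiate any subclass that leaves one of them unimplemented. The sketch below illustrates this with a local stand-in that mirrors the interface documented here (so the snippet runs without chunklet installed). `IncompleteProcessor` and the `"source"` key are hypothetical names used only for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Generator


class BaseProcessor(ABC):
    """Local stand-in mirroring the documented interface (not the real import)."""

    def __init__(self, file_path: str):
        self.file_path = file_path

    @abstractmethod
    def extract_metadata(self) -> dict[str, Any]: ...

    @abstractmethod
    def extract_text(self) -> Generator[str, None, None]: ...


class IncompleteProcessor(BaseProcessor):
    """Hypothetical subclass that forgets to implement extract_text."""

    def extract_metadata(self) -> dict[str, Any]:
        return {"source": self.file_path}


try:
    IncompleteProcessor("report.txt")
    instantiated = True
except TypeError:
    # ABC blocks instantiation while any abstract method is still missing.
    instantiated = False
```

In real code the stand-in would presumably be replaced by importing `BaseProcessor` from `chunklet.document_chunker.processors.base_processor`, the module path shown at the top of this page.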

extract_metadata abstractmethod

extract_metadata() -> dict[str, Any]

Extracts metadata from the document.

Returns:

  • dict[str, Any] –

    Dictionary containing document metadata.

Source code in src/chunklet/document_chunker/processors/base_processor.py
@abstractmethod
def extract_metadata(self) -> dict[str, Any]:
    """
    Extracts metadata from the document.

    Returns:
        dict[str, Any]: Dictionary containing document metadata.
    """
    pass

extract_text abstractmethod

extract_text() -> Generator[str, None, None]

Yields text content from the document.

Yields:

  • str –

    Text content chunks from the document.

Source code in src/chunklet/document_chunker/processors/base_processor.py
@abstractmethod
def extract_text(self) -> Generator[str, None, None]:
    """
    Yields text content from the document.

    Yields:
        str: Text content chunks from the document.
    """
    pass
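
To show how the two abstract methods fit together, here is a minimal sketch of a concrete processor for plain-text files. It is an illustration under stated assumptions, not one of chunklet's own processors: `PlainTextProcessor`, `demo`, and the metadata keys `"source"` and `"size_bytes"` are hypothetical, and `BaseProcessor` is redefined locally as a stand-in so the snippet is self-contained.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Any, Generator


class BaseProcessor(ABC):
    """Local stand-in mirroring the documented interface (not the real import)."""

    def __init__(self, file_path: str):
        self.file_path = file_path

    @abstractmethod
    def extract_metadata(self) -> dict[str, Any]: ...

    @abstractmethod
    def extract_text(self) -> Generator[str, None, None]: ...


class PlainTextProcessor(BaseProcessor):
    """Hypothetical processor for UTF-8 plain-text files."""

    def extract_metadata(self) -> dict[str, Any]:
        # Illustrative keys only; the source does not specify a metadata schema.
        return {
            "source": self.file_path,
            "size_bytes": os.path.getsize(self.file_path),
        }

    def extract_text(self) -> Generator[str, None, None]:
        # Yield one non-empty line at a time so the file is never fully loaded.
        with open(self.file_path, encoding="utf-8") as handle:
            for line in handle:
                stripped = line.strip()
                if stripped:
                    yield stripped


def demo() -> tuple[dict[str, Any], list[str]]:
    """Write a small temporary file and run the processor over it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("first line\n\nsecond line\n")
        processor = PlainTextProcessor(path)
        return processor.extract_metadata(), list(processor.extract_text())
```

Since `extract_text` is a generator, callers can iterate over it with a `for` loop and stop early without reading the rest of the file. `demo` wraps it in `list(...)` only to collect the chunks for inspection.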