chunklet.document_chunker.processors.base_processor

Classes:

BaseProcessor – Abstract base class for document processors, providing a unified interface for extracting text and metadata from documents.

BaseProcessor

BaseProcessor(file_path: str)

Bases: ABC

Abstract base class for document processors, providing a unified interface for extracting text and metadata from documents.

Initializes the processor with the path to the document.

Parameters:

file_path (str) – Path to the document file.

Methods:

extract_metadata – Extracts metadata from the document.

extract_text – Yields text content from the document.

Source code in src/chunklet/document_chunker/processors/base_processor.py

def __init__(self, file_path: str):
    """
    Initializes the processor with the path to the document.

    Args:
        file_path (str): Path to the document file.
    """
    self.file_path = file_path
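
Because the class derives from ABC and declares abstract methods, Python refuses to instantiate it directly; only a subclass that implements both methods can be constructed. The sketch below illustrates this with a local stand-in that mirrors the interface documented here (an assumption made so the snippet runs without chunklet installed; in real code, import BaseProcessor from chunklet.document_chunker.processors.base_processor instead). The helper try_instantiate is hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Generator


class BaseProcessor(ABC):
    """Local stand-in mirroring the documented interface (not the library class)."""

    def __init__(self, file_path: str):
        self.file_path = file_path

    @abstractmethod
    def extract_metadata(self) -> dict[str, Any]:
        pass

    @abstractmethod
    def extract_text(self) -> Generator[str, None, None]:
        pass


def try_instantiate() -> str:
    """Hypothetical helper: report what happens when the base class is constructed."""
    try:
        BaseProcessor("report.pdf")  # abstract methods are unimplemented
    except TypeError as exc:
        return type(exc).__name__
    return "instantiated"


print(try_instantiate())
```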

extract_metadata `abstractmethod`

extract_metadata() -> dict[str, Any]

Extracts metadata from the document.

Returns:

dict[str, Any] – Dictionary containing document metadata.

Source code in src/chunklet/document_chunker/processors/base_processor.py

@abstractmethod
def extract_metadata(self) -> dict[str, Any]:
    """
    Extracts metadata from the document.

    Returns:
        dict[str, Any]: Dictionary containing document metadata.
    """
    pass

extract_text `abstractmethod`

extract_text() -> Generator[str, None, None]

Yields text content from the document.

Yields:

str – Text content chunks from the document.

Source code in src/chunklet/document_chunker/processors/base_processor.py

@abstractmethod
def extract_text(self) -> Generator[str, None, None]:
    """
    Yields text content from the document.

    Yields:
        str: Text content chunks from the document.
    """
    pass
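
To show how the two abstract methods fit together, here is a minimal sketch of a concrete processor. PlainTextProcessor, its metadata keys ("source", "size_bytes"), the paragraph-splitting rule, and the demo helper are all hypothetical and not part of chunklet. The BaseProcessor defined in the snippet is a local stand-in that mirrors the documented interface so the example runs on its own; in real code, import the class from the module documented here.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Any, Generator


class BaseProcessor(ABC):
    """Local stand-in mirroring the documented interface (not the library class)."""

    def __init__(self, file_path: str):
        self.file_path = file_path

    @abstractmethod
    def extract_metadata(self) -> dict[str, Any]:
        pass

    @abstractmethod
    def extract_text(self) -> Generator[str, None, None]:
        pass


class PlainTextProcessor(BaseProcessor):
    """Hypothetical processor for UTF-8 text files."""

    def extract_metadata(self) -> dict[str, Any]:
        # The metadata keys are illustrative; the base class does not prescribe any.
        return {
            "source": self.file_path,
            "size_bytes": os.path.getsize(self.file_path),
        }

    def extract_text(self) -> Generator[str, None, None]:
        # Yield one blank-line-separated paragraph at a time, reading lazily.
        with open(self.file_path, encoding="utf-8") as handle:
            buffer: list[str] = []
            for line in handle:
                if line.strip():
                    buffer.append(line.rstrip("\n"))
                elif buffer:
                    yield "\n".join(buffer)
                    buffer = []
            if buffer:
                yield "\n".join(buffer)


def demo() -> tuple[dict[str, Any], list[str]]:
    """Write a small temporary file and run the hypothetical processor on it."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("First paragraph.\nStill first.\n\nSecond paragraph.\n")
        path = tmp.name
    try:
        processor = PlainTextProcessor(path)
        return processor.extract_metadata(), list(processor.extract_text())
    finally:
        os.remove(path)


metadata, paragraphs = demo()
print(paragraphs)
```

Since extract_text is a generator, callers can iterate over it and stop early without reading the whole file; wrapping it in list(), as demo does, forces full extraction.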
