chunklet.document_chunker.processors.base_processor
Classes:
- BaseProcessor – Abstract base class for document processors, providing a unified interface for extracting text and metadata from documents.
BaseProcessor
Bases: ABC
Abstract base class for document processors, providing a unified interface for extracting text and metadata from documents.
Initializes the processor with the path to the document.
Parameters:
- file_path (str) – Path to the document file.
Methods:
- extract_metadata – Extracts metadata from the document.
- extract_text – Yields text content from the document.
Source code in src/chunklet/document_chunker/processors/base_processor.py
extract_metadata
abstractmethod
Extracts metadata from the document.
Returns:
- dict[str, Any] – Dictionary containing document metadata.
extract_text
abstractmethod
Yields text content from the document.
Yields:
- str – Text content chunks from the document.
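As an illustration of the interface described above, the following is a minimal sketch of a concrete processor. It makes several assumptions: the `BaseProcessor` defined here is a local stand-in that mirrors only the documented signature (in real code it would presumably be imported from `chunklet.document_chunker.processors.base_processor`), the attribute name `self.file_path` is assumed, and `PlainTextProcessor` and its metadata keys (`source`, `size_bytes`) are hypothetical names that are not part of the library.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterator
from pathlib import Path
from typing import Any


class BaseProcessor(ABC):
    """Local stand-in mirroring the documented interface (not the library source)."""

    def __init__(self, file_path: str):
        # Assumption: the path is stored on the instance under this name.
        self.file_path = file_path

    @abstractmethod
    def extract_metadata(self) -> dict[str, Any]:
        """Return a dictionary of document metadata."""

    @abstractmethod
    def extract_text(self) -> Iterator[str]:
        """Yield text content from the document."""


class PlainTextProcessor(BaseProcessor):
    """Hypothetical processor for UTF-8 plain-text files."""

    def extract_metadata(self) -> dict[str, Any]:
        path = Path(self.file_path)
        # Illustrative keys only; the library does not prescribe them here.
        return {"source": str(path), "size_bytes": path.stat().st_size}

    def extract_text(self) -> Iterator[str]:
        # Yield one non-empty paragraph at a time, splitting on blank lines.
        text = Path(self.file_path).read_text(encoding="utf-8")
        for paragraph in text.split("\n\n"):
            stripped = paragraph.strip()
            if stripped:
                yield stripped
```

Because both methods are declared with `abstractmethod`, instantiating the base class directly raises `TypeError`; only subclasses that implement both methods can be created. Since `extract_text` is a generator, a caller can iterate over it with `for chunk in processor.extract_text(): ...` and consume the chunks one at a time. This sketch still reads the whole file before splitting it, so a processor for large documents would need to read incrementally.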