chunklet.sentence_splitter.registry
Classes:
CustomSplitterRegistry
Methods:
- clear – Clears all registered splitters from the registry.
- is_registered – Check if a splitter is registered for the given language.
- register – Register a splitter callback for one or more languages.
- split – Processes a text using a splitter registered for the given language.
- unregister – Remove splitter(s) from the registry.
Attributes:
- splitters – Returns a shallow copy of the dictionary of registered splitters.
splitters
property
Returns a shallow copy of the dictionary of registered splitters.
This prevents external modification of the internal registry state.
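To illustrate why returning a shallow copy protects the registry, here is a minimal sketch. `RegistrySketch`, its `_splitters` attribute, and the `(name, callback)` storage format are hypothetical stand-ins chosen for illustration; they are not taken from chunklet's source.

```python
from typing import Callable


class RegistrySketch:
    """Hypothetical stand-in for illustration; not chunklet's implementation."""

    def __init__(self) -> None:
        # Assumed storage format: language code -> (splitter name, callback).
        self._splitters: dict[str, tuple[str, Callable[[str], list[str]]]] = {}

    @property
    def splitters(self) -> dict[str, tuple[str, Callable[[str], list[str]]]]:
        # dict.copy() builds a new dict that shares the same values,
        # so adding or deleting keys on the copy leaves the original alone.
        return self._splitters.copy()


registry = RegistrySketch()
registry._splitters["en"] = ("demo", str.split)

snapshot = registry.splitters
snapshot["fr"] = ("rogue", str.split)  # only the copy gains this key
del snapshot["en"]                     # only the copy loses this key

internal_languages = sorted(registry.splitters)
```

Because the copy is shallow, the values themselves are still shared; only the mapping of keys to values is protected from outside changes.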
clear
is_registered
register
Register a splitter callback for one or more languages.
This method can be used in two ways:
1. As a decorator: @registry.register("en", "fr", name="my_splitter") placed directly above def my_splitter(text): ...
2. As a direct function call: registry.register(my_splitter, "en", "fr", name="my_splitter")
Parameters:
- *args (Any, default: ()) – The arguments, which can be either (lang1, lang2, ...) for a decorator or (callback, lang1, lang2, ...) for a direct call.
- name (str, default: None) – The name of the splitter. If None, attempts to use the callback's name.
Source code in src/chunklet/sentence_splitter/registry.py
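One way a single method can support both calling styles is to check whether the first positional argument is callable. The sketch below assumes that approach; `RegistrySketch`, `_store`, and the `(name, callback)` storage format are hypothetical and may differ from what registry.py actually does.

```python
from typing import Any, Callable, Optional


class RegistrySketch:
    """Hypothetical stand-in for illustration; not chunklet's implementation."""

    def __init__(self) -> None:
        self._splitters: dict[str, tuple[str, Callable[[str], list[str]]]] = {}

    def register(self, *args: Any, name: Optional[str] = None):
        if not args:
            raise ValueError("At least one language code is required.")

        if callable(args[0]):
            # Direct call: register(callback, lang1, lang2, ...)
            callback, *langs = args
            if not langs:
                raise ValueError("At least one language code is required.")
            self._store(callback, langs, name)
            return callback

        # Decorator use: register(lang1, lang2, ...) returns a decorator.
        langs = args

        def decorator(callback: Callable[[str], list[str]]):
            self._store(callback, langs, name)
            return callback  # hand the function back unchanged

        return decorator

    def _store(self, callback, langs, name: Optional[str]) -> None:
        # Fall back to the callback's own name when none is given.
        resolved = name or getattr(callback, "__name__", "unnamed")
        for lang in langs:
            self._splitters[lang] = (resolved, callback)


registry = RegistrySketch()


@registry.register("en", "fr", name="my_splitter")
def my_splitter(text: str) -> list[str]:
    return text.split(". ")


def line_splitter(text: str) -> list[str]:
    return text.split("\n")


registry.register(line_splitter, "de")  # name falls back to "line_splitter"
```

Language codes are strings and strings are not callable, so the `callable(args[0])` check is enough to tell the two forms apart.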
split
Processes a text using a splitter registered for the given language.
Parameters:
- text (str) – The text to split.
- lang (str) – The language of the text.
Returns:
- tuple[list[str], str] – A tuple containing a list of sentences and the name of the splitter used.
Raises:
- CallbackError – If the splitter callback fails.
- TypeError – If the splitter returns the wrong type.
Examples:
>>> from chunklet.sentence_splitter import CustomSplitterRegistry
>>> registry = CustomSplitterRegistry()
>>> @registry.register("xx", name="custom_splitter")
... def custom_splitter(text: str) -> list[str]:
...     return text.split(" ")
>>> registry.split("Hello World", "xx")
(['Hello', 'World'], 'custom_splitter')
Source code in src/chunklet/sentence_splitter/registry.py
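The two documented failure modes can be sketched as a small wrapper around the callback. This is an illustration under assumptions: `run_splitter` is a hypothetical helper, `CallbackError` is redefined locally as a stand-in for chunklet's exception, and the exact type check (a list containing only strings) is a guess at what "the wrong type" means.

```python
from typing import Callable


class CallbackError(Exception):
    """Local stand-in for chunklet's CallbackError (assumption)."""


def run_splitter(
    name: str, callback: Callable[[str], list[str]], text: str
) -> tuple[list[str], str]:
    """Hypothetical helper: call a splitter and validate its result."""
    try:
        result = callback(text)
    except Exception as exc:
        # Wrap any failure inside the user's callback in one error type.
        raise CallbackError(f"Splitter '{name}' failed: {exc}") from exc

    # Assumed validation: the result must be a list of strings.
    if not isinstance(result, list) or not all(isinstance(s, str) for s in result):
        raise TypeError(f"Splitter '{name}' must return a list of strings.")

    return result, name


ok = run_splitter("whitespace", str.split, "Hello World")
```

Wrapping callback failures in a dedicated exception lets callers distinguish a broken custom splitter from errors elsewhere in the pipeline.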
unregister
Remove splitter(s) from the registry.
Parameters:
- *langs (str, default: ()) – Language codes to remove.
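A rough sketch of how `unregister` fits alongside `is_registered` and `clear` follows. `RegistrySketch` is a hypothetical stand-in backed by a plain dict, and silently ignoring language codes that were never registered is an assumption; the source does not say how the real method handles that case.

```python
class RegistrySketch:
    """Hypothetical stand-in for illustration; not chunklet's implementation."""

    def __init__(self) -> None:
        self._splitters: dict[str, object] = {}

    def is_registered(self, lang: str) -> bool:
        return lang in self._splitters

    def unregister(self, *langs: str) -> None:
        for lang in langs:
            # Assumption: unknown language codes are ignored, not an error.
            self._splitters.pop(lang, None)

    def clear(self) -> None:
        self._splitters.clear()


registry = RegistrySketch()
registry._splitters.update({"en": str.split, "fr": str.split, "de": str.split})

registry.unregister("en", "fr", "zz")  # "zz" was never registered
after_unregister = sorted(registry._splitters)

registry.clear()
after_clear = registry.is_registered("de")
```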