What's New
What's on This Page
The big stuff. The shiny new things. The stuff we got tired of fixing. For everything else, there's the changelog.
Chunklet v2.2.0
Simpler Chunking API
We renamed some methods. Yes, we're those people who rename things. But honestly, the old names were confusing, even to us:
- chunk_text(): chunk a string
- chunk_file(): chunk a file directly
- chunk_texts(): batch strings
- chunk_files(): batch files
The old chunk and batch_chunk still work. They'll whine at you with a deprecation warning. Deal with it or migrate, your choice.
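If you're wondering what "still works, but whines" looks like in practice, here's a minimal sketch of the usual Python pattern: the old name becomes a thin alias that emits a DeprecationWarning and hands off to the new one. SketchChunker, its max_words parameter, and the toy word-based chunking are all made up for illustration. This is not Chunklet's actual implementation or signature.

```python
import warnings


class SketchChunker:
    """Hypothetical stand-in for illustration, not Chunklet's real class."""

    def chunk_text(self, text: str, max_words: int = 5) -> list[str]:
        # Toy chunking: group whitespace-separated words into fixed-size runs.
        words = text.split()
        return [
            " ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)
        ]

    def chunk(self, text: str, max_words: int = 5) -> list[str]:
        # Old name kept as a thin alias: warn first, then delegate.
        warnings.warn(
            "chunk() is deprecated; use chunk_text() instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this method
        )
        return self.chunk_text(text, max_words=max_words)


chunker = SketchChunker()

# Record warnings so we can see the old name complain.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_result = chunker.chunk("one two three four five six")

new_result = chunker.chunk_text("one two three four five six")
```

Both calls return the same chunks; the only difference is that the old name leaves a DeprecationWarning behind. Running your test suite with `python -W error::DeprecationWarning` is one way to flush out leftover calls before they stop working.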
PlainTextChunker Got Absorbed
PlainTextChunker is now part of DocumentChunker. We know, having two chunkers was weird. Just use chunk_text() or chunk_texts() like a normal person. The old import still works, technically, with a deprecation warning.
SentenceSplitter Now Does split_text()
split() is out. split_text() is in. We renamed it because apparently "split" was too short. There's also now split_file() if you're the type who likes skipping steps.
Visualizer Makeover
The chunk visualizer finally got some love:
- Fullscreen mode: for when you want to pretend you're doing something important
- 3-row layout: less cluttered, more clickable
- Smoother hovers: no more seizure-inducing animations
- Smarter buttons: they stay enabled because, honestly, disabling them was stupid
Shorter CLI Flags
Finally, stuff you can actually type without wrist strain:
- -l for --lang
- -h for --host
- -m for --metadata
You're welcome.
Code Chunking, Less Broken
Code chunking got slightly less terrible:
- Cleaner output: fixed weird artifacts in chunks from comment handling (we know, it was annoying)
- More languages: Forth, PHP 8 attributes, VB.NET, ColdFusion, and Pascal. Yes, really.
- String protection: multi-line strings and triple-quotes won't get mangled anymore
The Boring But Necessary Stuff
- Tokenizer timeout: new --tokenizer-timeout / -t flag so custom tokenizers don't hang forever
- Direct imports: from chunklet import DocumentChunker now works without making things slow
- Fewer crashes: fixed dependency issues with setuptools<81 in CI (sentsplit and pkg_resources, long story)
- Global registries: custom_splitter_registry and custom_processor_registry exist now
- Error messages: slightly less cryptic when things explode
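For the curious, here's a rough sketch of what "don't hang forever" can mean for an external tokenizer: run it as a subprocess and give up after a deadline. This assumes the tokenizer is a command that reads text on stdin and prints a token count. count_tokens, WORD_COUNTER, and SLEEPER are hypothetical names invented for this example, and none of this is claimed to be how Chunklet wires up the flag internally.

```python
import subprocess
import sys


def count_tokens(command: list[str], text: str, timeout: float) -> int:
    """Run an external tokenizer command, giving up after `timeout` seconds."""
    try:
        result = subprocess.run(
            command,
            input=text,           # text goes to the command's stdin
            capture_output=True,  # collect stdout/stderr
            text=True,            # work with str instead of bytes
            timeout=timeout,      # the child is killed once this expires
            check=True,           # non-zero exit raises CalledProcessError
        )
    except subprocess.TimeoutExpired as exc:
        raise TimeoutError(
            f"tokenizer did not answer within {timeout}s"
        ) from exc
    return int(result.stdout.strip())


# Stand-in tokenizer: counts whitespace-separated words read from stdin.
WORD_COUNTER = [
    sys.executable, "-c",
    "import sys; print(len(sys.stdin.read().split()))",
]

# Stand-in for a tokenizer that hangs.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(30)"]

token_count = count_tokens(WORD_COUNTER, "one two three", timeout=10.0)
```

Calling count_tokens(SLEEPER, "", timeout=0.5) raises TimeoutError after about half a second instead of blocking for the full 30.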
Chunklet v2.1.1
Visualizer Was Broken
The visualizer didn't work after installing from PyPI. Static files were MIA. Fixed now, obviously.
Chunklet v2.1.0
Visualizer 1.0
We built an actual UI. Because sometimes you want to click buttons instead of writing code:
- Interactive web interface for parameter tuning
- Launch with chunklet visualize
- Works with all chunker types
More File Formats
ODT, CSV, and Excel (.xlsx) were added in this release. Because apparently plain text wasn't enough for some people.
Chunklet v2.0.0
The Big Rewrite (aka "We Broke Everything")
We rewrote the whole thing. You're welcome? Here's what changed:
- New classes: PlainTextChunker, DocumentChunker, CodeChunker
- 50+ languages: because the world has more than English
- Document formats: PDF, DOCX, EPUB, HTML, etc.
- Code understanding: actual code chunking, not just "split by lines like a savage"
- New constraints: max_section_breaks and max_lines for finer control
- Memory efficient batch: generators in batch methods so your RAM doesn't cry
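The "generators in batch methods" bit is easier to see than to describe. Below is a minimal sketch of the idea, assuming a toy word-based chunker: results are yielded one at a time, so nothing forces the whole batch into memory at once. chunk_words and chunk_texts_lazily are hypothetical helpers written for this example, not Chunklet's batch API.

```python
from typing import Iterable, Iterator


def chunk_words(text: str, max_words: int) -> list[str]:
    # Toy chunking: group whitespace-separated words into fixed-size runs.
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def chunk_texts_lazily(
    texts: Iterable[str], max_words: int = 3
) -> Iterator[tuple[int, str]]:
    """Yield (source_index, chunk) pairs one at a time."""
    for index, text in enumerate(texts):
        # Only one input's chunks exist at a time; earlier ones can be freed.
        for chunk in chunk_words(text, max_words):
            yield index, chunk


# Nothing is chunked until the caller starts iterating.
lazy_chunks = chunk_texts_lazily(["a b c d", "e f"], max_words=3)
first = next(lazy_chunks)   # does just enough work to produce one chunk
rest = list(lazy_chunks)    # drains whatever is left
```

The trade-off is the usual one for generators: you can only walk the results once, so wrap the call in list() if you really do need everything at the same time.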
Want More Details?
The changelog has everything. We're not gonna repeat it here.