What's New

The big stuff. The shiny new things. The stuff we got tired of fixing. For everything else, there's the changelog.


Chunklet v2.2.0

✨ Simpler Chunking API

We renamed some methods. Yes, we're those people who rename things. But honestly, the old names were confusing, even to us. Here's the new lineup:

  • chunk_text(): chunk a string
  • chunk_file(): chunk a file directly
  • chunk_texts(): chunk a batch of strings
  • chunk_files(): chunk a batch of files

The old chunk() and batch_chunk() methods still work. They'll whine at you with a deprecation warning. Deal with it or migrate. Your choice.
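If you'd rather migrate than be whined at, turn the warning into an error: run Python (or your test suite) with the standard `-W error::DeprecationWarning` option and every lingering old call fails loudly. The rename-plus-alias pattern itself looks roughly like this minimal sketch. `ToyChunker`, its word-based splitting, and the `max_words` parameter are hypothetical stand-ins, not Chunklet's real `DocumentChunker` or its parameters.

```python
import warnings


class ToyChunker:
    """Hypothetical stand-in; not Chunklet's DocumentChunker."""

    def chunk_text(self, text: str, max_words: int = 2) -> list[str]:
        # New name: the real work lives here (toy rule: fixed-size word groups).
        words = text.split()
        return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

    def chunk(self, text: str, **kwargs) -> list[str]:
        # Old name: complain, then delegate so existing callers keep working.
        warnings.warn(
            "chunk() is deprecated; use chunk_text() instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line, not this one
        )
        return self.chunk_text(text, **kwargs)


chunker = ToyChunker()
print(chunker.chunk_text("one two three four five"))  # ['one two', 'three four', 'five']
```

The old name does no work of its own, so both paths always return the same chunks; only the warning differs.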

πŸ”— PlainTextChunker Got Absorbed

PlainTextChunker is now part of DocumentChunker. We know: having two chunkers was weird. Just use chunk_text() or chunk_texts() like a normal person. The old import still works, technically, with a deprecation warning.
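One common way to keep a removed name importable is a module-level `__getattr__` (PEP 562), which Python calls only when normal attribute lookup on a module fails. This is a minimal sketch of that technique, not a claim about how Chunklet does it. To stay runnable on its own, it builds a throwaway in-memory module; `toy_pkg` and this `DocumentChunker` are hypothetical. In a real package you'd just define a top-level `__getattr__` in `__init__.py`.

```python
import sys
import types
import warnings


class DocumentChunker:
    """Hypothetical stand-in for the surviving class."""

    def chunk_text(self, text: str) -> list[str]:
        return [part for part in text.split("\n\n") if part]


# Throwaway in-memory module (hypothetical) so the demo needs no files.
toy_pkg = types.ModuleType("toy_pkg")
toy_pkg.DocumentChunker = DocumentChunker


def _module_getattr(name: str):
    # Called only when normal lookup on the module fails (PEP 562).
    if name == "PlainTextChunker":
        warnings.warn(
            "PlainTextChunker is deprecated; use DocumentChunker instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return DocumentChunker  # in this sketch the old name resolves to the new class
    raise AttributeError(f"module 'toy_pkg' has no attribute {name!r}")


toy_pkg.__getattr__ = _module_getattr
sys.modules["toy_pkg"] = toy_pkg

from toy_pkg import PlainTextChunker  # still works, but emits a DeprecationWarning

print(PlainTextChunker is DocumentChunker)  # True
```

Because the hook only fires for missing names, regular imports like `DocumentChunker` pay no extra cost.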

βœ‚οΈ SentenceSplitter Now Does split_text()

split() is out. split_text() is in. We renamed it because apparently "split" was too short. There's also now split_file() if you're the type who likes skipping steps.
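The "skipped step" is reading the file yourself. A file method that simply delegates to the text method looks roughly like this minimal sketch. `ToySplitter` and its regex rule are hypothetical and far dumber than a real sentence splitter (abbreviations like "Dr." will fool it); nothing here describes Chunklet's internals.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path


class ToySplitter:
    """Hypothetical stand-in; not Chunklet's SentenceSplitter."""

    # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    _BOUNDARY = re.compile(r"(?<=[.!?])\s+")

    def split_text(self, text: str) -> list[str]:
        return [s for s in self._BOUNDARY.split(text.strip()) if s]

    def split_file(self, path: str | Path, encoding: str = "utf-8") -> list[str]:
        # Convenience wrapper: read the file, then reuse split_text().
        return self.split_text(Path(path).read_text(encoding=encoding))


splitter = ToySplitter()
with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / "note.txt"
    note.write_text("One. Two! Three?", encoding="utf-8")
    print(splitter.split_file(note))  # ['One.', 'Two!', 'Three?']
```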

🎨 Visualizer Makeover

The chunk visualizer finally got some love:

  • Fullscreen mode: for when you want to pretend you're doing something important
  • 3-row layout: less cluttered, more clickable
  • Smoother hovers: no more seizure-inducing animations
  • Smarter buttons: they stay enabled because, honestly, disabling them was stupid

⌨️ Shorter CLI Flags

Finally, stuff you can actually type without wrist strain:

  • -l for --lang
  • -h for --host
  • -m for --metadata

You're welcome.
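Pairing a short flag with a long one is a one-liner in most CLI libraries. Here's a minimal sketch with Python's standard argparse, which also shows one wrinkle: argparse claims -h for help by default, so giving -h to --host means switching that off and exposing help as --help only. The parser, the `chunklet-demo` prog name, and treating --metadata as an on/off switch are hypothetical. This is not Chunklet's actual CLI code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # add_help=False: argparse registers -h/--help by default, and adding
    # -h again for --host would raise an ArgumentError (conflicting option).
    parser = argparse.ArgumentParser(prog="chunklet-demo", add_help=False)
    parser.add_argument("--help", action="help", help="show this help message and exit")
    parser.add_argument("-l", "--lang", help="language code (hypothetical)")
    parser.add_argument("-h", "--host", help="host to bind (hypothetical)")
    parser.add_argument("-m", "--metadata", action="store_true",
                        help="include metadata (hypothetical on/off switch)")
    return parser


args = build_parser().parse_args(["-l", "en", "-h", "127.0.0.1", "-m"])
print(args.lang, args.host, args.metadata)  # en 127.0.0.1 True
```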

πŸ§‘β€πŸ’» Code Chunking, Less Broken

Code chunking got slightly less terrible:

  • Cleaner output: fixed the weird artifacts that comment handling left in chunks (we know, it was annoying)
  • More languages: Forth, VB.NET, ColdFusion, and Pascal, plus PHP 8 attributes. Yes, really.
  • String protection: multi-line and triple-quoted strings won't get mangled anymore
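Why strings need protecting: a # inside a string literal isn't a comment, and a triple-quoted docstring can span many lines. One common approach is mask, strip, restore: swap each string literal for a placeholder, remove comments from what's left, then put the strings back. This is a minimal sketch for Python-style # comments, assuming nothing about Chunklet's actual implementation. The regex is deliberately naive, and for real Python source the standard `tokenize` module is the sturdier tool.

```python
import re

# Triple-quoted strings are tried first so they win over the one-line forms.
_STRING = re.compile(
    r'"""[\s\S]*?"""'          # triple double quotes, may span lines
    r"|'''[\s\S]*?'''"         # triple single quotes, may span lines
    r'|"(?:\\.|[^"\\\n])*"'    # "..." on one line, escapes allowed
    r"|'(?:\\.|[^'\\\n])*'"    # '...' on one line, escapes allowed
)


def strip_hash_comments(source: str) -> str:
    """Remove '#' comments without touching '#' characters inside string literals."""
    saved = []

    def mask(match):
        saved.append(match.group(0))
        # Placeholder built from NUL bytes, which shouldn't appear in ordinary source.
        return f"\x00{len(saved) - 1}\x00"

    masked = _STRING.sub(mask, source)
    # With strings hidden, every remaining '#' starts a real comment.
    without_comments = re.sub(r"[ \t]*#[^\n]*", "", masked)
    # Put the original string literals back, untouched.
    return re.sub(r"\x00(\d+)\x00", lambda m: saved[int(m.group(1))], without_comments)


code = 'url = "http://x#frag"  # tidy up later\n'
print(strip_hash_comments(code), end="")  # url = "http://x#frag"
```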

πŸ”§ The Boring But Necessary Stuff

  • Tokenizer timeout: new --tokenizer-timeout / -t flag so custom tokenizers don't hang forever
  • Direct imports: from chunklet import DocumentChunker now works without slowing things down
  • Fewer crashes: fixed dependency issues with setuptools<81 in CI (sentsplit and pkg_resources, long story)
  • Global registries: custom_splitter_registry and custom_processor_registry exist now
  • Error messages: slightly less cryptic when things explode
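A timeout is the standard guard against a tokenizer that never returns. This minimal sketch assumes the custom tokenizer is an external command that reads text on stdin and prints a token count on stdout, and treats the timeout as seconds. That contract, `count_tokens`, and the stand-in word counter are all hypothetical, not Chunklet's documented interface. Python's `subprocess.run(..., timeout=...)` kills the child process when time runs out, so a stuck tokenizer can't take the caller down with it.

```python
import subprocess
import sys


def count_tokens(text: str, command: list[str], timeout: float = 5.0) -> int:
    """Run a hypothetical external tokenizer: text on stdin, token count on stdout."""
    try:
        result = subprocess.run(
            command,
            input=text,
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed if it runs longer than this
            check=True,       # a non-zero exit raises CalledProcessError
        )
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"tokenizer timed out after {timeout}s") from exc
    return int(result.stdout.strip())


# Whitespace "tokenizer" standing in for a real one.
WORD_COUNTER = [sys.executable, "-c", "import sys; print(len(sys.stdin.read().split()))"]
print(count_tokens("split me into five tokens", WORD_COUNTER))  # 5
```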

Chunklet v2.1.1

πŸ› Visualizer Was Broken

The visualizer didn't work after installing from PyPI. Static files were MIA. Fixed now, obviously.


Chunklet v2.1.0

🌐 Visualizer 1.0

We built an actual UI. Because sometimes you want to click buttons instead of writing code:

  • Interactive web interface for parameter tuning
  • Launch with chunklet visualize
  • Works with all chunker types

πŸ“ More File Formats

ODT, CSV, and Excel (.xlsx) support landed in this release. Because apparently plain text wasn't enough for some people.


Chunklet v2.0.0

🚀 The Big Rewrite (aka "We Broke Everything")

We rewrote the whole thing. You're welcome? Here's what changed:

  • 🗃 New classes: PlainTextChunker, DocumentChunker, CodeChunker
  • 🌍 50+ languages: because the world speaks more than English
  • 📄 Document formats: PDF, DOCX, EPUB, HTML, and more
  • 💻 Code understanding: actual code chunking, not just "split by lines like a savage"
  • 🎯 New constraints: max_section_breaks and max_lines for finer control
  • ⚡ Memory-efficient batching: batch methods use generators so your RAM doesn't cry
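Why generators help: a batch method that yields chunks one at a time never builds the full list of chunks for every input up front. This is a minimal sketch, with a toy line-based chunker standing in for a max_lines constraint. `chunk_by_lines` and `batch_chunk` are hypothetical names, not Chunklet's API.

```python
from collections.abc import Iterable, Iterator
from itertools import islice


def chunk_by_lines(text: str, max_lines: int) -> list[str]:
    """Toy chunker: group consecutive lines, at most max_lines per chunk."""
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


def batch_chunk(texts: Iterable[str], max_lines: int) -> Iterator[str]:
    """Yield chunks lazily: each text is chunked only when the caller asks for more."""
    for text in texts:
        yield from chunk_by_lines(text, max_lines)


# Even with a huge lazy input, taking three chunks only touches the first two texts.
many_texts = (f"line A{i}\nline B{i}\nline C{i}" for i in range(1_000_000))
print(list(islice(batch_chunk(many_texts, max_lines=2), 3)))
# ['line A0\nline B0', 'line C0', 'line A1\nline B1']
```

One gotcha that applies to any generator: you can only iterate it once. If you need the chunks twice, wrap the result in list() and accept the memory cost.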

πŸ—ΊοΈ Want More Details?

The changelog has everything. We're not gonna repeat it here.