chunkit

Smart document chunking for RAG pipelines. Split text into semantically meaningful chunks with configurable strategies and overlap.

Install

npm install chunkit

Usage

import { chunk } from 'chunkit'

const text = 'Your document content here...'
const chunks = chunk(text, { strategy: 'recursive', maxSize: 1000, overlap: 100 })

Each chunk contains:

{
  content: string   // chunk text
  index: number     // zero-based position in the sequence
  start: number     // start offset in the original text
  end: number       // end offset in the original text
  length: number    // content length in characters
}

Strategies

Fixed

Splits text into chunks of exactly maxSize characters; the final chunk may be shorter when the text runs out. Simple and predictable.

chunk(text, { strategy: 'fixed', maxSize: 512, overlap: 50 })
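The windowing behind a fixed strategy can be sketched roughly as follows. This is an illustration, not chunkit's actual implementation: `fixedChunks` is a hypothetical helper, and it assumes that `end` is an exclusive offset and that each window advances by `maxSize - overlap` characters.

```typescript
// Illustrative sketch only; not chunkit's source code.
interface Chunk {
  content: string
  index: number
  start: number
  end: number // assumed to be exclusive
  length: number
}

// Hypothetical helper: slide a window of maxSize characters over the text.
function fixedChunks(text: string, maxSize: number, overlap: number): Chunk[] {
  if (maxSize <= 0) throw new RangeError('maxSize must be positive')
  if (overlap < 0 || overlap >= maxSize) {
    throw new RangeError('overlap must be >= 0 and smaller than maxSize')
  }
  const step = maxSize - overlap // how far each window advances
  const chunks: Chunk[] = []
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + maxSize, text.length)
    const content = text.slice(start, end)
    chunks.push({ content, index: chunks.length, start, end, length: content.length })
    if (end === text.length) break // this window already reached the end of the text
  }
  return chunks
}

const fixedDemo = fixedChunks('abcdefghij', 4, 1)
console.log(fixedDemo.map((c) => c.content)) // [ 'abcd', 'defg', 'ghij' ]
```

Under these assumptions, `text.slice(chunk.start, chunk.end)` reproduces `chunk.content`, which is what makes the offsets useful for pointing back into the original document.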

Recursive (default)

Splits on the largest semantic boundary that fits within maxSize. Tries paragraph breaks first, then line breaks, then sentences, then words. Produces the most coherent chunks for general text.

chunk(text, { strategy: 'recursive', maxSize: 1000, overlap: 100 })
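To make the fallback order concrete, here is a minimal sketch of recursive splitting. It is not chunkit's implementation: `SEPARATORS`, `splitKeep`, and `recursiveSplit` are hypothetical names, the separator strings are an assumed approximation of "paragraph, line, sentence, word", and overlap is left out to keep the example short.

```typescript
// Illustrative sketch only; not chunkit's source code.
// Assumed separators, ordered from coarsest to finest boundary.
const SEPARATORS: readonly string[] = ['\n\n', '\n', '. ', ' ']

// Split on sep but keep the separator attached to the preceding part,
// so that joining the parts reproduces the input exactly.
function splitKeep(text: string, sep: string): string[] {
  const parts: string[] = []
  let from = 0
  while (from < text.length) {
    const at = text.indexOf(sep, from)
    if (at === -1) break
    parts.push(text.slice(from, at + sep.length))
    from = at + sep.length
  }
  if (from < text.length) parts.push(text.slice(from))
  return parts
}

function recursiveSplit(text: string, maxSize: number, level = 0): string[] {
  if (maxSize <= 0) throw new RangeError('maxSize must be positive')
  if (text.length === 0) return []
  if (text.length <= maxSize) return [text]
  const sep = SEPARATORS[level]
  if (sep === undefined) {
    // No boundary left to try: fall back to a hard cut every maxSize characters.
    const pieces: string[] = []
    for (let i = 0; i < text.length; i += maxSize) pieces.push(text.slice(i, i + maxSize))
    return pieces
  }
  const out: string[] = []
  let current = ''
  for (const part of splitKeep(text, sep)) {
    if (current.length + part.length <= maxSize) {
      current += part // keep packing parts into the current chunk
      continue
    }
    if (current !== '') out.push(current)
    current = ''
    if (part.length <= maxSize) current = part
    else out.push(...recursiveSplit(part, maxSize, level + 1)) // retry with a finer boundary
  }
  if (current !== '') out.push(current)
  return out
}

const sample = 'Alpha beta.\n\nGamma delta epsilon. Zeta eta theta.'
// With maxSize 25 this yields the first paragraph, then one chunk per sentence.
console.log(recursiveSplit(sample, 25))
```

The sketch packs parts greedily, so it does not always produce the fewest possible chunks; it only guarantees that no chunk exceeds `maxSize` and that the chunks concatenate back to the input.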

Markdown

Splits on heading boundaries (h1-h3), then paragraphs. Keeps code blocks intact when they fit within maxSize. Best for structured documentation.

chunk(text, { strategy: 'markdown', maxSize: 800, overlap: 0 })
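The heading pass can be sketched as below. This is an assumption-laden illustration rather than chunkit's code: `splitOnHeadings` and `FENCE` are hypothetical names, only ATX-style headings (`#`, `##`, `###`) are recognised, and the follow-up step of splitting oversized sections by paragraph is omitted.

```typescript
// Illustrative sketch only; not chunkit's source code.
// Built with repeat() so the example contains no literal code-fence marker.
const FENCE = '`'.repeat(3)

// Hypothetical helper: start a new section at every h1-h3 heading,
// ignoring lines inside fenced code blocks so code stays intact.
function splitOnHeadings(markdown: string): string[] {
  const sections: string[] = []
  let current: string[] = []
  let inFence = false
  for (const line of markdown.split('\n')) {
    if (line.trim().indexOf(FENCE) === 0) inFence = !inFence // fence opens or closes
    const isHeading = !inFence && /^#{1,3} /.test(line)
    if (isHeading && current.length > 0) {
      sections.push(current.join('\n'))
      current = []
    }
    current.push(line)
  }
  if (current.length > 0) sections.push(current.join('\n'))
  return sections
}

const mdDemo = [
  '# Title',
  'Intro',
  '## Setup',
  FENCE,
  '# a comment inside code, not a heading',
  FENCE,
  '### Notes',
  'Done',
].join('\n')

console.log(splitOnHeadings(mdDemo).length) // 3
```

Tracking the fence state is what keeps a `#` comment inside a code block from being mistaken for a heading.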

API

`chunk(text, options?)`

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `strategy` | `'fixed' \| 'recursive' \| 'markdown'` | `'recursive'` | Chunking strategy |
| `maxSize` | `number` | `1000` | Maximum chunk size in characters |
| `overlap` | `number` | `100` | Overlap between consecutive chunks |

Returns Chunk[].

Examples

Chunking for embeddings

import { chunk } from 'chunkit'

const docs = chunk(documentText, {
  strategy: 'recursive',
  maxSize: 512,    // match your embedding model's sweet spot
  overlap: 50,     // context continuity between chunks
})

// embed() and vectorStore are placeholders for your own embedding client and vector database
for (const doc of docs) {
  const embedding = await embed(doc.content)
  await vectorStore.insert({
    content: doc.content,
    embedding,
    metadata: { start: doc.start, end: doc.end },
  })
}

Splitting markdown documentation

import { chunk } from 'chunkit'

const sections = chunk(readme, {
  strategy: 'markdown',
  maxSize: 1500,
  overlap: 0,
})

License

MIT
