Class RecursiveTextSplitter

Namespace
Mythosia.AI.Rag.Splitters
Assembly
Mythosia.AI.Rag.dll

Recursively splits text using an ordered list of separators (LangChain-style). At each level, the first separator in the list that occurs in the text is chosen, small pieces are merged up to ChunkSize, and only oversized pieces recurse to the next separator.
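
The strategy described in the summary can be sketched in a few lines. This is a simplified illustration, not the Mythosia.AI.Rag implementation: the class name `RecursiveSplitSketch` and its `Split` method are hypothetical, overlap is ignored, and separators are dropped between pieces of different chunks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of separator-ordered recursive splitting.
// It ignores ChunkOverlap and KeepSeparator to keep the core idea visible.
public static class RecursiveSplitSketch
{
    public static List<string> Split(string text, int chunkSize, IReadOnlyList<string> separators)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var chunks = new List<string>();
        if (string.IsNullOrEmpty(text)) return chunks;
        if (text.Length <= chunkSize) { chunks.Add(text); return chunks; }

        // Pick the first separator in the list that occurs in the text.
        // An empty separator always matches and means "split by character".
        int index = -1;
        for (int i = 0; i < separators.Count; i++)
        {
            if (separators[i].Length == 0 ||
                text.IndexOf(separators[i], StringComparison.Ordinal) >= 0)
            {
                index = i;
                break;
            }
        }

        // No separator applies: the text cannot be reduced further.
        if (index < 0) { chunks.Add(text); return chunks; }

        string sep = separators[index];
        if (sep.Length == 0)
        {
            // Last resort: fixed-size character cuts.
            for (int i = 0; i < text.Length; i += chunkSize)
                chunks.Add(text.Substring(i, Math.Min(chunkSize, text.Length - i)));
            return chunks;
        }

        var remaining = separators.Skip(index + 1).ToList();
        string current = "";
        foreach (string piece in text.Split(new[] { sep }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (piece.Length > chunkSize)
            {
                // Flush what has been merged so far, then recurse on the oversized piece.
                if (current.Length > 0) { chunks.Add(current); current = ""; }
                chunks.AddRange(Split(piece, chunkSize, remaining));
                continue;
            }

            // Merge small pieces while the result still fits in one chunk.
            string candidate = current.Length == 0 ? piece : current + sep + piece;
            if (candidate.Length <= chunkSize) current = candidate;
            else { chunks.Add(current); current = piece; }
        }
        if (current.Length > 0) chunks.Add(current);
        return chunks;
    }
}
```

With a chunk size of 7 and separators `"\n\n"`, `" "`, `""`, the input `"aaa bbb ccc\n\nddd eee"` yields `"aaa bbb"`, `"ccc"`, `"ddd eee"`: the first paragraph is too long and recurses to the space separator, while the second fits as is.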

public class RecursiveTextSplitter : ITextSplitter
Inheritance
RecursiveTextSplitter
Implements
ITextSplitter
Inherited Members

Constructors

RecursiveTextSplitter()

public RecursiveTextSplitter()

RecursiveTextSplitter(int, int, IEnumerable<string>?)

public RecursiveTextSplitter(int chunkSize, int chunkOverlap = 200, IEnumerable<string>? separators = null)

Parameters

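chunkSize int

Maximum number of characters per chunk.

chunkOverlap int

Number of overlapping characters between consecutive chunks. Defaults to 200.

separators IEnumerable<string>

Ordered list of separators to try when splitting. Optional; defaults to null.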

Properties

ChunkOverlap

Number of overlapping characters between consecutive chunks.

public int ChunkOverlap { get; set; }

Property Value

int
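
To make the meaning of overlap concrete, here is a minimal sketch that uses plain fixed-size windows. It is an assumption-laden illustration rather than the library's behavior: the real splitter merges separator-delimited pieces, and the `OverlapSketch` class and its `Windows` method are hypothetical names. The sketch also requires the overlap to be smaller than the chunk size; whether the library enforces that is not stated here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: each window repeats the last `overlap` characters
// of the previous window.
public static class OverlapSketch
{
    public static List<string> Windows(string text, int chunkSize, int overlap)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (overlap < 0 || overlap >= chunkSize) throw new ArgumentOutOfRangeException(nameof(overlap));

        var chunks = new List<string>();
        int step = chunkSize - overlap; // how far each window advances

        for (int start = 0; start < text.Length; start += step)
        {
            int length = Math.Min(chunkSize, text.Length - start);
            chunks.Add(text.Substring(start, length));

            // Stop once a window has reached the end of the text.
            if (start + length >= text.Length) break;
        }
        return chunks;
    }
}
```

For example, `OverlapSketch.Windows("abcdefghij", 4, 2)` returns `"abcd"`, `"cdef"`, `"efgh"`, `"ghij"`: every chunk starts with the last two characters of the one before it.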

ChunkSize

Maximum number of characters per chunk.

public int ChunkSize { get; set; }

Property Value

int

KeepSeparator

When true, the separator is kept at the start of the next split so that paragraph and sentence boundaries are preserved in the chunk text. Default: true.

public bool KeepSeparator { get; set; }

Property Value

bool
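
The effect of keeping the separator at the start of the next split can be illustrated with a small standalone sketch. The `KeepSeparatorSketch` class and its `Split` method are hypothetical and handle a single separator only; the library's own handling may differ in edge cases.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: split on one separator, optionally leaving the
// separator attached to the start of the piece that follows it.
public static class KeepSeparatorSketch
{
    public static List<string> Split(string text, string separator, bool keepSeparator)
    {
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentException("Separator must be non-empty.", nameof(separator));

        var pieces = new List<string>();
        int pieceStart = 0; // where the current piece begins
        int searchFrom = 0; // where to look for the next separator

        while (true)
        {
            int hit = text.IndexOf(separator, searchFrom, StringComparison.Ordinal);
            int pieceEnd = hit < 0 ? text.Length : hit;

            if (pieceEnd > pieceStart)
                pieces.Add(text.Substring(pieceStart, pieceEnd - pieceStart));
            if (hit < 0) break;

            // Keep: the next piece starts at the separator itself.
            // Drop: the next piece starts right after it.
            pieceStart = keepSeparator ? hit : hit + separator.Length;
            searchFrom = hit + separator.Length;
        }
        return pieces;
    }
}
```

Splitting `"A.\n\nB.\n\nC."` on `"\n\n"` gives `"A."`, `"\n\nB."`, `"\n\nC."` when the separator is kept, and `"A."`, `"B."`, `"C."` when it is dropped. In the kept case, concatenating the pieces reproduces the original text.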

Separators

Ordered list of separators to try when splitting. The splitter picks the first separator in the list that occurs in the text. An empty string as the last entry enables character-level splitting as a last resort.

public string[] Separators { get; set; }

Property Value

string[]

Methods

Split(RagDocument)

Splits a document into chunks. Implementations may split by character count, token count, sentence boundary, etc.

public IReadOnlyList<RagChunk> Split(RagDocument document)

Parameters

document RagDocument

The document to split.

Returns

IReadOnlyList<RagChunk>

An ordered list of chunks.

SplitText(string)

Splits text into chunks, respecting ChunkSize and ChunkOverlap.

public List<string> SplitText(string text)

Parameters

text string

The text to split.

Returns

List<string>
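
A usage sketch built only from the signatures documented on this page. It assumes the Mythosia.AI.Rag assembly is referenced; the wrapper class `SplitTextExample`, the sizes, and the separator list are illustrative choices, not documented defaults.

```csharp
using System.Collections.Generic;
using Mythosia.AI.Rag.Splitters;

// Hypothetical wrapper around the documented constructor and SplitText method.
public static class SplitTextExample
{
    public static List<string> Run(string text)
    {
        // Illustrative values: paragraphs first, then lines, then words,
        // with "" as the character-level last resort.
        var splitter = new RecursiveTextSplitter(
            chunkSize: 500,
            chunkOverlap: 50,
            separators: new[] { "\n\n", "\n", " ", "" });

        // Keep separators at the start of the following split (the documented default).
        splitter.KeepSeparator = true;

        return splitter.SplitText(text);
    }
}
```

The returned list can then be passed on to whatever consumes the chunk strings, for example an embedding step.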