Class RecursiveTextSplitter
Recursively splits text using an ordered list of separators (LangChain-style). At each level the first separator in the list that occurs in the text is used, small pieces are merged up to ChunkSize, and only pieces that still exceed ChunkSize recurse to the next separator.
public class RecursiveTextSplitter : ITextSplitter
- Inheritance
- object
- RecursiveTextSplitter
- Implements
- ITextSplitter
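The recursive strategy described in the summary can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `RecursiveSplitSketch` is a hypothetical helper, it ignores ChunkOverlap and KeepSeparator, and the fixed-width cut used when no separator remains is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch; not the actual RecursiveTextSplitter code.
public static class RecursiveSplitSketch
{
    public static List<string> Split(string text, int chunkSize,
                                     IReadOnlyList<string> separators, int level = 0)
    {
        var chunks = new List<string>();
        if (text.Length <= chunkSize)
        {
            if (text.Length > 0) chunks.Add(text);
            return chunks;
        }

        // Pick the first separator (from this level on) that occurs in the text.
        int index = level;
        while (index < separators.Count && separators[index].Length > 0
               && !text.Contains(separators[index]))
            index++;

        // Assumption: with no separator left (or the "" fallback), cut at fixed width.
        if (index >= separators.Count || separators[index].Length == 0)
        {
            for (int i = 0; i < text.Length; i += chunkSize)
                chunks.Add(text.Substring(i, Math.Min(chunkSize, text.Length - i)));
            return chunks;
        }

        string sep = separators[index];
        var buffer = new StringBuilder();
        foreach (string piece in text.Split(new[] { sep }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (piece.Length > chunkSize)
            {
                // Oversized piece: flush what was merged so far, then recurse.
                if (buffer.Length > 0) { chunks.Add(buffer.ToString()); buffer.Clear(); }
                chunks.AddRange(Split(piece, chunkSize, separators, index + 1));
                continue;
            }

            // Merge small pieces until adding one more would exceed chunkSize.
            int joined = buffer.Length == 0
                ? piece.Length
                : buffer.Length + sep.Length + piece.Length;
            if (joined > chunkSize)
            {
                chunks.Add(buffer.ToString());
                buffer.Clear();
            }
            if (buffer.Length > 0) buffer.Append(sep);
            buffer.Append(piece);
        }
        if (buffer.Length > 0) chunks.Add(buffer.ToString());
        return chunks;
    }
}
```

For instance, splitting "aaa bbb ccc" with a chunk size of 7 and the separators "\n\n", " ", "" merges the first two words into one chunk and leaves the third on its own.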
Constructors
RecursiveTextSplitter()
public RecursiveTextSplitter()
RecursiveTextSplitter(int, int, IEnumerable<string>?)
public RecursiveTextSplitter(int chunkSize, int chunkOverlap = 200, IEnumerable<string>? separators = null)
Parameters
- chunkSize: int
- chunkOverlap: int
- separators: IEnumerable<string>
Properties
ChunkOverlap
Number of overlapping characters between consecutive chunks.
public int ChunkOverlap { get; set; }
Property Value
- int
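To illustrate what an overlap between consecutive chunks means, here is a simplified fixed-width sketch. `OverlapSketch.Window` is a hypothetical helper; the real splitter presumably applies overlap at separator boundaries rather than at fixed character offsets, and the argument check is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; not the actual RecursiveTextSplitter code.
public static class OverlapSketch
{
    public static List<string> Window(string text, int chunkSize, int chunkOverlap)
    {
        if (chunkOverlap < 0 || chunkOverlap >= chunkSize)
            throw new ArgumentException("chunkOverlap must be in [0, chunkSize).");

        var chunks = new List<string>();
        int step = chunkSize - chunkOverlap; // each chunk starts this far after the previous one
        for (int start = 0; start < text.Length; start += step)
        {
            chunks.Add(text.Substring(start, Math.Min(chunkSize, text.Length - start)));
            if (start + chunkSize >= text.Length) break; // last window reached the end
        }
        return chunks;
    }
}
```

With a chunk size of 4 and an overlap of 2, "abcdefghij" yields "abcd", "cdef", "efgh", "ghij": each chunk repeats the last two characters of the one before it.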
ChunkSize
Maximum number of characters per chunk.
public int ChunkSize { get; set; }
Property Value
- int
KeepSeparator
When true, the separator is kept at the start of the next split so that paragraph and sentence boundaries are preserved in the chunk text. Default: true.
public bool KeepSeparator { get; set; }
Property Value
- bool
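The effect of keeping the separator at the start of the next split can be sketched as below. `KeepSeparatorSketch.SplitKeepingSeparator` is a hypothetical helper written for illustration and may differ from how the class handles edge cases.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; not the actual RecursiveTextSplitter code.
public static class KeepSeparatorSketch
{
    public static List<string> SplitKeepingSeparator(string text, string separator)
    {
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentException("separator must be non-empty.");

        var pieces = new List<string>();
        int start = 0;      // where the current piece begins
        int searchFrom = 0; // where to look for the next separator
        while (searchFrom <= text.Length)
        {
            int hit = text.IndexOf(separator, searchFrom, StringComparison.Ordinal);
            if (hit < 0) break;
            if (hit > start) pieces.Add(text.Substring(start, hit - start));
            start = hit;                         // next piece starts with the separator
            searchFrom = hit + separator.Length; // skip past it before searching again
        }
        if (start < text.Length) pieces.Add(text.Substring(start));
        return pieces;
    }
}
```

Splitting "a. b. c" on ". " gives "a", ". b", ". c". Concatenating the pieces reproduces the original text, which is what keeps boundaries visible inside each chunk.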
Separators
Ordered list of separators to try when splitting. The splitter picks the first separator found in the text. An empty string as the last entry enables character-level splitting as a last resort.
public string[] Separators { get; set; }
Property Value
- string[]
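A rough sketch of the selection rule: the first separator found in the text wins, and an empty string acts as a catch-all. `SeparatorPickSketch.Pick` is a hypothetical helper, and the list in the usage note mirrors LangChain's usual defaults; the defaults of this class are not stated here.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical sketch; not the actual RecursiveTextSplitter code.
public static class SeparatorPickSketch
{
    // Returns the first separator that occurs in the text, "" if the
    // character-level fallback is reached, or null if nothing matches.
    public static string? Pick(string text, IEnumerable<string> separators)
    {
        foreach (string sep in separators)
        {
            if (sep.Length == 0 || text.Contains(sep))
                return sep;
        }
        return null;
    }
}
```

With the list "\n\n", "\n", " ", "", the text "one two" selects " ", while "onetwo" falls through to the empty string.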
Methods
Split(RagDocument)
Splits a document into chunks. Implementations may split by character count, token count, sentence boundary, etc.
public IReadOnlyList<RagChunk> Split(RagDocument document)
Parameters
- document: RagDocument
The document to split.
Returns
- IReadOnlyList<RagChunk>
An ordered list of chunks.
SplitText(string)
Split text into chunks respecting ChunkSize and ChunkOverlap.
public List<string> SplitText(string text)
Parameters
- text: string