Class MarkdownTextSplitter
Structure-aware Markdown splitter that understands heading hierarchy (H1–H6), preserves atomic blocks (code fences, tables), and prepends heading breadcrumbs to each chunk so that vector search retrieves contextually rich fragments.
public class MarkdownTextSplitter : ITextSplitter
- Inheritance
- object
MarkdownTextSplitter
- Implements
- ITextSplitter
Constructors
MarkdownTextSplitter()
public MarkdownTextSplitter()
MarkdownTextSplitter(int, int)
public MarkdownTextSplitter(int chunkSize, int chunkOverlap = 200)
Parameters
chunkSize int
chunkOverlap int (default: 200)
Properties
ChunkOverlap
Number of overlapping characters carried from the previous chunk.
public int ChunkOverlap { get; set; }
Property Value
- int
ChunkSize
Maximum characters per chunk (excluding the prepended breadcrumb).
public int ChunkSize { get; set; }
Property Value
- int
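To make the relationship between ChunkSize and ChunkOverlap concrete, here is a minimal sketch of character-window splitting with overlap. It is not the splitter's actual implementation, which is structure-aware. The class and method names (OverlapSketch, SplitWithOverlap) are hypothetical, and the sketch assumes the overlap counts toward the chunk's size limit, which this page does not confirm.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only; not part of the library.
public static class OverlapSketch
{
    // Cuts text into windows of at most chunkSize characters. Each window
    // after the first starts chunkOverlap characters before the end of the
    // previous one, so neighbouring chunks share that many characters.
    // Assumption: the overlap counts toward chunkSize.
    public static List<string> SplitWithOverlap(string text, int chunkSize, int chunkOverlap)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (chunkOverlap < 0 || chunkOverlap >= chunkSize)
            throw new ArgumentOutOfRangeException(nameof(chunkOverlap));

        var chunks = new List<string>();
        int step = chunkSize - chunkOverlap; // how far each window advances

        for (int start = 0; start < text.Length; start += step)
        {
            int length = Math.Min(chunkSize, text.Length - start);
            chunks.Add(text.Substring(start, length));

            // Stop once a window has reached the end of the text.
            if (start + length >= text.Length) break;
        }
        return chunks;
    }
}
```

The sketch rejects an overlap equal to or larger than the chunk size, because the window would then never advance.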
IncludeHeadingBreadcrumb
When true, each chunk is prefixed with the heading path that leads to its content (e.g. "# Doc Title\n## Section\n### Sub-section\n\n"), so a chunk retrieved in isolation still carries its section context. Default is true.
public bool IncludeHeadingBreadcrumb { get; set; }
Property Value
- bool
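As an illustration of how such a heading path can be derived, the sketch below keeps a stack of open headings and joins them in the format shown in the example above. It is an assumption about the general technique, not the library's code. The names BreadcrumbSketch, HeadingLevel, and BuildBreadcrumb are hypothetical, and for brevity the sketch does not skip `#` lines inside code fences.

```csharp
using System.Collections.Generic;

// Hypothetical helper, for illustration only; not part of the library.
public static class BreadcrumbSketch
{
    // Returns the level (1-6) of an ATX heading line such as "## Section",
    // or 0 when the line is not a heading.
    public static int HeadingLevel(string line)
    {
        int hashes = 0;
        while (hashes < line.Length && line[hashes] == '#') hashes++;
        bool isHeading = hashes >= 1 && hashes <= 6
            && hashes < line.Length && line[hashes] == ' ';
        return isHeading ? hashes : 0;
    }

    // Builds the breadcrumb that is in effect after reading all given lines.
    public static string BuildBreadcrumb(IEnumerable<string> lines)
    {
        var path = new List<string>(); // open headings, outermost first

        foreach (var line in lines)
        {
            int level = HeadingLevel(line);
            if (level == 0) continue; // body text does not change the path

            // A new heading closes every open heading at the same or a deeper level.
            while (path.Count > 0 && HeadingLevel(path[path.Count - 1]) >= level)
                path.RemoveAt(path.Count - 1);

            path.Add(line);
        }

        // One heading per line, followed by a blank line before the chunk body.
        return path.Count == 0 ? "" : string.Join("\n", path) + "\n\n";
    }
}
```

Popping headings at the same or a deeper level is what keeps a sibling section from inheriting the sub-headings of the section before it.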
MinSplitHeadingLevel
Minimum heading level that triggers a new section split. 1 = split on all headings (#–######), 2 = ignore H1, etc. Default: 1.
public int MinSplitHeadingLevel { get; set; }
Property Value
- int
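The rule described above can be read as a simple comparison: a heading starts a new section when its level is at least MinSplitHeadingLevel. The sketch below expresses that reading. The names SplitLevelSketch, HeadingLevel, and StartsNewSection are hypothetical, and the library's own heading detection may differ.

```csharp
// Hypothetical helper, for illustration only; not part of the library.
public static class SplitLevelSketch
{
    // Returns the level (1-6) of an ATX heading line, or 0 for any other line.
    public static int HeadingLevel(string line)
    {
        int hashes = 0;
        while (hashes < line.Length && line[hashes] == '#') hashes++;
        bool isHeading = hashes >= 1 && hashes <= 6
            && hashes < line.Length && line[hashes] == ' ';
        return isHeading ? hashes : 0;
    }

    // True when the line is a heading whose level is at or below the
    // configured threshold in the hierarchy, i.e. level >= minSplitHeadingLevel.
    public static bool StartsNewSection(string line, int minSplitHeadingLevel)
    {
        int level = HeadingLevel(line);
        return level != 0 && level >= minSplitHeadingLevel;
    }
}
```

With a threshold of 2, an H1 line such as "# Title" no longer starts a section, while "## Section" and deeper headings still do.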
Methods
Split(RagDocument)
Splits a document into chunks. Implementations may split by character count, token count, sentence boundary, etc.
public IReadOnlyList<RagChunk> Split(RagDocument document)
Parameters
document RagDocument
The document to split.
Returns
- IReadOnlyList<RagChunk>
An ordered list of chunks.