Class Bm25Tokenizer
BM25 tokenizer backed by the Lucene.Net analyzer Lucene.Net.Analysis.Standard.StandardAnalyzer. Used for in-memory BM25 indexing and for building Qdrant sparse vectors.
public static class Bm25Tokenizer
Inheritance
- object
- Bm25Tokenizer
Methods
Analyze(string)
Analyzes text in a single pass and returns both normalized tokens and term frequencies.
public static Bm25Tokenizer.AnalysisResult Analyze(string text)
Parameters
- text (string): The input text to analyze.
Returns
- Bm25Tokenizer.AnalysisResult
Analysis result containing tokens and term-frequency map.
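The single-pass idea can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: the class name `SinglePassSketch`, the tuple member names `Tokens` and `TermFrequencies`, and the regex-based lowercase tokenizer are all assumptions made for illustration. In particular, the regex is only a stand-in for StandardAnalyzer and does no stopword filtering.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch: collects tokens and term frequencies in one loop.
public static class SinglePassSketch
{
    // Simplified stand-in for StandardAnalyzer: runs of letters or digits.
    private static readonly Regex TokenPattern = new Regex(@"[\p{L}\p{Nd}]+");

    public static (IReadOnlyList<string> Tokens, Dictionary<string, int> TermFrequencies)
        Analyze(string text)
    {
        var tokens = new List<string>();
        var frequencies = new Dictionary<string, int>(StringComparer.Ordinal);

        if (string.IsNullOrWhiteSpace(text))
        {
            return (tokens, frequencies);
        }

        foreach (Match match in TokenPattern.Matches(text.ToLowerInvariant()))
        {
            string token = match.Value;
            tokens.Add(token);

            // Update the count in the same iteration that records the token.
            frequencies.TryGetValue(token, out int count);
            frequencies[token] = count + 1;
        }

        return (tokens, frequencies);
    }
}
```

Producing both outputs in one loop avoids tokenizing the text once and then walking the token list a second time to count terms.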
ComputeTermFrequencies(IReadOnlyList<string>)
Computes term frequencies for a tokenized document.
public static Dictionary<string, int> ComputeTermFrequencies(IReadOnlyList<string> tokens)
Parameters
- tokens (IReadOnlyList<string>): The tokens to compute frequencies for.
Returns
- Dictionary<string, int>
A dictionary mapping each token to its frequency.
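As a rough illustration of what a term-frequency map contains, the sketch below counts occurrences of each token in an already tokenized document. It mirrors the documented signature but is an assumption about the behavior, not the library's source; the class name `TermFrequencySketch` and the ordinal string comparison are hypothetical choices.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: counts how often each token occurs.
public static class TermFrequencySketch
{
    public static Dictionary<string, int> Count(IReadOnlyList<string> tokens)
    {
        var frequencies = new Dictionary<string, int>(StringComparer.Ordinal);

        foreach (string token in tokens)
        {
            // TryGetValue leaves count at 0 when the token is not yet present.
            frequencies.TryGetValue(token, out int count);
            frequencies[token] = count + 1;
        }

        return frequencies;
    }
}
```

For example, calling `TermFrequencySketch.Count(new[] { "quick", "fox", "quick" })` would map "quick" to 2 and "fox" to 1.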
Tokenize(string)
Tokenizes the input text into a list of normalized terms.
public static IReadOnlyList<string> Tokenize(string text)
Parameters
- text (string): The input text to tokenize.
Returns
- IReadOnlyList<string>
A list of normalized, non-stopword tokens.