Re-ranking & Retrieval Tuning
📍 Question Answering Pipeline: Query Rewriting → Embedding → Filtering → Retrieval → Re-ranking → Context Build
Why Re-ranking?
Vector search returns candidates sorted by embedding similarity, but embedding similarity is only an approximation of relevance. A chunk that scores 0.82 may actually be more relevant than one scoring 0.85; the embedding model cannot reliably separate scores that close.
A re-ranker takes the initial candidate list and scores each chunk against the original query with a more powerful model, which usually produces a more accurate relevance ordering than similarity alone. This is especially valuable when:
- Your corpus contains many similar-looking chunks (e.g. FAQ entries)
- The top results from vector search feel "close but not quite right"
- You need high-precision answers for critical use cases
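To make the idea concrete, here is a minimal sketch of a re-ranking pass in plain C#. It does not use the library: `ToyRelevance` is a hypothetical stand-in for a real reranker model (it only counts shared words), and the chunk texts and similarity scores are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a real reranker model: counts how many distinct
// query words also appear in the chunk. A real reranker would call a model.
double ToyRelevance(string query, string chunk)
{
    char[] separators = { ' ', '?', '.', ',' };
    var chunkWords = new HashSet<string>(
        chunk.ToLowerInvariant().Split(separators, StringSplitOptions.RemoveEmptyEntries));
    return query.ToLowerInvariant()
        .Split(separators, StringSplitOptions.RemoveEmptyEntries)
        .Distinct()
        .Count(word => chunkWords.Contains(word));
}

// Re-score every candidate against the query, then sort by the new score.
List<(string Text, double Similarity, double Relevance)> Rerank(
    string query, IEnumerable<(string Text, double Similarity)> items) =>
    items
        .Select(c => (c.Text, c.Similarity, Relevance: ToyRelevance(query, c.Text)))
        .OrderByDescending(c => c.Relevance)
        .ToList();

// Candidates in the order vector search returned them (made-up scores).
var vectorResults = new List<(string Text, double Similarity)>
{
    ("How do I reset my password?", 0.85),
    ("How do I change my password after a reset email expires?", 0.82),
};

var reranked = Rerank("reset email expired, how to change password", vectorResults);
foreach (var c in reranked)
    Console.WriteLine($"relevance {c.Relevance}, similarity {c.Similarity}: {c.Text}");
```

In this toy run the chunk with the lower similarity (0.82) moves to first place, because it matches more of the query once each chunk is compared against the query directly.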
Re-ranker Options
LLM Reranker
Uses your AI service to score results. Effective, but scoring requires additional model calls, which adds latency:
.WithRag(rag => rag
    .WithReranker(new LlmReranker(aiService))
    .AddDocument("corpus.txt")
)
Cohere Reranker
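Calls the Cohere Rerank API, a hosted reranking service. Fast and accurate, but it requires a Cohere API key and sends your query and chunks to an external service:
.WithRag(rag => rag
    .WithReranker(new CohereReranker(cohereApiKey))
    .AddDocument("corpus.txt")
)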
vLLM Reranker
Uses a locally hosted vLLM reranking endpoint:
.WithRag(rag => rag
    .WithReranker(new VllmReranker("http://localhost:8000"))
    .AddDocument("corpus.txt")
)
Retrieval Parameters
Control how many candidates are retrieved and how they are filtered before final selection:
.WithRag(rag => rag
    .WithTopK(5)                 // Final number of chunks returned
    .WithRetrievalMultiplier(3)  // Retrieve topK × 3 candidates (for reranking)
    .WithMinScore(0.6)           // Minimum similarity score
    .AddDocument("corpus.txt")
)
- TopK: how many chunks end up in the LLM context.
- RetrievalMultiplier: cast a wider net so the reranker has more to work with. With TopK set to 5, a multiplier of 3 means 15 candidates are fetched, then the best 5 survive reranking.
- MinScore: discard anything below this similarity threshold, even if fewer than TopK chunks remain.
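The arithmetic behind these three parameters can be sketched in plain C#. This is an illustration, not the library's implementation: `SelectChunks` is a hypothetical helper, the scores are made up, the reranking step is left out, and the order of operations (fetch, then threshold, then trim to TopK) is an assumption.

```csharp
using System;
using System.Linq;

// Assumed order of operations (not confirmed by the library):
// fetch topK * multiplier candidates, drop those under minScore, keep the best topK.
double[] SelectChunks(double[] scores, int topK, int retrievalMultiplier, double minScore)
{
    int candidateCount = topK * retrievalMultiplier;
    return scores
        .OrderByDescending(s => s)
        .Take(candidateCount)       // wider net: 5 * 3 = 15 candidates
        .Where(s => s >= minScore)  // similarity threshold
        .Take(topK)                 // final selection (reranking omitted in this sketch)
        .ToArray();
}

// Made-up similarity scores for 20 stored chunks: 0.95, 0.92, ..., 0.38.
double[] allScores = Enumerable.Range(0, 20).Select(i => (95 - 3 * i) / 100.0).ToArray();

double[] normal = SelectChunks(allScores, topK: 5, retrievalMultiplier: 3, minScore: 0.6);
double[] strict = SelectChunks(allScores, topK: 5, retrievalMultiplier: 3, minScore: 0.9);

Console.WriteLine(normal.Length); // 5: enough candidates clear the 0.6 threshold
Console.WriteLine(strict.Length); // 2: only 0.95 and 0.92 clear 0.9, fewer than TopK
```

The second call shows the MinScore behavior described above: when the threshold is strict, fewer than TopK chunks reach the context rather than padding it with weak matches.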
Final Selection Mode
When a reranker is used, choose how the final ranking score is calculated:
using Mythosia.AI.Rag;
// Default: trust reranker scores only
.WithFinalSelectionPolicy(RagFinalSelectionMode.RerankerOnly)
// Blend retrieval score and reranker score
.WithFinalSelectionPolicy(RagFinalSelectionMode.WeightedBlend, retrievalWeight: 0.65) // 65% retrieval, 35% reranker
RerankerOnly is the safe default — the reranker's judgment completely replaces the initial retrieval score.
WeightedBlend preserves the original retrieval signal while incorporating reranker judgment. This can help when your vector embeddings are already high-quality and you want the reranker to act as a tiebreaker rather than a full override.
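As a worked example of how the two modes can disagree, the sketch below assumes that WeightedBlend is a simple linear combination, retrievalWeight × retrieval score + (1 − retrievalWeight) × reranker score, with both scores on the same 0 to 1 scale. The library's actual formula is not shown here, so treat `Blend` as a hypothetical helper and the scores as made-up values.

```csharp
using System;

// Assumed linear blend; the library's real formula may differ.
double Blend(double retrievalScore, double rerankerScore, double retrievalWeight) =>
    retrievalWeight * retrievalScore + (1.0 - retrievalWeight) * rerankerScore;

// Two made-up chunks: A is strong on retrieval, B is slightly preferred by the reranker.
double retrievalA = 0.90, rerankerA = 0.60;
double retrievalB = 0.70, rerankerB = 0.70;

// RerankerOnly: only the reranker score counts, so B (0.70) beats A (0.60).
bool rerankerOnlyPrefersB = rerankerB > rerankerA;

// WeightedBlend with retrievalWeight 0.65:
// A = 0.65 * 0.90 + 0.35 * 0.60 = 0.795
// B = 0.65 * 0.70 + 0.35 * 0.70 = 0.700
double blendedA = Blend(retrievalA, rerankerA, 0.65);
double blendedB = Blend(retrievalB, rerankerB, 0.65);
bool blendPrefersA = blendedA > blendedB;

Console.WriteLine($"RerankerOnly prefers B: {rerankerOnlyPrefersB}");
Console.WriteLine($"WeightedBlend prefers A: {blendPrefersA}");
```

Under this assumption, a retrieval weight of 0.65 lets a strong retrieval score outweigh a modest reranker preference, which matches the tiebreaker role described above.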