Rerank documents based on semantic relevance to a query.
This endpoint computes semantic relevance scores for a list of documents against a query and returns the documents in descending order of relevance. It supports several reranking methods, from fast bi-encoders to advanced hybrid approaches, and is well suited to improving search results and retrieval-augmented generation (RAG) pipelines.
Request Body:
model: Reranker model ID (required) - e.g., bge-reranker-v2-m3
query: Search query or question text (required)
documents: Array of document texts or objects to rerank (required) - Can be strings or dictionaries
method (optional): Reranking method (default: fast):
  fast: Bi-encoder only (fastest, good for most use cases)
  standard: Cross-encoder only (slower but more accurate)
  optimal: Hybrid bi-encoder + cross-encoder (best speed/accuracy balance)
  best: Advanced subspace + attention mechanism (highest accuracy)
model_cross (optional): Cross-encoder model ID (required for standard and optimal methods)
top_n (optional): Return only top N most relevant documents (minimum: 1)
stage1_top_k (optional): Stage 1 top-k for hybrid methods (optimal/best) (default: 100, minimum: 1)
max_tokens_per_doc (optional): Maximum tokens per document (default: 512, minimum: 1)
Returns:
results: Array of reranked documents with relevance scores, each containing:
  index: Original position in input array
  relevance_score: Computed relevance score (higher = more relevant)
  document (optional): Document text or object (if requested)
meta: Operation metadata containing:
  method: Reranking method used
  deployed_model_id: Model ID that was used
  model_cross (optional): Cross-encoder model used (if applicable)
  stage1_top_k (optional): Stage 1 top-k for hybrid methods
Example Request (Fast Method):
POST /api/v3/inference/reranker
Headers: {"Authorization": "Bearer <api_key>"}
{
  "model": "bge-reranker-v2-m3",
  "query": "What is machine learning?",
  "documents": [
    "Machine learning is a subset of artificial intelligence that enables computers to learn from data.",
    "Python is a popular programming language for web development.",
    "Deep learning uses neural networks with multiple layers.",
    "The weather today is sunny and warm.",
    "Supervised learning requires labeled training data."
  ],
  "method": "fast",
  "top_n": 3
}
Example Response:
{
  "results": [
    {
      "index": 0,
      "relevance_score": 0.95,
      "document": "Machine learning is a subset of artificial intelligence that enables computers to learn from data."
    },
    {
      "index": 4,
      "relevance_score": 0.82,
      "document": "Supervised learning requires labeled training data."
    },
    {
      "index": 2,
      "relevance_score": 0.76,
      "document": "Deep learning uses neural networks with multiple layers."
    }
  ],
  "meta": {
    "method": "fast",
    "deployed_model_id": "bge-reranker-v2-m3"
  }
}
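As a rough illustration of how a client might call this endpoint and use the index field, here is a minimal Python sketch using only the standard library. The base URL, the Content-Type header, and the helper names (build_rerank_request, rerank, pair_with_inputs) are assumptions for illustration and do not come from this reference; only the path, the Authorization header, and the field names are taken from the examples above.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the host of your own deployment.
BASE_URL = "https://api.example.com"


def build_rerank_request(api_key, payload, base_url=BASE_URL):
    """Build (but do not send) the POST request for the reranker endpoint."""
    return urllib.request.Request(
        url=base_url + "/api/v3/inference/reranker",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumed header; the reference above only documents Authorization.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def rerank(api_key, payload, base_url=BASE_URL):
    """Send the request and return the decoded JSON response."""
    request = build_rerank_request(api_key, payload, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def pair_with_inputs(documents, response):
    """Use each result's index to look up the original input document."""
    return [
        (item["relevance_score"], documents[item["index"]])
        for item in response["results"]
    ]


documents = [
    "Machine learning is a subset of artificial intelligence that enables computers to learn from data.",
    "Python is a popular programming language for web development.",
    "Deep learning uses neural networks with multiple layers.",
    "The weather today is sunny and warm.",
    "Supervised learning requires labeled training data.",
]

# Shaped like the example response above, without the optional document field.
sample_response = {
    "results": [
        {"index": 0, "relevance_score": 0.95},
        {"index": 4, "relevance_score": 0.82},
        {"index": 2, "relevance_score": 0.76},
    ]
}

for score, text in pair_with_inputs(documents, sample_response):
    print(f"{score:.2f}  {text}")
```

Because index refers to the position in the input array, the lookup works even when the response omits the document text. A real call would look like rerank("<api_key>", payload) with a payload shaped like the example request.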
Example Request (Hybrid Method):
POST /api/v3/inference/reranker
Headers: {"Authorization": "Bearer <api_key>"}
{
  "model": "bge-reranker-v2-m3",
  "model_cross": "bge-reranker-cross-encoder",
  "query": "How do neural networks work?",
  "documents": [
    "Neural networks are computing systems inspired by biological neural networks.",
    "The stock market is volatile today.",
    "Backpropagation is used to train neural networks."
  ],
  "method": "optimal",
  "stage1_top_k": 100,
  "top_n": 2
}
Example Response (Hybrid Method):
{
  "results": [
    {
      "index": 0,
      "relevance_score": 0.92,
      "document": "Neural networks are computing systems inspired by biological neural networks."
    },
    {
      "index": 2,
      "relevance_score": 0.88,
      "document": "Backpropagation is used to train neural networks."
    }
  ],
  "meta": {
    "method": "optimal",
    "deployed_model_id": "bge-reranker-v2-m3",
    "model_cross": "bge-reranker-cross-encoder",
    "stage1_top_k": 100
  }
}
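To make the roles of stage1_top_k and top_n in a hybrid request more concrete, the toy sketch below mimics a two-stage ranking locally. It is not the service's implementation: overlap_score and precise_score are hypothetical stand-ins for a bi-encoder and a cross-encoder, and the order of operations (cheap scoring, cut to stage1_top_k, rescoring, cut to top_n) is an assumption based on the parameter descriptions in this reference.

```python
def overlap_score(query, document):
    """Cheap stand-in for a bi-encoder: fraction of query words in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


def precise_score(query, document):
    """Stand-in for a cross-encoder: overlap, lightly penalised by document length."""
    return overlap_score(query, document) / (1 + 0.01 * len(document.split()))


def two_stage_rerank(query, documents, stage1_top_k=100, top_n=None):
    # Stage 1: score every document cheaply and keep the best stage1_top_k.
    stage1 = sorted(
        range(len(documents)),
        key=lambda i: overlap_score(query, documents[i]),
        reverse=True,
    )[:stage1_top_k]

    # Stage 2: rescore only the survivors with the more expensive scorer.
    stage2 = sorted(
        ((i, precise_score(query, documents[i])) for i in stage1),
        key=lambda pair: pair[1],
        reverse=True,
    )

    # top_n is applied last, after reranking.
    if top_n is not None:
        stage2 = stage2[:top_n]
    return [{"index": i, "relevance_score": score} for i, score in stage2]


documents = [
    "Neural networks are computing systems inspired by biological neural networks.",
    "The stock market is volatile today.",
    "Backpropagation is used to train neural networks.",
]

results = two_stage_rerank(
    "How do neural networks work?", documents, stage1_top_k=2, top_n=2
)
for item in results:
    print(item["index"], round(item["relevance_score"], 3))
```

The point of the first stage is that the expensive scorer only runs on stage1_top_k candidates instead of the full document list, which is where the efficiency of a hybrid method would come from.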
Use Cases:
Method Selection Guide:
Use fast for most applications where speed is important
Use standard when you need higher accuracy and have a cross-encoder
Use optimal for the best balance of speed and accuracy in production
Use best when maximum accuracy is critical regardless of latency
Notes:
The top_n parameter filters results after reranking
Hybrid methods (optimal, best) use two-stage ranking for efficiency
Use max_tokens_per_doc to control processing of long documents
Use top_n to limit results and reduce processing time
Authorization: Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Rerank request schema for API v3:
model: Model ID to use for reranking. Example: "bge-reranker-v2-m3"
query: Search query. Example: "What is Python?"
documents: Documents to rank. Example: ["Python is a programming language", "Java is a language"]
method: Rerank method: fast (bi-encoder), standard (cross-encoder), optimal (hybrid bi+cross), best (subspace+attention). Allowed values: fast, standard, optimal, best
model_cross: Cross-encoder model (required for standard/optimal methods)
top_n: Return top N results (x >= 1)
stage1_top_k: Stage 1 top-k for hybrid methods (optimal/best) (x >= 1)
max_tokens_per_doc: Max tokens per document (x >= 1)
Batch size for processing (x >= 1)
Include documents in response
Response: Returns the re-ranked chunks based on similarity
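The constraints in the request schema above can be checked on the client before sending a request. The following is a minimal sketch under stated assumptions: validate_rerank_payload is a hypothetical helper, the numeric fields are assumed to be integers, and an empty documents array is treated as missing. The server's actual validation rules and error messages may differ.

```python
ALLOWED_METHODS = ("fast", "standard", "optimal", "best")
NEEDS_CROSS_ENCODER = ("standard", "optimal")
MIN_ONE_FIELDS = ("top_n", "stage1_top_k", "max_tokens_per_doc")


def validate_rerank_payload(payload):
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []

    # model, query and documents are marked as required in the reference.
    for field in ("model", "query", "documents"):
        if not payload.get(field):
            problems.append(f"{field} is required")

    # method is optional and defaults to fast.
    method = payload.get("method", "fast")
    if method not in ALLOWED_METHODS:
        problems.append(f"method must be one of {', '.join(ALLOWED_METHODS)}")

    # model_cross is required for the standard and optimal methods.
    if method in NEEDS_CROSS_ENCODER and not payload.get("model_cross"):
        problems.append(f"model_cross is required for method {method}")

    # Optional numeric fields must be at least 1 (integer type is an assumption).
    for field in MIN_ONE_FIELDS:
        value = payload.get(field)
        if value is not None and (not isinstance(value, int) or value < 1):
            problems.append(f"{field} must be an integer >= 1")

    return problems


payload = {
    "model": "bge-reranker-v2-m3",
    "query": "What is Python?",
    "documents": ["Python is a programming language", "Java is a language"],
    "method": "optimal",
}
for problem in validate_rerank_payload(payload):
    print(problem)
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in a payload at once.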