POST /api/v3/inference/reranker
Reranks a list of documents by their relevance to the input query, using an LLM-based reranking model.
curl --request POST \
  --url https://api.nugen.in/api/v3/inference/reranker \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "query": "<string>",
  "documents": [
    "<string>"
  ],
  "method": "fast",
  "model_cross": "<string>",
  "top_n": 2,
  "stage1_top_k": 100,
  "max_tokens_per_doc": 512,
  "batch_size": 2,
  "return_documents": false
}
'
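A rough Python equivalent of the curl call above, using only the standard library. It is a sketch, not an official client: `build_rerank_request` is a hypothetical helper name, `YOUR_TOKEN` is a placeholder, and the request is only built here, not sent. The optional `X-Provider` header is left out.

```python
import json
import urllib.request

# Endpoint taken from the curl example.
RERANK_URL = "https://api.nugen.in/api/v3/inference/reranker"


def build_rerank_request(token, model, query, documents, **options):
    """Build (but do not send) a POST request for the reranker endpoint.

    `options` carries any optional body fields, e.g. method="fast",
    top_n=2, or return_documents=True.
    """
    body = {"model": model, "query": query, "documents": documents, **options}
    return urllib.request.Request(
        RERANK_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_rerank_request(
    "YOUR_TOKEN",
    "bge-reranker-v2-m3",
    "What is Python?",
    ["Python is a programming language", "Java is a language"],
    top_n=1,
)

# Sending it requires network access and a valid token:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```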
Example validation error response:
{
  "detail": [
    {
      "loc": [
        "<string>"
      ],
      "msg": "<string>",
      "type": "<string>"
    }
  ]
}
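The error body above lists each problem as a `loc` path plus a `msg` and `type`. A minimal sketch for turning it into readable lines, assuming only that shape; `format_validation_errors` is a hypothetical helper, and the sample message below is made up for illustration.

```python
import json


def format_validation_errors(payload):
    """Return one readable line per entry in an error body shaped like
    {"detail": [{"loc": [...], "msg": ..., "type": ...}]}."""
    detail = payload.get("detail", [])
    if not isinstance(detail, list):
        # Fall back gracefully if detail is not the documented list form.
        return [str(detail)]
    lines = []
    for err in detail:
        # loc is a path into the request; parts may be strings or list indices.
        where = ".".join(str(part) for part in err.get("loc", []))
        lines.append(f"{where}: {err.get('msg')} ({err.get('type')})")
    return lines


# Hypothetical sample body, for illustration only.
raw = (
    '{"detail": [{"loc": ["body", "top_n"], '
    '"msg": "Input should be greater than or equal to 1", '
    '"type": "greater_than_equal"}]}'
)
print(format_validation_errors(json.loads(raw)))
```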

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Headers

X-Provider (string, default: nugen-infer-align)

Body (application/json)

Rerank request schema for API v3.

model (string, required)

Model ID to use for reranking.

Example: "bge-reranker-v2-m3"

query (string, required)

Search query.

Example: "What is Python?"

documents ((string | object)[], required)

Documents to rank.

Example:

[
  "Python is a programming language",
  "Java is a language"
]

method (enum<string>, default: fast)

Rerank method: fast (bi-encoder), standard (cross-encoder), optimal (hybrid bi-encoder + cross-encoder), or best (subspace + attention).

Available options: fast, standard, optimal, best

model_cross (string | null)

Cross-encoder model ID. Required when method is standard or optimal.

top_n (integer | null)

Number of top-ranked results to return.

Required range: x >= 1

stage1_top_k (integer, default: 100)

Number of candidates kept by the first stage of the hybrid methods (optimal, best).

Required range: x >= 1

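This reference does not describe how the hybrid methods are implemented. To illustrate what a stage-1 top-k usually means in a retrieve-then-rerank pipeline, here is a generic two-stage sketch: a cheap scorer ranks every document, only the best `stage1_top_k` survive, and a more expensive scorer re-ranks that shortlist before `top_n` is applied. The word-overlap scorers are toy stand-ins for a bi-encoder and a cross-encoder, and `two_stage_rerank`, `word_overlap`, and `bigram_overlap` are hypothetical names; this is not the service's actual algorithm.

```python
from typing import Callable, List, Optional, Tuple

Scorer = Callable[[str, str], float]


def two_stage_rerank(
    query: str,
    documents: List[str],
    cheap_score: Scorer,
    precise_score: Scorer,
    stage1_top_k: int = 100,
    top_n: Optional[int] = None,
) -> List[Tuple[int, float]]:
    """Return (document index, stage-2 score) pairs, best first."""
    # Stage 1: score every document cheaply and keep the best stage1_top_k.
    shortlist = sorted(
        range(len(documents)),
        key=lambda i: cheap_score(query, documents[i]),
        reverse=True,
    )[:stage1_top_k]
    # Stage 2: re-score only the shortlist with the precise scorer.
    rescored = [(i, precise_score(query, documents[i])) for i in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored if top_n is None else rescored[:top_n]


def word_overlap(query: str, doc: str) -> float:
    # Toy stand-in for a bi-encoder: fraction of query words found in doc.
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / max(len(q), 1)


def bigram_overlap(query: str, doc: str) -> float:
    # Toy stand-in for a cross-encoder: word overlap plus 2 per shared word pair.
    q, d = query.lower().split(), doc.lower().split()
    shared = set(zip(q, q[1:])) & set(zip(d, d[1:]))
    return word_overlap(query, doc) + 2 * len(shared)


docs = [
    "Python is a programming language",
    "Java is a language",
    "what is python used for",
]
print(two_stage_rerank("what is python", docs, word_overlap, bigram_overlap,
                       stage1_top_k=2, top_n=1))
```

The point of the split is cost: the precise scorer runs on at most `stage1_top_k` documents rather than all of them, so a larger `stage1_top_k` trades latency for recall.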
max_tokens_per_doc (integer, default: 512)

Maximum number of tokens per document.

Required range: x >= 1

batch_size (integer | null)

Batch size used when processing documents.

Required range: x >= 1

return_documents (boolean, default: false)

If true, include the document text in the response.
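Several of the body constraints above can be checked on the client before a request is sent. A minimal sketch that mirrors only what this reference states (required fields, the method enum, model_cross for standard/optimal, ranges of x >= 1, and which integer fields accept null); `check_rerank_body` is a hypothetical helper, and the server remains the authority on validation.

```python
from typing import Any, Dict, List

METHODS = {"fast", "standard", "optimal", "best"}
NEEDS_CROSS_ENCODER = {"standard", "optimal"}
POSITIVE_INT_FIELDS = ("top_n", "stage1_top_k", "max_tokens_per_doc", "batch_size")
NULLABLE_FIELDS = {"top_n", "batch_size"}


def check_rerank_body(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems with a rerank request body (empty if none)."""
    problems = []
    for field in ("model", "query", "documents"):
        if field not in body:
            problems.append(f"missing required field: {field}")
    method = body.get("method", "fast")  # documented default
    if method not in METHODS:
        problems.append(f"unknown method: {method!r}")
    if method in NEEDS_CROSS_ENCODER and not body.get("model_cross"):
        problems.append(f"method {method!r} requires model_cross")
    for field in POSITIVE_INT_FIELDS:
        if field not in body:
            continue  # omitted: the server-side default applies
        value = body[field]
        if value is None and field in NULLABLE_FIELDS:
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            problems.append(f"{field} must be an integer >= 1")
    return problems


print(check_rerank_body({"model": "m", "query": "q", "documents": ["d"],
                         "method": "optimal", "top_n": 0}))
```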

Response

Returns the reranked documents, ordered by their similarity to the query.