Chat with an answer engine using RAG (Retrieval-Augmented Generation).
Path Parameters:
- engine_id: Unique identifier of the answer engine

Request Body:
- query: The user's question (required)
- model_llm (optional): LLM model for generation. Default: llama-v3p3-70b-instruct
- mode (optional): vanilla or agentic. Default: vanilla
- top_k (optional): Number of context chunks (1-30). Default: 5
- conv_thread_id (optional): Pass previous thread ID to continue a conversation
- prompt_id (optional): Specific prompt ID to use for generation

Returns:
- answer_engine_id: ID of the engine that processed the request
- conv_thread_id: Thread identifier for follow-up questions
- query_id: Unique identifier for this query
- query: Echo of the submitted query
- model_llm: LLM model that produced the answer
- answer: Generated answer from the LLM
- token_usage: Token usage statistics

Raises:
- 404: Engine not found
- 400: Chat request failed

Example Request:
POST /api/v3/answer-engine/ae_8c625d9d71/chat
Headers: {"Authorization": "Bearer <api_key>"}
{
"query": "What is machine learning?"
}
Example Response:
{
"answer_engine_id": "ae_8c625d9d71",
"conv_thread_id": "thread_9d9af946e7",
"query_id": "query_4afc22621f",
"query": "What is machine learning?",
"model_llm": "llama-v3p3-70b-instruct",
"answer": "Machine learning is a subset of artificial intelligence...",
"token_usage": {
"input_tokens": 312,
"output_tokens": 124
}
}
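The request above can be sketched in Python. This is a minimal helper, not an official client: the base URL is a placeholder for your deployment, and the function name is hypothetical; only the path, field names, defaults, and the 1-30 range for top_k come from the endpoint's schema. It builds the URL and JSON body without sending them, so sending is left to your HTTP client of choice.

```python
BASE_URL = "https://api.example.com"  # assumption: replace with your deployment URL


def build_chat_request(engine_id, query, model_llm=None, mode="vanilla",
                       top_k=5, conv_thread_id=None, prompt_id=None):
    """Build the URL and JSON body for POST /api/v3/answer-engine/{engine_id}/chat."""
    if not query:
        raise ValueError("query is required")
    if mode not in ("vanilla", "agentic"):
        raise ValueError("mode must be 'vanilla' or 'agentic'")
    if not 1 <= top_k <= 30:
        raise ValueError("top_k must be between 1 and 30")
    body = {"query": query, "mode": mode, "top_k": top_k}
    # Optional fields are omitted entirely so server-side defaults apply.
    if model_llm is not None:
        body["model_llm"] = model_llm
    if conv_thread_id is not None:
        body["conv_thread_id"] = conv_thread_id
    if prompt_id is not None:
        body["prompt_id"] = prompt_id
    url = f"{BASE_URL}/api/v3/answer-engine/{engine_id}/chat"
    return url, body


url, body = build_chat_request("ae_8c625d9d71", "What is machine learning?")
```

Pass the resulting body as JSON along with the `Authorization: Bearer <api_key>` header.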
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Unique answer engine identifier
Request body for chatting with an answer engine using RAG.
User's question
LLM model for generation
Inference mode: vanilla or agentic
Number of context chunks to use. Required range: 1 <= x <= 30
Thread ID for multi-turn conversation
Prompt ID to use for generation
Returns the generated answer along with supporting document chunks and token usage statistics
Response containing the generated answer, thread ID, and token usage for a chat request.
Token consumption breakdown for a single chat request.
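For multi-turn conversations, the conv_thread_id returned by one request is passed back in the next. A minimal sketch of that hand-off, assuming the response shape shown in the example above; the helper name is hypothetical:

```python
def follow_up_body(previous_response: dict, query: str) -> dict:
    """Build the next request body, reusing the thread from a prior response."""
    return {
        "query": query,
        "conv_thread_id": previous_response["conv_thread_id"],
    }


# Response fields taken from the documented example response.
first = {
    "conv_thread_id": "thread_9d9af946e7",
    "answer": "Machine learning is a subset of artificial intelligence...",
}
body = follow_up_body(first, "How does it differ from deep learning?")
```

Omitting conv_thread_id instead starts a fresh conversation thread.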