Generate conversational responses with streaming support and multimodal capabilities.
This endpoint generates chat completions using language models, supporting both text-only conversations and vision inputs. Includes support for function calling, streaming responses, and automatic conversation tracking.
Request Body:
model: Model ID (required) - a base model (e.g., nugen-flash-instruct) or your aligned model ID (e.g., model_customer_support_alignment_01kjy6s8n9r8cnx)
messages: Array of message objects (required, minimum 1), each containing:
  role: Message role (system, user, or assistant)
  content: Text string OR array of content objects (for multimodal):
    {"type": "text", "text": "your message"}
    {"type": "image_url", "image_url": {"url": "https://..."}}
    {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
  name (optional): Author name (a-z, A-Z, 0-9, underscores, max 64 chars)
max_tokens (optional): Maximum number of tokens to generate in the completion
prompt_truncate_len (optional): Size to which chat prompts are truncated (default: 1500)
temperature (optional): Sampling temperature, between 0 and 2 (default: 1)
stream (optional): Whether to stream back partial progress as server-sent events (default: false)
stream_options (optional): Options for streaming responses, e.g., {"include_usage": true}
tools (optional): List of tools/functions available to the model
tool_choice (optional): Tool selection mode (auto, none, or a specific tool)
top_p (optional): Nucleus sampling parameter
top_k (optional): Top-k sampling parameter
n (optional): Number of completions to generate (default: 1)
reasoning (optional): Reasoning configuration object with:
  effort: Reasoning effort level (xhigh, high, medium, low, minimal, none)
  max_tokens: Token limit for reasoning
  exclude: Set true to exclude reasoning tokens from the response
  enabled: Enable reasoning with default parameters

Optional Headers:
X-Session-ID: Session identifier for multi-turn conversation tracking

Returns:
Non-streaming mode - ChatCompletion object with:
  id: Unique identifier for the response
  object: Object type (always chat.completion)
  created: Unix timestamp when the completion was created
  model: Model used for the chat completion
  choices: List of completion choices, each containing:
    index: Index of the choice
    message: Response message with:
      role: Role of the author (always assistant)
      content: Generated response text
      tool_calls (optional): Tool calls made by the model (for function calling)
    finish_reason: Reason the model stopped (stop for a natural stop, length if the max token limit was reached)
  usage: Token usage statistics:
    prompt_tokens: Number of tokens in the prompt
    completion_tokens: Number of tokens generated
    total_tokens: Total tokens used (prompt + completion)
  confidence_score (optional): Confidence score from Domain-Aligned AI models

Streaming mode - ChatCompletionChunk stream with:
  id: Unique identifier
  created: Timestamp
  model: Model ID
  choices: List of chunk choices with:
    index: Choice index
    delta: Delta content with role and content
    finish_reason: Reason for stopping (only in the final chunk)
  usage (optional): Only present in the final chunk

Example Request (Text Chat):
POST /api/v3/inference/chat/completions
Headers: {"Authorization": "Bearer <api_key>"}
{
"model": "nugen-flash-instruct",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is the capital of France?"
}
],
"max_tokens": 500,
"temperature": 0.7
}
Example Response (Text Chat):
{
"id": "nugen-abc123",
"object": "chat.completion",
"created": 1704123600.0,
"model": "nugen-flash-instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 20,
"completion_tokens": 8,
"total_tokens": 28
}
}
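A minimal Python sketch of making the text-chat request above. The API host is an assumption (this document does not state a base URL), and chat_completion / extract_reply are illustrative helpers, not part of any SDK:

```python
import json
import urllib.request

# Placeholder host - substitute your actual API base URL (assumption).
API_URL = "https://api.example.com/api/v3/inference/chat/completions"

def chat_completion(api_key, payload, session_id=None):
    """POST a chat completion request and return the parsed JSON response."""
    headers = {"Authorization": f"Bearer {api_key}",
               "Content-Type": "application/json"}
    if session_id:
        # Optional header for multi-turn conversation tracking.
        headers["X-Session-ID"] = session_id
    req = urllib.request.Request(API_URL,
                                 data=json.dumps(payload).encode("utf-8"),
                                 headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def extract_reply(completion):
    """Pull the assistant text and total token count from a non-streaming response."""
    choice = completion["choices"][0]
    return choice["message"]["content"], completion["usage"]["total_tokens"]

# Parsing the example response shown above (no network call needed):
example = {
    "choices": [{"index": 0,
                 "message": {"role": "assistant",
                             "content": "The capital of France is Paris."},
                 "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 20, "completion_tokens": 8, "total_tokens": 28},
}
text, total_tokens = extract_reply(example)
print(text)  # The capital of France is Paris.
```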
Example Request (Using Domain-Aligned Model):
POST /api/v3/inference/chat/completions
Headers: {"Authorization": "Bearer <api_key>"}
{
"model": "aligned-model-01kmqm4nrn9fw6r",
"messages": [
{
"role": "user",
"content": "How do I return a product?"
}
],
"max_tokens": 500,
"temperature": 0.7
}
Example Response (Domain-Aligned Model):
{
"id": "nugen-abc456",
"object": "chat.completion",
"created": 1704123700.0,
"model": "model_customer_support_alignment_01kjy6s8n9r8cnx",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "To return a product, please visit our returns portal within 30 days of purchase with your order number and receipt."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 15,
"completion_tokens": 25,
"total_tokens": 40
},
"confidence_score": 95.8
}
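Because Domain-Aligned models return a confidence_score (0-100), a common pattern is to gate on it before trusting the reply. A hedged sketch - the 80.0 threshold and the answer/escalate labels are illustrative choices, not part of the API:

```python
def route_reply(completion, threshold=80.0):
    """Return ('answer', text) when confidence_score clears the threshold,
    ('escalate', text) when it does not. Threshold is illustrative."""
    text = completion["choices"][0]["message"]["content"]
    score = completion.get("confidence_score")
    if score is not None and score < threshold:
        return ("escalate", text)
    return ("answer", text)

# Applied to the domain-aligned example response above:
example = {
    "choices": [{"index": 0,
                 "message": {"role": "assistant",
                             "content": "To return a product, please visit our returns "
                                        "portal within 30 days of purchase with your "
                                        "order number and receipt."},
                 "finish_reason": "stop"}],
    "confidence_score": 95.8,
}
decision, reply = route_reply(example)
print(decision)  # answer
```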
How to get your aligned model ID:
1. POST /api/v3/alignment-project/create to start an alignment project
2. Poll GET /api/v3/alignment-project/status/{id} until status is COMPLETED
3. Read the model_id field from the response (e.g., "model_customer_support_alignment_01kjy6s8n9r8cnx")
4. POST /api/v3/models/deploy-model/{model_id} (optional - for evaluation/production use)
5. Pass the model_id in the model parameter for inference requests

Example Request (Vision - Image URL):
POST /api/v3/inference/chat/completions
Headers: {"Authorization": "Bearer <api_key>"}
{
"model": "aligned-model-01kmqm4nrn9fw6r",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe what you see in this image."
},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/image.jpg"
}
}
]
}
],
"max_tokens": 1000
}
Example Request (Vision - Base64 Image):
POST /api/v3/inference/chat/completions
Headers: {"Authorization": "Bearer <api_key>"}
{
"model": "aligned-model-01kmqm4nrn9fw6r",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What's in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEAYABgAAD..."
}
}
]
}
]
}
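Building the base64 data URL by hand is error-prone, so a small Python helper is useful. A sketch with stdlib base64 only; image_part and vision_message are illustrative helper names:

```python
import base64

def image_part(data, mime="image/jpeg"):
    """Encode raw image bytes as a base64 data-URL content object."""
    b64 = base64.b64encode(data).decode("ascii")
    return {"type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{b64}"}}

def vision_message(question, part):
    """Combine a text part and an image part into one multimodal user message."""
    return {"role": "user",
            "content": [{"type": "text", "text": question}, part]}

# Stand-in bytes; in practice read the file: open("photo.jpg", "rb").read()
part = image_part(b"\xff\xd8\xff\xe0 fake jpeg bytes")
message = vision_message("What's in this image?", part)
print(message["content"][1]["image_url"]["url"][:23])  # data:image/jpeg;base64,
```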
Example Request (Function Calling):
POST /api/v3/inference/chat/completions
Headers: {"Authorization": "Bearer <api_key>"}
{
"model": "aligned-model-01kmqm4nrn9fw6r",
"messages": [
{"role": "user", "content": "What's the weather in Boston?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
}
}
}
}
],
"tool_choice": "auto"
}
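When the model answers with tool calls, the client must parse the JSON-encoded arguments before dispatching. A sketch of that parsing step - the example response shape below mirrors the common chat-completions tool_call format and is an assumption, not taken from this document; check a real response for the exact field names:

```python
import json

def pending_tool_calls(completion):
    """List (function-name, parsed-arguments) pairs from the first
    choice's tool_calls, if any."""
    message = completion["choices"][0]["message"]
    return [(call["function"]["name"], json.loads(call["function"]["arguments"]))
            for call in message.get("tool_calls") or []]

# Assumed response shape (verify field names against a real response):
example = {
    "choices": [{"index": 0,
                 "message": {"role": "assistant",
                             "content": None,
                             "tool_calls": [{"id": "call_1",
                                             "type": "function",
                                             "function": {"name": "get_weather",
                                                          "arguments": '{"location": "Boston"}'}}]},
                 "finish_reason": "stop"}],
}
for name, args in pending_tool_calls(example):
    print(name, args["location"])  # get_weather Boston
```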
Example Request (Streaming):
POST /api/v3/inference/chat/completions
Headers: {"Authorization": "Bearer <api_key>"}
{
"model": "aligned-model-01kmqm4nrn9fw6r",
"messages": [{"role": "user", "content": "Tell me a story"}],
"stream": true
}
Example Response (Streaming):
data: {"id":"nugen-abc123","created":1704123600.0,"model":"nugen-flash-instruct","choices":[{"index":0,"delta":{"role":"assistant","content":"Once"},"finish_reason":null}]}
data: {"id":"nugen-abc123","created":1704123600.0,"model":"nugen-flash-instruct","choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}
data: {"id":"nugen-abc123","created":1704123600.0,"model":"nugen-flash-instruct","choices":[{"index":0,"delta":{"content":" a time"},"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":5,"total_tokens":15}}
data: [DONE]
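The streaming chunks above arrive as server-sent-event lines that the client must reassemble. A minimal Python sketch of that parsing, applied to the example stream (in a real client the lines would come from the HTTP response body):

```python
import json

def iter_stream_content(lines):
    """Parse SSE lines ('data: {...}' / 'data: [DONE]') and yield the
    incremental text from each chunk's delta."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # ignore blank/keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Reassembling the streaming example above:
sse = [
    'data: {"id":"nugen-abc123","choices":[{"index":0,"delta":{"role":"assistant","content":"Once"},"finish_reason":null}]}',
    'data: {"id":"nugen-abc123","choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}',
    'data: {"id":"nugen-abc123","choices":[{"index":0,"delta":{"content":" a time"},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
story = "".join(iter_stream_content(sse))
print(story)  # Once upon a time
```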
Notes:
- Use the model_id from your completed alignment project as the model parameter. Aligned model IDs are available from GET /api/v3/alignment-project/status/{id} or GET /api/v3/models/aligned
- Function calling is supported via the tools and tool_choice parameters
- confidence_score is only available for Domain-Aligned AI models and indicates model certainty (0-100)
- Use the X-Session-ID header for multi-turn conversation tracking
- Use prompt_truncate_len to control context window usage for long conversations
- The reasoning parameter enables advanced reasoning capabilities for supported models
- Authenticate with a Bearer authorization header of the form Bearer <token>, where <token> is your auth token
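The aligned-model workflow described above (create a project, poll its status until COMPLETED, then use the returned model_id) can be sketched as a polling loop. The FAILED terminal state and the get_status callable are assumptions for illustration, not documented behavior:

```python
import time

def wait_for_aligned_model(get_status, project_id, poll_interval=10.0, timeout=3600.0):
    """Poll an alignment project until it completes, then return its model_id.
    get_status is expected to wrap GET /api/v3/alignment-project/status/{id}."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(project_id)
        if status["status"] == "COMPLETED":
            return status["model_id"]
        if status["status"] == "FAILED":  # terminal state assumed for illustration
            raise RuntimeError(f"alignment project {project_id} failed")
        time.sleep(poll_interval)
    raise TimeoutError(f"alignment project {project_id} did not complete in {timeout}s")

# Demonstration with a stubbed status endpoint:
responses = iter([{"status": "RUNNING"},
                  {"status": "COMPLETED",
                   "model_id": "model_customer_support_alignment_01kjy6s8n9r8cnx"}])
model_id = wait_for_aligned_model(lambda _pid: next(responses), "proj_123", poll_interval=0)
print(model_id)
```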