Generate Completions
Generate text completions with streaming support.
This endpoint generates text completions using the specified language model. It supports both streaming (real-time response chunks) and non-streaming modes, and is suited to text generation, continuation, and completion tasks.
Request Body:
model (required): Model ID for text generation. Use a base model (e.g., nugen-flash-instruct) or your aligned model ID
prompt (required): Input to complete. Can be:
- Single string
- List of strings
- Array of integers (tokenized prompt)
- Array of integer arrays (batch of tokenized prompts)
max_tokens (optional): Maximum tokens to generate (default: 16, minimum: 0)
temperature (optional): Sampling temperature between 0 and 2 (default: 1). Higher values like 0.8 make output more random; lower values like 0.2 make it more focused and deterministic
stream (optional): Enable streaming responses (default: false)
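The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: the function name `build_completion_payload` is hypothetical, and the defaults and ranges are taken from the parameter descriptions above.

```python
from typing import Any, Dict, List, Union

# The four prompt shapes listed above.
Prompt = Union[str, List[str], List[int], List[List[int]]]


def build_completion_payload(
    model: str,
    prompt: Prompt,
    max_tokens: int = 16,       # documented default
    temperature: float = 1.0,   # documented default
    stream: bool = False,       # documented default
) -> Dict[str, Any]:
    """Build a request body, rejecting values outside the documented ranges."""
    if not model:
        raise ValueError("model is required")
    if prompt is None:
        raise ValueError("prompt is required")
    if max_tokens < 0:
        raise ValueError("max_tokens must be >= 0")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": stream,
    }
```

Validating locally only catches obvious mistakes early; the server remains the authority on what it accepts.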
Optional Headers:
X-Session-ID: Session identifier for conversation tracking
Returns:
Non-streaming mode -
id: Unique identifier for the response
object: Object type (always text_completion)
created: Unix timestamp when the response was generated
model: Model ID used for completion
choices: List of completion choices, each containing:
- text: Generated completion text
- index: Index of this choice
- finish_reason: Reason the model stopped (stop for a natural stop point, length if max tokens reached)
usage: Token usage statistics:
- prompt_tokens: Number of tokens in the prompt
- completion_tokens: Number of tokens generated
- total_tokens: Total tokens used (prompt + completion)
confidence_score (optional): Confidence score from Domain-Aligned AI models
Streaming mode - Server-sent events (SSE) stream with StreamingCompletionResponsev2 chunks containing:
id: Response identifier
object: Always text_completion
created: Timestamp
model: Model ID
choices: Completion chunks
usage (optional): Only in final chunk
confidence_score (optional): Only in final chunk
Example Request (Non-streaming):
POST /api/v3/inference/completions
Headers: {"Authorization": "Bearer <api_key>"}
{
"model": "aligned-model-01kmqm4nrn9fw6r",
"prompt": "Write a haiku about programming:",
"max_tokens": 100,
"temperature": 0.7
}
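The request above could be assembled in Python roughly as follows. This is a sketch using only the standard library: the base URL `https://api.example.com` is a placeholder (the real host is not given here), `make_completion_request` is a hypothetical helper name, and the `Content-Type` header is an assumption commonly needed for JSON bodies.

```python
import json
import urllib.request
from typing import Optional

# Placeholder host: replace with the actual API base URL.
BASE_URL = "https://api.example.com"


def make_completion_request(
    api_key: str,
    payload: dict,
    session_id: Optional[str] = None,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the completions endpoint."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",  # assumed, not stated in the docs
    }
    if session_id is not None:
        # Optional header for conversation tracking.
        headers["X-Session-ID"] = session_id
    return urllib.request.Request(
        url=f"{base_url}/api/v3/inference/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request) as resp:
#       body = json.load(resp)
```

Keeping request construction separate from sending makes the helper easy to inspect and test without network access.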
Example Response (Non-streaming):
{
"id": "nugen-abc123",
"object": "text_completion",
"created": 1704123600.0,
"model": "aligned-model-01kmqm4nrn9fw6r",
"choices": [
{
"text": "Code flows like water,\nBugs emerge, then disappear,\nDebug and refine.",
"index": 0,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 7,
"completion_tokens": 18,
"total_tokens": 25
},
"confidence_score": 86.5221
}
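A client typically needs only a few fields from a response like the one above. The sketch below assumes the response has already been decoded into a Python dict; `summarize_completion` is a hypothetical helper, and it treats `usage` and `confidence_score` as possibly absent.

```python
from typing import Any, Dict


def summarize_completion(response: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the first choice's text plus usage and confidence, if present."""
    choice = response["choices"][0]
    usage = response.get("usage") or {}
    return {
        "text": choice["text"],
        # finish_reason "length" means generation hit max_tokens.
        "truncated": choice.get("finish_reason") == "length",
        "total_tokens": usage.get("total_tokens"),
        # Only returned for Domain-Aligned AI models.
        "confidence_score": response.get("confidence_score"),
    }
```

Checking `truncated` is useful with the default `max_tokens` of 16, where completions are often cut short.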
Example Request (Streaming):
POST /api/v3/inference/completions
Headers: {"Authorization": "Bearer <api_key>"}
{
"model": "aligned-model-01kmqm4nrn9fw6r",
"prompt": "Write a haiku about programming:",
"max_tokens": 100,
"temperature": 0.7,
"stream": true
}
Example Response (Streaming):
data: {"id":"nugen-aligned-model-01kmqm4nrn9fw6r-abc123","object":"text_completion","created":1704123600.0,"model":"nugen-flash-instruct","choices":[{"text":"Code","index":0,"finish_reason":null}]}
data: {"id":"nugen-aligned-model-01kmqm4nrn9fw6r-abc123","object":"text_completion","created":1704123600.0,"model":"nugen-flash-instruct","choices":[{"text":" flows","index":0,"finish_reason":null}]}
data: {"id":"nugen-aligned-model-01kmqm4nrn9fw6r-abc123","object":"text_completion","created":1704123600.0,"model":"nugen-flash-instruct","choices":[{"text":" like water","index":0,"finish_reason":"stop"}],"usage":{"prompt_tokens":7,"completion_tokens":18,"total_tokens":25},"confidence_score":86.5221}
data: [DONE]
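A streamed response like the one above can be reassembled by reading each `data:` line, stopping at `[DONE]`, and concatenating the `text` of each chunk. The following is a minimal sketch that works on any iterable of text lines; the helper names are hypothetical, and it assumes one JSON object per `data:` line as in the example.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple


def iter_completion_chunks(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield decoded chunks from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and non-data SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def collect_stream(
    lines: Iterable[str],
) -> Tuple[str, Optional[Dict[str, int]], Optional[float]]:
    """Join chunk text; keep usage and confidence_score from the final chunk."""
    parts = []
    usage = None
    confidence = None
    for chunk in iter_completion_chunks(lines):
        for choice in chunk.get("choices", []):
            parts.append(choice.get("text") or "")
        usage = chunk.get("usage", usage)
        confidence = chunk.get("confidence_score", confidence)
    return "".join(parts), usage, confidence
```

In practice `lines` would come from the HTTP response body decoded line by line; a list of strings works equally well for testing.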
Example Request (Tokenized Prompt):
POST /api/v3/inference/completions
Headers: {"Authorization": "Bearer <api_key>"}
{
"model": "aligned-model-01kmqm4nrn9fw6r",
"prompt": [1014, 6766, 318],
"max_tokens": 50,
"temperature": 1
}
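Because `prompt` accepts four different shapes, it can help to check which one a value matches before sending it. This is an illustrative sketch only; `prompt_kind` and its return labels are hypothetical and not part of the API.

```python
from typing import Any


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def prompt_kind(prompt: Any) -> str:
    """Classify a prompt as one of the four accepted shapes."""
    if isinstance(prompt, str):
        return "string"
    if isinstance(prompt, list) and prompt:
        if all(isinstance(p, str) for p in prompt):
            return "string_batch"
        if all(_is_int(p) for p in prompt):
            return "token_ids"
        if all(
            isinstance(p, list) and p and all(_is_int(t) for t in p)
            for p in prompt
        ):
            return "token_id_batch"
    raise ValueError("prompt must be a string, list of strings, "
                     "list of ints, or list of int lists")
```

For example, the tokenized request above passes a flat list of integers, which this helper labels `token_ids`.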
Notes:
- Using Domain-Aligned Models: Pass the model_id from your completed alignment project as the model parameter. Get your aligned model IDs from GET /api/v3/alignment-project/status/{id} or GET /api/v3/models/aligned
- Streaming provides real-time response generation for better user experience
- The prompt parameter accepts pre-tokenized inputs for advanced use cases
- Default max_tokens is 16; adjust based on your needs
- Temperature controls randomness: 0 = deterministic, 2 = very random
- confidence_score is only available for Domain-Aligned AI models and indicates model certainty (0-100)
- Include the X-Session-ID header for multi-turn conversation tracking
- Token usage is tracked and billed per request
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Headers
Body
model: The name of the model to use. Example: "nugen-flash-instruct"
prompt: The prompt to generate completions for. It can be a single string or a list of strings. It can also be an array of integers or an array of integer arrays, which allows passing an already tokenized prompt. Example: "The sky is"
max_tokens: The maximum number of tokens to generate in the completion. Required range: x >= 0. Example: 400
temperature: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Required range: 0 <= x <= 2. Example: 1
stream: Whether to stream back partial progress as server-sent events.
Options for streaming responses, e.g., {'include_usage': true}
Response
A stream of text completion chunks or a single complete response, depending on the stream parameter.