Generate text completions with streaming support.
This endpoint generates text completions using specified language models. Supports both streaming (real-time response chunks) and non-streaming modes. Ideal for text generation, continuation, and completion tasks.
Request Body:
- model (required): Model ID for text generation
- prompt (required): Input to complete. Can be a single string, a list of strings, an array of token IDs, or an array of token-ID arrays (a pre-tokenized prompt).
- max_tokens (optional): Maximum tokens to generate (default: 16, minimum: 0)
- temperature (optional): Sampling temperature between 0 and 2 (default: 1). Higher values like 0.8 make output more random; lower values like 0.2 make it more focused and deterministic.
- stream (optional): Enable streaming responses (default: false)

Optional Headers:
- X-Session-ID: Session identifier for conversation tracking

Returns:
Non-streaming mode -
- id: Unique identifier for the response
- object: Object type (always text_completion)
- created: Unix timestamp when the response was generated
- model: Model ID used for the completion
- choices: List of completion choices, each containing:
  - text: Generated completion text
  - index: Index of this choice
  - finish_reason: Reason the model stopped (stop for a natural stop point, length if the max token limit was reached)
- usage: Token usage statistics:
  - prompt_tokens: Number of tokens in the prompt
  - completion_tokens: Number of tokens generated
  - total_tokens: Total tokens used (prompt + completion)
- confidence_score (optional): Confidence score from Domain-Aligned AI models

Streaming mode - Server-sent events (SSE) stream with StreamingCompletionResponsev2 chunks containing:
- id: Response identifier
- object: Always text_completion
- created: Timestamp
- model: Model ID
- choices: Completion chunks
- usage (optional): Only in the final chunk
- confidence_score (optional): Only in the final chunk

Example Request (Non-streaming):
POST /api/v3/inference/completions
Headers: {"Authorization": "Bearer <api_key>"}
{
"model": "nugen-flash-instruct",
"prompt": "Write a haiku about programming:",
"max_tokens": 100,
"temperature": 0.7
}
Example Response (Non-streaming):
{
"id": "nugen-abc123",
"object": "text_completion",
"created": 1704123600.0,
"model": "nugen-flash-instruct",
"choices": [
{
"text": "Code flows like water,
Bugs emerge, then disappear,
Debug and refine.",
"index": 0,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 7,
"completion_tokens": 18,
"total_tokens": 25
},
"confidence_score": 86.5221
}
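The non-streaming response above can be unpacked with a small helper. This is an illustrative sketch (the function name is ours, not part of an official client library); the field names mirror the documented response shape.

```python
# Unpack the non-streaming completion response documented above.
# extract_completion is a hypothetical helper, not an official client API.

def extract_completion(response: dict) -> dict:
    """Return the first choice's text plus usage stats from a completion response."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "text": choice["text"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
        # confidence_score is only present for Domain-Aligned AI models
        "confidence_score": response.get("confidence_score"),
    }
```

Checking `finish_reason` distinguishes a natural stop (`stop`) from a truncated completion (`length`), which signals that max_tokens should be raised.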
Example Request (Streaming):
POST /api/v3/inference/completions
Headers: {"Authorization": "Bearer <api_key>"}
{
"model": "nugen-flash-instruct",
"prompt": "Write a haiku about programming:",
"max_tokens": 100,
"temperature": 0.7,
"stream": true
}
Example Response (Streaming):
data: {"id":"nugen-abc123","object":"text_completion","created":1704123600.0,"model":"nugen-flash-instruct","choices":[{"text":"Code","index":0,"finish_reason":null}]}
data: {"id":"nugen-abc123","object":"text_completion","created":1704123600.0,"model":"nugen-flash-instruct","choices":[{"text":" flows","index":0,"finish_reason":null}]}
data: {"id":"nugen-abc123","object":"text_completion","created":1704123600.0,"model":"nugen-flash-instruct","choices":[{"text":" like water","index":0,"finish_reason":"stop"}],"usage":{"prompt_tokens":7,"completion_tokens":18,"total_tokens":25},"confidence_score":86.5221}
data: [DONE]
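A client consuming this stream decodes each `data:` line as JSON and stops at the `[DONE]` sentinel. A minimal parsing sketch (the function names are illustrative, not part of an official client):

```python
import json

def parse_sse_lines(lines):
    """Yield decoded chunk dicts from an iterable of SSE lines.

    Lines that do not start with "data: " (blank keep-alives, comments)
    are skipped; the "[DONE]" sentinel terminates the stream.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        yield json.loads(payload)

def assemble_text(lines):
    """Concatenate the text deltas of choice 0 across all streamed chunks."""
    return "".join(chunk["choices"][0]["text"] for chunk in parse_sse_lines(lines))
```

Note that `usage` and `confidence_score` appear only on the final chunk, so a client that needs token accounting must keep the last decoded chunk rather than only the concatenated text.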
Example Request (Tokenized Prompt):
POST /api/v3/inference/completions
Headers: {"Authorization": "Bearer <api_key>"}
{
"model": "nugen-flash-instruct",
"prompt": [1014, 6766, 318],
"max_tokens": 50,
"temperature": 1
}
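Because prompt accepts several shapes (a string, a list of strings, a list of token IDs, or a list of token-ID lists), request bodies for all three examples above can be assembled uniformly. A sketch, with defaults taken from the parameter list in this document (`build_payload` is a hypothetical helper):

```python
# Assemble a request body for POST /api/v3/inference/completions.
# The prompt may be a string, a list of strings, a list of token IDs, or a
# list of token-ID lists (batched, pre-tokenized prompts). Default values
# mirror those documented above; build_payload is an illustrative helper.

def build_payload(model, prompt, max_tokens=16, temperature=1, stream=False):
    """Build the JSON request body for the completions endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": stream,
    }
```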
Notes:
- The prompt parameter accepts pre-tokenized inputs for advanced use cases
- The default max_tokens is 16 - adjust it based on your needs
- confidence_score is only available for Domain-Aligned AI models
- Use the X-Session-ID header for multi-turn conversation tracking
- Authenticate with a Bearer header of the form Bearer <token>, where <token> is your auth token
model (string, required)
The name of the model to use.
Example: "nugen-flash-instruct"

prompt (string | string[] | integer[] | integer[][], required)
The prompt to generate completions for. It can be a single string or a list of strings. It can also be an array of integers or an array of integer arrays, which allows passing an already tokenized prompt.
Example: "The sky is"

max_tokens (integer, optional)
The maximum number of tokens to generate in the completion.
Constraint: x >= 0. Default: 16.

temperature (number, optional)
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
Constraint: 0 <= x <= 2. Default: 1.

stream (boolean, optional)
Whether to stream back partial progress as server-sent events.

Returns: streaming text completion responses or a complete response, depending on the stream parameter.