
# Generate Completions

> Generate text completions with streaming support.


This endpoint generates text completions with the language model you specify. It supports both streaming (response chunks delivered as they are generated) and non-streaming modes, and suits text generation, continuation, and completion tasks.


**Request Body:**

- `model` (required): Model ID for text generation. Either a base model (e.g., `nugen-flash-instruct`) or your aligned model ID
- `prompt` (required): Input to complete. Can be:
  - Single string
  - List of strings
  - Array of integers (tokenized prompt)
  - Array of integer arrays (batch of tokenized prompts)
- `max_tokens` (optional): Maximum tokens to generate (default: 16, minimum: 0)
- `temperature` (optional): Sampling temperature between 0 and 2 (default: 1). Higher values such as 0.8 make the output more random; lower values such as 0.2 make it more focused and deterministic
- `stream` (optional): Enable streaming responses (default: `false`)
- `stream_options` (optional): Object with options for streaming responses, e.g., `{"include_usage": true}`
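The documented limits can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: `build_completion_payload` is a hypothetical helper name, and it assumes that omitting an optional field lets the server apply its documented default.

```python
from typing import Any, Dict, List, Optional, Union

# The four prompt shapes listed above.
Prompt = Union[str, List[str], List[int], List[List[int]]]


def build_completion_payload(
    model: str,
    prompt: Prompt,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    stream: Optional[bool] = None,
) -> Dict[str, Any]:
    """Return a JSON-serializable request body (hypothetical helper).

    Optional fields left as None are omitted, on the assumption that the
    server then applies its documented defaults (16, 1, and false).
    """
    payload: Dict[str, Any] = {"model": model, "prompt": prompt}
    if max_tokens is not None:
        if max_tokens < 0:
            raise ValueError("max_tokens must be >= 0")
        payload["max_tokens"] = max_tokens
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        payload["temperature"] = temperature
    if stream is not None:
        payload["stream"] = stream
    return payload
```

Rejecting out-of-range values locally avoids a round trip that would otherwise end in a `422` validation error.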


**Optional Headers:**

- `X-Session-ID`: Session identifier for conversation tracking


**Returns:**

**Non-streaming mode** - JSON object containing:
- `id`: Unique identifier for the response
- `object`: Object type (always `text_completion`)
- `created`: Unix timestamp when response was generated
- `model`: Model ID used for completion
- `choices`: List of completion choices, each containing:
  - `text`: Generated completion text
  - `index`: Index of this choice
  - `finish_reason`: Reason the model stopped (`stop` for a natural stop point, `length` if `max_tokens` was reached)
- `usage`: Token usage statistics:
  - `prompt_tokens`: Number of tokens in the prompt
  - `completion_tokens`: Number of tokens generated
  - `total_tokens`: Total tokens used (prompt + completion)
- `confidence_score` (optional): Confidence score from Domain-Aligned AI models

**Streaming mode** - Server-sent events (SSE) stream with `StreamingCompletionResponsev2` chunks containing:
- `id`: Response identifier
- `object`: Always `text_completion`
- `created`: Timestamp
- `model`: Model ID
- `choices`: Completion chunks
- `usage` (optional): Only in final chunk
- `confidence_score` (optional): Only in final chunk


**Example Request (Non-streaming):**

```json
POST /api/v3/inference/completions
Headers: {"Authorization": "Bearer <api_key>"}

{
  "model": "aligned-model-01kmqm4nrn9fw6r",
  "prompt": "Write a haiku about programming:",
  "max_tokens": 100,
  "temperature": 0.7
}
```
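As an illustration, the request above could be assembled with Python's standard library as follows. This is a sketch under assumptions: the spec lists no servers, so `BASE_URL` is a guess taken from the host that serves the OpenAPI file, and `make_request` is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Assumption: the API is served from the same host as the OpenAPI file.
BASE_URL = "https://api.nugen.in"
COMPLETIONS_PATH = "/api/v3/inference/completions"


def make_request(
    api_key: str,
    payload: Dict[str, Any],
    session_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the completions endpoint."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if session_id is not None:
        # Optional header used for conversation tracking.
        headers["X-Session-ID"] = session_id
    return urllib.request.Request(
        BASE_URL + COMPLETIONS_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it. Keeping construction separate from sending makes the request easy to inspect or log first.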


**Example Response (Non-streaming):**

```json
{
  "id": "nugen-abc123",
  "object": "text_completion",
  "created": 1704123600.0,
  "model": "aligned-model-01kmqm4nrn9fw6r",
  "choices": [
    {
      "text": "Code flows like water,\nBugs emerge, then disappear,\nDebug and refine.",
      "index": 0,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 18,
    "total_tokens": 25
  },
  "confidence_score": 86.5221
}
```
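A client usually needs only a few of these fields. The sketch below pulls them into a small dataclass; `CompletionResult` and `parse_completion` are hypothetical names, and the code assumes a single-prompt request, so it reads only the first choice.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompletionResult:
    text: str
    finish_reason: Optional[str]
    total_tokens: int
    confidence_score: Optional[float]  # absent for non-aligned models


def parse_completion(body: str) -> CompletionResult:
    """Extract the first choice and usage totals from a response body."""
    data = json.loads(body)
    choice = data["choices"][0]
    return CompletionResult(
        text=choice["text"],
        finish_reason=choice.get("finish_reason"),
        total_tokens=data["usage"]["total_tokens"],
        confidence_score=data.get("confidence_score"),
    )
```

`confidence_score` is read with `.get` because it is documented as optional.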


**Example Request (Streaming):**

```json
POST /api/v3/inference/completions
Headers: {"Authorization": "Bearer <api_key>"}

{
  "model": "aligned-model-01kmqm4nrn9fw6r",
  "prompt": "Write a haiku about programming:",
  "max_tokens": 100,
  "temperature": 0.7,
  "stream": true
}
```


**Example Response (Streaming):**

```
data: {"id":"nugen-aligned-model-01kmqm4nrn9fw6r-abc123","object":"text_completion","created":1704123600.0,"model":"nugen-flash-instruct","choices":[{"text":"Code","index":0,"finish_reason":null}]}

data: {"id":"nugen-aligned-model-01kmqm4nrn9fw6r-abc123","object":"text_completion","created":1704123600.0,"model":"nugen-flash-instruct","choices":[{"text":" flows","index":0,"finish_reason":null}]}

data: {"id":"nugen-aligned-model-01kmqm4nrn9fw6r-abc123","object":"text_completion","created":1704123600.0,"model":"nugen-flash-instruct","choices":[{"text":" like water","index":0,"finish_reason":"stop"}],"usage":{"prompt_tokens":7,"completion_tokens":18,"total_tokens":25},"confidence_score":86.5221}

data: [DONE]
```
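A stream in this shape can be consumed line by line. The sketch below works on any iterable of decoded text lines, such as an HTTP response body read incrementally. `iter_sse_chunks` and `collect_stream` are hypothetical names, and the code assumes one choice per chunk, as in the example.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield the decoded JSON chunk from each `data:` line until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def collect_stream(
    lines: Iterable[str],
) -> Tuple[str, Optional[Dict[str, int]], Optional[float]]:
    """Join the streamed text and keep the usage and score from the final chunk."""
    parts = []
    usage = None
    confidence = None
    for chunk in iter_sse_chunks(lines):
        for choice in chunk.get("choices", []):
            parts.append(choice.get("text") or "")
        usage = chunk.get("usage", usage)
        confidence = chunk.get("confidence_score", confidence)
    return "".join(parts), usage, confidence
```

An interactive client would display each chunk's text as it arrives rather than waiting for the joined result.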


**Example Request (Tokenized Prompt):**

```json
POST /api/v3/inference/completions
Headers: {"Authorization": "Bearer <api_key>"}

{
  "model": "aligned-model-01kmqm4nrn9fw6r",
  "prompt": [1014, 6766, 318],
  "max_tokens": 50,
  "temperature": 1
}
```
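Because `prompt` accepts four shapes, it can help to check which one a value matches before sending it. This is a minimal sketch: `classify_prompt` and its return labels are hypothetical, and it rejects empty lists as ambiguous even though the schema itself sets no minimum length.

```python
from typing import Any


def _is_token(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def classify_prompt(prompt: Any) -> str:
    """Return which of the four documented prompt shapes `prompt` matches."""
    if isinstance(prompt, str):
        return "string"
    if isinstance(prompt, list) and prompt:
        if all(isinstance(item, str) for item in prompt):
            return "string_batch"
        if all(_is_token(item) for item in prompt):
            return "tokens"
        if all(
            isinstance(item, list) and item and all(_is_token(t) for t in item)
            for item in prompt
        ):
            return "token_batch"
    raise ValueError("prompt does not match any documented shape")
```

For example, `classify_prompt([1014, 6766, 318])` returns `"tokens"`, while a list that mixes strings and integers raises `ValueError`.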


**Notes:**

- **Using Domain-Aligned Models**: Pass the `model_id` from your completed alignment project as the `model` parameter. Get your aligned model IDs from `GET /api/v3/alignment-project/status/{id}` or `GET /api/v3/models/aligned`
- Streaming returns chunks as they are generated, so clients can show partial output before the completion finishes
- The `prompt` parameter accepts pre-tokenized inputs for advanced use cases
- `max_tokens` defaults to 16; raise it when you need longer completions
- Temperature controls randomness: 0 is the most deterministic setting, 2 the most random
- `confidence_score` is only available for Domain-Aligned AI models and indicates model certainty (0-100)
- Include the `X-Session-ID` header for multi-turn conversation tracking
- Token usage is tracked and billed per request
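Two of these notes lend themselves to a simple post-response check: a completion cut off at `max_tokens` and a low `confidence_score`. The sketch below is illustrative only. `review_flags` is a hypothetical helper, and the default threshold of 70 is an arbitrary assumption, not a recommended value.

```python
from typing import Any, Dict, List


def review_flags(response: Dict[str, Any], min_confidence: float = 70.0) -> List[str]:
    """Return human-readable warnings for a decoded completion response."""
    flags = []
    for choice in response.get("choices", []):
        # "length" means generation stopped because max_tokens was reached.
        if choice.get("finish_reason") == "length":
            flags.append(f"choice {choice.get('index')} truncated at max_tokens")
    score = response.get("confidence_score")
    # The score is absent for models that are not Domain-Aligned.
    if score is not None and score < min_confidence:
        flags.append(f"confidence {score} below {min_confidence}")
    return flags
```

An empty list means neither condition was detected. A truncation flag suggests retrying with a larger `max_tokens`.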


## OpenAPI

````yaml https://api.nugen.in/openapi-public.json post /api/v3/inference/completions
openapi: 3.1.0
info:
  title: Nugen Intelligence API
  description: 'Nugen Intelligence : Powering Specialized Intelligence At Scale'
  version: 25.4.20
servers: []
security: []
paths:
  /api/v3/inference/completions:
    post:
      tags:
        - Inference
      summary: Generate Completions
      description: >-
        Generate text completions with streaming support.


        This endpoint generates text completions with the language model you
        specify. It supports both streaming (response chunks delivered as they
        are generated) and non-streaming modes, and suits text generation,
        continuation, and completion tasks.


        **Request Body:**


        - `model`: Model ID for text generation (required) - Base models (e.g.,
        `nugen-flash-instruct`) or your aligned model ID

        - `prompt`: Input to complete (required) - Can be:
          - Single string
          - List of strings
          - Array of integers (tokenized prompt)
          - Array of integer arrays (batch of tokenized prompts)
        - `max_tokens` (optional): Maximum tokens to generate (default: 16,
        minimum: 0)

        - `temperature` (optional): Sampling temperature between 0 and 2
        (default: 1). Higher values such as 0.8 make the output more random;
        lower values such as 0.2 make it more focused and deterministic

        - `stream` (optional): Enable streaming responses (default: `false`)

        - `stream_options` (optional): Object with options for streaming
        responses, e.g., `{"include_usage": true}`


        **Optional Headers:**


        - `X-Session-ID`: Session identifier for conversation tracking


        **Returns:**


        **Non-streaming mode** - JSON object containing:

        - `id`: Unique identifier for the response

        - `object`: Object type (always `text_completion`)

        - `created`: Unix timestamp when response was generated

        - `model`: Model ID used for completion

        - `choices`: List of completion choices, each containing:
          - `text`: Generated completion text
          - `index`: Index of this choice
          - `finish_reason`: Reason the model stopped (`stop` for a natural stop point, `length` if `max_tokens` was reached)
        - `usage`: Token usage statistics:
          - `prompt_tokens`: Number of tokens in the prompt
          - `completion_tokens`: Number of tokens generated
          - `total_tokens`: Total tokens used (prompt + completion)
        - `confidence_score` (optional): Confidence score from Domain-Aligned AI
        models


        **Streaming mode** - Server-sent events (SSE) stream with
        `StreamingCompletionResponsev2` chunks containing:

        - `id`: Response identifier

        - `object`: Always `text_completion`

        - `created`: Timestamp

        - `model`: Model ID

        - `choices`: Completion chunks

        - `usage` (optional): Only in final chunk

        - `confidence_score` (optional): Only in final chunk


        **Example Request (Non-streaming):**


        ```json

        POST /api/v3/inference/completions

        Headers: {"Authorization": "Bearer <api_key>"}


        {
          "model": "aligned-model-01kmqm4nrn9fw6r",
          "prompt": "Write a haiku about programming:",
          "max_tokens": 100,
          "temperature": 0.7
        }

        ```


        **Example Response (Non-streaming):**


        ```json

        {
          "id": "nugen-abc123",
          "object": "text_completion",
          "created": 1704123600.0,
          "model": "aligned-model-01kmqm4nrn9fw6r",
          "choices": [
            {
              "text": "Code flows like water,\nBugs emerge, then disappear,\nDebug and refine.",
              "index": 0,
              "finish_reason": "stop"
            }
          ],
          "usage": {
            "prompt_tokens": 7,
            "completion_tokens": 18,
            "total_tokens": 25
          },
          "confidence_score": 86.5221
        }

        ```


        **Example Request (Streaming):**


        ```json

        POST /api/v3/inference/completions

        Headers: {"Authorization": "Bearer <api_key>"}


        {
          "model": "aligned-model-01kmqm4nrn9fw6r",
          "prompt": "Write a haiku about programming:",
          "max_tokens": 100,
          "temperature": 0.7,
          "stream": true
        }

        ```


        **Example Response (Streaming):**


        ```

        data:
        {"id":"nugen-aligned-model-01kmqm4nrn9fw6r-abc123","object":"text_completion","created":1704123600.0,"model":"nugen-flash-instruct","choices":[{"text":"Code","index":0,"finish_reason":null}]}


        data:
        {"id":"nugen-aligned-model-01kmqm4nrn9fw6r-abc123","object":"text_completion","created":1704123600.0,"model":"nugen-flash-instruct","choices":[{"text":"
        flows","index":0,"finish_reason":null}]}


        data:
        {"id":"nugen-aligned-model-01kmqm4nrn9fw6r-abc123","object":"text_completion","created":1704123600.0,"model":"nugen-flash-instruct","choices":[{"text":"
        like
        water","index":0,"finish_reason":"stop"}],"usage":{"prompt_tokens":7,"completion_tokens":18,"total_tokens":25},"confidence_score":86.5221}


        data: [DONE]

        ```


        **Example Request (Tokenized Prompt):**


        ```json

        POST /api/v3/inference/completions

        Headers: {"Authorization": "Bearer <api_key>"}


        {
          "model": "aligned-model-01kmqm4nrn9fw6r",
          "prompt": [1014, 6766, 318],
          "max_tokens": 50,
          "temperature": 1
        }

        ```


        **Notes:**


        - **Using Domain-Aligned Models**: Pass the `model_id` from your
        completed alignment project as the `model` parameter. Get your aligned
        model IDs from `GET /api/v3/alignment-project/status/{id}` or `GET
        /api/v3/models/aligned`

        - Streaming returns chunks as they are generated, so clients can show
        partial output before the completion finishes

        - The `prompt` parameter accepts pre-tokenized inputs for advanced use
        cases

        - `max_tokens` defaults to 16; raise it when you need longer completions

        - Temperature controls randomness: 0 is the most deterministic setting,
        2 the most random

        - `confidence_score` is only available for Domain-Aligned AI models and
        indicates model certainty (0-100)

        - Include the `X-Session-ID` header for multi-turn conversation tracking

        - Token usage is tracked and billed per request
      operationId: generate_text_completions
      parameters:
        - name: X-Session-ID
          in: header
          required: false
          schema:
            anyOf:
              - type: string
              - type: 'null'
            title: X-Session-Id
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateCompletionRequest_completionv2'
      responses:
        '200':
          description: >-
            Streaming text completion responses or complete response depending
            on stream parameter
          content:
            application/json:
              schema: {}
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
      security:
        - HTTPBearer: []
components:
  schemas:
    CreateCompletionRequest_completionv2:
      properties:
        model:
          type: string
          title: Model
          description: The name of the model to use.
          example: nugen-flash-instruct
        prompt:
          anyOf:
            - type: string
            - items:
                type: string
              type: array
            - items:
                type: integer
              type: array
            - items:
                items:
                  type: integer
                type: array
              type: array
          title: Prompt
          description: |-
            The prompt to generate completions for.
            It can be a single string or a list of strings.
            It can also be an array of integers or an array of integer arrays,
            which allows to pass already tokenized prompt.
          example: The sky is
        max_tokens:
          anyOf:
            - type: integer
              minimum: 0
            - type: 'null'
          title: Max Tokens
          description: The maximum number of tokens to generate in the completion.
          default: 16
          example: 400
        temperature:
          anyOf:
            - type: number
              maximum: 2
              minimum: 0
            - type: 'null'
          title: Temperature
          description: >-
            What sampling temperature to use, between 0 and 2. Higher values
            like 0.8 will make the output more random, while lower values like
            0.2 will make it more focused and deterministic.
          default: 1
          example: 1
        stream:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Stream
          description: Whether to stream back partial progress as server-sent events.
          default: false
        stream_options:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Stream Options
          description: 'Options for streaming responses, e.g., {''include_usage'': true}'
      type: object
      required:
        - model
        - prompt
      title: CompletionsRequestV2
      example:
        max_tokens: 400
        model: nugen-flash-instruct
        prompt: The sky is
        stream: false
        temperature: 1
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
  securitySchemes:
    HTTPBearer:
      type: http
      scheme: bearer

````