POST /api/v3/inference/chat/completions
Generate Streaming Chat Completions
curl --request POST \
  --url https://api.nugen.in/api/v3/inference/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "nugen-flash-instruct",
  "messages": [
    {
      "role": "system",
      "content": "<string>",
      "name": "<string>"
    }
  ],
  "max_tokens": "2000",
  "prompt_truncate_len": 1500,
  "temperature": 1,
  "stream": false,
  "tools": [
    {}
  ],
  "tool_choice": "<string>",
  "top_p": 123,
  "top_k": 123,
  "n": 1,
  "reasoning": {
    "effort": "xhigh",
    "max_tokens": 123,
    "exclude": false,
    "enabled": true
  }
}
'
422 Validation Error
{
  "detail": [
    {
      "loc": [
        "<string>"
      ],
      "msg": "<string>",
      "type": "<string>"
    }
  ]
}
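
For reference, here is a minimal Python sketch of the same call using the requests library. The token and prompt are placeholders, and the success path assumes an OpenAI-style response shape (choices[0].message.content), which this page does not spell out; the error path follows the 422 detail shape shown above.

import requests

API_URL = "https://api.nugen.in/api/v3/inference/chat/completions"

payload = {
    "model": "nugen-flash-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 2000,
    "temperature": 1,
    "stream": False,
}

resp = requests.post(
    API_URL,
    headers={
        "Authorization": "Bearer <token>",  # placeholder auth token
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=60,
)

if resp.status_code == 422:
    # Validation failures carry the `detail` array shown above.
    for err in resp.json()["detail"]:
        print(err["loc"], err["msg"], err["type"])
else:
    resp.raise_for_status()
    # Assumed OpenAI-style response shape; not confirmed by this page.
    print(resp.json()["choices"][0]["message"]["content"])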

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Headers

X-Session-ID
string | null
X-Provider
string
default:fwai

Body

application/json
model
string
required

The name of the model to use.

Example:

"nugen-flash-instruct"

messages
ChatCompletionRequestMessage · object[]
required

A list of messages comprising the conversation so far.

Minimum array length: 1
max_tokens
integer | null
default:2000

The maximum number of tokens to generate in the completion.

prompt_truncate_len
integer | null
default:1500

The length, in tokens, to which chat prompts are truncated.

temperature
number | null
default:1

What sampling temperature to use, between 0 and 2.

Required range: 0 <= x <= 2
stream
boolean | null
default:false

Whether to stream back partial progress as server-sent events.
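
When stream is true, partial output arrives as server-sent events. Below is a minimal consumption sketch in Python; the data: <json> lines and the data: [DONE] sentinel are assumptions borrowed from the common OpenAI-style SSE convention, not confirmed by this page, as is the choices[0].delta.content chunk shape.

import json
import requests

resp = requests.post(
    "https://api.nugen.in/api/v3/inference/chat/completions",
    headers={"Authorization": "Bearer <token>"},  # placeholder auth token
    json={
        "model": "nugen-flash-instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True,
    },
    stream=True,  # keep the connection open and read the SSE body incrementally
)
resp.raise_for_status()

for line in resp.iter_lines():
    if not line.startswith(b"data: "):
        continue  # skip blank keep-alive lines
    data = line[len(b"data: "):]
    if data == b"[DONE]":  # assumed end-of-stream sentinel (OpenAI convention)
        break
    chunk = json.loads(data)
    # Assumed OpenAI-style chunk shape: choices[0].delta.content
    print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)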

tools
Tools · object[] | null

A list of tools (functions) the model may call.

tool_choice

Controls how the model uses tools: 'auto', 'none', or a specific tool.
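
The example request above leaves the tool object empty ({}), so the exact Tools schema is not documented here. The sketch below assumes an OpenAI-style function schema, a common convention but an assumption for this API; the tool name and parameters are hypothetical.

# Hypothetical tools payload, assuming OpenAI-style function schemas.
payload = {
    "model": "nugen-flash-instruct",
    "messages": [{"role": "user", "content": "What's the weather in Pune?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool name
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}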

top_p
number | null

Nucleus sampling: restricts sampling to the smallest set of tokens whose cumulative probability exceeds top_p.

top_k
integer | null

Top-k sampling: restricts sampling to the k most likely next tokens.

n
integer | null
default:1

The number of completions to generate for each prompt.

reasoning
ReasoningFields · object

Reasoning configuration for the model.
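
A standalone reasoning block might look like the sketch below. The field names come from the schema above, but the values are illustrative, and the exclude semantics (assumed: true omits the reasoning content from the response) are not spelled out on this page.

# Hypothetical reasoning configuration; values are illustrative.
reasoning = {
    "enabled": True,     # turn reasoning on
    "effort": "high",    # effort level; the request example above also shows "xhigh"
    "max_tokens": 1024,  # illustrative cap on reasoning tokens
    "exclude": False,    # assumed: False keeps reasoning content in the response
}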

Response

A stream of chat completion chunks when stream is true, or a single complete response when stream is false.