POST
/
inference
/
chat_text

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
messages
object[]
required

A list of messages comprising the conversation so far.

model
string
required

The name of the model to use.

max_tokens
integer | null
default:
2000

The maximum number of tokens to generate in the completion.

If the token count of your prompt (previous messages) plus max_tokens exceed the model's context length,max_tokens will be lowered to fit in the context window instead of returning an error.

prompt_truncate_len
integer | null
default:
1500

The size to which to truncate chat prompts. Earlier user/assistant messages will be evicted to fit the prompt into this length.

This should usually be set to a number << the max context size of the model, to allow enough remaining tokens for generating a response.

If omitted, you may receive "prompt too long" errors in your responses as conversations grow. Note that even with this set, you may still receive "prompt too long" errors if individual messages are too long for the model context window.

temperature
number | null
default:
1

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

We generally recommend altering this or top_p but not both.

Required range: 0 < x < 2

Response

200 - application/json
choices
object[]
required

The list of chat completion choices.

created
number
required

The Unix time in seconds when the response was generated.

id
string
required

A unique identifier of the response.

model
string
required

The model used for the chat completion.

object
enum<string>
default:
chat.completion

The object type, which is always "chat.completion".

Available options:
chat.completion
usage
object | null

Usage statistics.

For streaming responses, usage field is included in the very last response chunk returned.

Note that returning usage for streaming requests is a popular LLM API extension. If you use any popular LLM SDK, you might access the field directly even if it's not present in the type signature in the SDK.