Chat Text
Authorizations
Bearer authentication header of the form Bearer <token>
, where <token>
is your auth token.
Body
A list of messages comprising the conversation so far.
The name of the model to use.
The maximum number of tokens to generate in the completion.
If the token count of your prompt (previous messages) plus max_tokens
exceed the model's context length,max_tokens
will be lowered to fit in the context window instead of returning an error.
The size to which to truncate chat prompts. Earlier user/assistant messages will be evicted to fit the prompt into this length.
This should usually be set to a number << the max context size of the model, to allow enough remaining tokens for generating a response.
If omitted, you may receive "prompt too long" errors in your responses as conversations grow. Note that even with this set, you may still receive "prompt too long" errors if individual messages are too long for the model context window.
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
We generally recommend altering this or top_p
but not both.
0 < x < 2
Response
The list of chat completion choices.
The Unix time in seconds when the response was generated.
A unique identifier of the response.
The model used for the chat completion.
The object type, which is always "chat.completion".
chat.completion
Usage statistics.
For streaming responses, usage
field is included in the very last response chunk returned.
Note that returning usage
for streaming requests is a popular LLM API extension. If you use any popular LLM SDK, you might access the field directly even if it's not present in the type signature in the SDK.
Was this page helpful?