Generate conversational responses from text and image inputs using vision-language models.
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
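As a minimal sketch of how the Bearer header might be attached to a request, the Python snippet below builds (but does not send) a POST request using only the standard library. The helper name `build_chat_request`, the example URL, and the payload shape are assumptions for illustration; this page does not specify the endpoint path or the request body schema.

```python
import json
import urllib.request


def build_chat_request(url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying a Bearer token.

    `url` and `payload` are placeholders: the real endpoint and body schema
    are not given on this page and must come from the API documentation.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Header of the form "Bearer <token>", as described above.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical usage: the URL, token, and payload are illustrative only.
request = build_chat_request(
    "https://api.example.com/v1/chat",
    "YOUR_AUTH_TOKEN",
    {"messages": []},
)
```

Passing `request` to `urllib.request.urlopen` would send it; the response body could then be decoded with `json.loads`.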
Generated multi-modal chat response with image understanding capabilities
The response is of type object.