Text Models
Chat Completions API (Default Streaming)
Start a conversation using the OpenAI Chat Completions-compatible format and return model output via SSE streaming.
POST
Chat Completions API (Default Streaming)
Use a unified conversation format to call upstream models such as OpenAI, Claude, Gemini, DeepSeek, and Qwen. This document uses streaming output as the default, making it suitable for chat, Agent, and long-text generation scenarios where output needs to be displayed as it is generated.If
stream is not provided in this project’s route, it is handled as non-streaming. If you want to always receive a streaming response, explicitly pass "stream": true.Request Body
Model name. You can check the models available to the current API Key via Model List.
Conversation messages arranged in chronological order. Common roles are
system, user, assistant, and tool.Message content. A string indicates plain text; an array indicates multimodal content and supports
text, image_url, input_audio, file, and video_url.When set to
true, the response is text/event-stream, with each chunk pushed as data: and data: [DONE] returned at the end.Includes token usage statistics in the last message of the stream. Supported by only some upstream models.
Limits the maximum number of generated tokens. For some reasoning models, it is recommended to use
max_completion_tokens instead.Limits the maximum number of completion tokens, including reasoning tokens. Suitable for models that support reasoning.
Sampling temperature, commonly in the range
0 to 2. Lower values are more stable, higher values are more diverse.Nucleus sampling parameter, commonly in the range
0 to 1. It is generally not recommended to adjust temperature and top_p significantly at the same time.Function-calling tool list, compatible with OpenAI
tools.Controls whether the model calls tools. Common values are
auto, none, and required; you can also specify a particular function.Specifies the output format, such as
{ "type": "json_object" } or json_schema.Reasoning effort. Common values are
low, medium, and high; whether it takes effect depends on the model.Request Example
Multimodal Streaming
Response Example
Response Fields
The response ID generated for this request.
The streaming response is always
chat.completion.chunk.Incremental content. May include
role, content, reasoning_content, or tool_calls.Reason for completion. Common values are
stop, length, and tool_calls.Usage statistics. It is guaranteed to appear only when the upstream returns usage data and
stream_options.include_usage is enabled.