Video Model Support Matrix
This table summarizes the primary entry points, common modes, and reference image passing methods for the current video model series.
Model Summary Table
| Model Family | Representative Models | Documentation Page | Recommended Entry Point | Common Modes | Reference Image Passing |
|---|---|---|---|---|---|
| Sora | sora-2 | Overview | /v1/videos | Text-to-video, first-frame-to-video | JSON input_reference |
| Veo | veo_3_1, veo_3_1-fast | Overview | /v1/videos | Text-to-video, first and last frames, reference-to-video | JSON input_reference |
| Grok Video | grok-video-3, grok-video-3-pro, grok-video-3-max | Overview | /v1/videos | Text-to-video, first-frame-to-video, first and last frames, reference-to-video | multipart input_reference |
| Domestic Video Models (AIGC) | Vidu-*, Kling-*, jimeng-video-*, GV-*, OS-*, Hunyuan-*, Mingmou-*, Hailuo-*, SV-*, JV-* | Overview | /v1/videos | Text-to-video, image-to-video, reference images, reference videos, first and last frames, motion control, digital human, lip sync, template effects | JSON image / images / input_reference / metadata |
| Seedance-2 | doubao-seedance-2-0-260128, doubao-seedance-2-0-fast-260128 | Overview | /v1/video/generations, assets /v1/seedance/asset/* | Text-to-video, first and last frames, multimodal reference, asset library asset:// | JSON content + metadata |
Family-by-Family Notes
Sora
| Item | Description |
|---|---|
| Model example | sora-2 |
| Recommended entry point | POST /v1/videos |
| Documentation page | Sora Video Overview |
| Common fields | prompt, size, seconds, input_reference, metadata |
| Reference image | JSON input_reference; the gateway still supports multipart |
| Aspect ratio | Commonly 16:9, 9:16 |
| Duration | Submitted via seconds; the specific available values depend on the current upstream and channel configuration |
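
As a rough illustration of the Sora fields listed above, the sketch below assembles a JSON body for POST /v1/videos. The helper name `build_sora_request`, the presence of a top-level `model` field, the example `size` and `seconds` values, and the idea that `input_reference` carries an image URL string are all assumptions made for illustration; check the Sora Video Overview for the accepted values.

```python
import json


def build_sora_request(prompt, size=None, seconds=None,
                       input_reference=None, metadata=None):
    """Assemble a JSON body for POST /v1/videos (hypothetical helper)."""
    # "model" as a top-level field is an assumption; the table above lists
    # only prompt, size, seconds, input_reference and metadata.
    body = {"model": "sora-2", "prompt": prompt}
    optional = {
        "size": size,
        "seconds": seconds,
        "input_reference": input_reference,
        "metadata": metadata,
    }
    # Leave out anything the caller did not set, so upstream defaults apply.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


if __name__ == "__main__":
    example = build_sora_request(
        "A paper boat drifting down a rainy street",
        size="1280x720",   # illustrative value only
        seconds="8",       # available values depend on upstream and channel
        input_reference="https://example.com/first-frame.png",  # assumed to be a URL
    )
    print(json.dumps(example, indent=2))
```

Unset fields are omitted rather than sent as null, which keeps the request valid regardless of which durations the current channel accepts.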
Veo
| Item | Description |
|---|---|
| Model examples | veo_3_1, veo_3_1-fast |
| Recommended entry point | POST /v1/videos |
| Documentation page | Veo Video Overview |
| Common modes | Text-to-video, first and last frames, reference-to-video |
| Reference image | JSON input_reference, converted by the server into the Veo request structure |
| Notes | In reference-to-video mode, landscape orientation is preferred for the request to avoid upstream incompatibility |
Domestic Video Models (AIGC)
| Item | Description |
|---|---|
| Model examples | Vidu-*, Kling-*, jimeng-video-*, GV-*, OS-*, Hunyuan-*, Mingmou-*, Hailuo-*, SV-*, JV-* |
| Recommended entry point | POST /v1/videos |
| Documentation page | Domestic Video Model Overview |
| Common fields | model, prompt, seconds, duration, size, image, images, metadata |
| Reference image | Supports image, images, input_reference; for advanced scenarios, you can also pass it via metadata.file_infos |
| Typical scenarios | Text-to-video, image-to-video, reference images, reference videos, first and last frames, motion control, digital human, lip sync, template effects |
Grok Video
| Item | Description |
|---|---|
| Model examples | grok-video-3, grok-video-3-pro, grok-video-3-max |
| Recommended entry point | POST /v1/videos |
| Documentation page | Grok Video Overview |
| Common fields | prompt, seconds, aspect_ratio, size |
| Reference image | Sent via multipart input_reference, supports multiple images |
| Duration rules | grok-video-3-pro is fixed at 10s; grok-video-3-max is fixed at 15s |
| Special mode | Also supports a combined mode of “first-frame-to-video + reference image” |
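
The duration rules above can be checked on the client before a request is sent. This is a minimal sketch; the function name `resolve_grok_seconds` is hypothetical, and the choice to reject a conflicting value (rather than silently override it) is an assumption, not documented gateway behavior.

```python
# Fixed durations taken from the "Duration rules" row above.
FIXED_SECONDS = {
    "grok-video-3-pro": 10,
    "grok-video-3-max": 15,
}


def resolve_grok_seconds(model, requested=None):
    """Return the seconds value to submit for a Grok Video model."""
    fixed = FIXED_SECONDS.get(model)
    if fixed is None:
        # No fixed duration is documented for this model; pass the value through.
        return requested
    if requested is not None and int(requested) != fixed:
        raise ValueError(f"{model} is fixed at {fixed}s, got {requested}")
    return fixed


if __name__ == "__main__":
    print(resolve_grok_seconds("grok-video-3-pro"))      # fixed duration
    print(resolve_grok_seconds("grok-video-3", 6))       # passed through
```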
Seedance-2
| Item | Description |
|---|---|
| Model examples | doubao-seedance-2-0-260128, doubao-seedance-2-0-fast-260128 |
| Recommended entry point | POST /v1/video/generations; assets POST /v1/seedance/asset/* |
| Documentation page | Seedance-2 Overview |
| Common fields | content[] (text / image_url / video_url / audio_url + role), metadata.duration, metadata.ratio, metadata.resolution |
| Asset reference | Use asset://{assetId} after upload |
| Query | GET /v1/video/generations/{task_id} |
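
To make the Seedance-2 request shape more concrete, here is a sketch that builds a `content[]` body with an asset library reference. The exact item layout (a `type` key plus a nested `image_url.url`) and the role name used in the example are assumptions; only the field names `content`, `role`, `metadata.duration`, `metadata.ratio`, `metadata.resolution`, the `asset://{assetId}` form, and the two routes come from the table above.

```python
def asset_uri(asset_id):
    """Reference an uploaded asset using the asset://{assetId} form."""
    return f"asset://{asset_id}"


def build_seedance2_request(model, prompt, images=(), duration=None,
                            ratio=None, resolution=None):
    """Build a body for POST /v1/video/generations (hypothetical helper).

    `images` is an iterable of (url_or_asset_uri, role) pairs.
    """
    # Assumed item layout: {"type": ..., "<type>": {...}, "role": ...}.
    content = [{"type": "text", "text": prompt}]
    for url, role in images:
        content.append({"type": "image_url", "image_url": {"url": url}, "role": role})

    body = {"model": model, "content": content}
    settings = {"duration": duration, "ratio": ratio, "resolution": resolution}
    metadata = {key: value for key, value in settings.items() if value is not None}
    if metadata:
        body["metadata"] = metadata
    return body


def query_path(task_id):
    """Path used to poll a task, per the Query row above."""
    return f"/v1/video/generations/{task_id}"


if __name__ == "__main__":
    body = build_seedance2_request(
        "doubao-seedance-2-0-260128",
        "A lighthouse at dawn",
        images=[(asset_uri("my-asset-id"), "reference_image")],  # role name is illustrative
        duration=5,
        ratio="16:9",
    )
    print(body)
```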
Doubao Seedance
| Item | Description |
|---|---|
| Model examples | doubao-seedance-1-5-pro_480p, doubao-seedance-1-5-pro_720p, doubao-seedance-1-5-pro_1080p |
| Recommended entry point | POST /v1/videos |
| Documentation page | Domestic Video Model Overview |
| Common fields | prompt, seconds, size |
| Reference image | multipart first_frame_image, last_frame_image |
| Duration rules | Currently limited to between 4 and 11 seconds |
| Notes | Not suitable for “reference-to-video” mode |
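
The sketch below prepares the multipart fields for a Doubao Seedance request and enforces the duration limit from the table. It assumes the 4 and 11 second bounds are inclusive and that `model` is sent as a form field; the helper name `build_seedance_form` is hypothetical. It returns plain dictionaries so it can be paired with any HTTP client.

```python
MIN_SECONDS, MAX_SECONDS = 4, 11  # assumed inclusive


def build_seedance_form(model, prompt, seconds,
                        first_frame_path=None, last_frame_path=None):
    """Return (text_fields, file_fields) for a multipart POST /v1/videos."""
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(
            f"seconds must be between {MIN_SECONDS} and {MAX_SECONDS}, got {seconds}"
        )
    # Multipart text parts are strings, so seconds is stringified here.
    text_fields = {"model": model, "prompt": prompt, "seconds": str(seconds)}
    file_fields = {}
    if first_frame_path is not None:
        file_fields["first_frame_image"] = first_frame_path
    if last_frame_path is not None:
        file_fields["last_frame_image"] = last_frame_path
    return text_fields, file_fields


if __name__ == "__main__":
    fields, files = build_seedance_form(
        "doubao-seedance-1-5-pro_720p", "Waves on a beach", 5,
        first_frame_path="first.png", last_frame_path="last.png",
    )
    print(fields, files)
```

With a client such as `requests`, the text fields would typically go in `data=` and each path in `file_fields` would be opened in binary mode and passed in `files=`.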
Alibaba wan2.6
| Item | Description |
|---|---|
| Model examples | wan2.6-t2v:1280*720, wan2.6-t2v:1920*1080, wan2.6-i2v:1280*720, wan2.6-i2v:1920*1080 |
| Recommended entry point | POST /v1/videos |
| Documentation page | Domestic Video Model Overview |
| Common modes | t2v text-to-video, i2v image-to-video |
| Resolution | The model name already includes a fixed resolution tier |
| Reference image | i2v is commonly a single-image input |
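
Because the wan2.6 model name already encodes both the mode and the resolution tier, a client can derive them from the name instead of sending a separate size. The parser below is a small sketch based only on the four example names above; `parse_wan_model` and `WanModel` are hypothetical names.

```python
from typing import NamedTuple


class WanModel(NamedTuple):
    mode: str    # "t2v" (text-to-video) or "i2v" (image-to-video)
    width: int
    height: int


def parse_wan_model(name):
    """Split a name such as 'wan2.6-i2v:1280*720' into mode and resolution."""
    base, _, resolution = name.partition(":")
    family, _, mode = base.rpartition("-")
    if family != "wan2.6" or mode not in ("t2v", "i2v") or not resolution:
        raise ValueError(f"not a wan2.6 model name: {name!r}")
    # Unpacking raises ValueError if the tier is not exactly WIDTH*HEIGHT.
    width, height = (int(part) for part in resolution.split("*"))
    return WanModel(mode, width, height)


if __name__ == "__main__":
    model = parse_wan_model("wan2.6-i2v:1920*1080")
    # i2v commonly takes a single input image, so a client can require one here.
    print(model.mode, model.width, model.height)
```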
Vidu
| Item | Description |
|---|---|
| Model examples | Vidu-q3-pro, Vidu-q3-turbo |
| Recommended entry point | POST /v1/videos |
| Documentation page | Domestic Video Model Overview |
| Request style | JSON |
| First-frame image | image |
| First and last frames | image + metadata.last_frame_url |
| Reference-to-video | images, commonly up to 3 images |
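
The three Vidu input styles in the table map to different fields, which the following sketch tries to make explicit. The helper name `build_vidu_request` is hypothetical, the values are assumed to be image URLs, the limit of 3 reference images is treated as a configurable default because the table only says "commonly", and treating reference images as mutually exclusive with frame inputs is an assumption.

```python
def build_vidu_request(model, prompt, first_frame=None, last_frame=None,
                       references=None, max_references=3):
    """Build a JSON body for POST /v1/videos for a Vidu model."""
    body = {"model": model, "prompt": prompt}

    if references:
        # Reference-to-video: images array.
        if first_frame or last_frame:
            raise ValueError("use either reference images or frame inputs, not both")
        if len(references) > max_references:
            raise ValueError(f"at most {max_references} reference images expected")
        body["images"] = list(references)
        return body

    if last_frame and not first_frame:
        raise ValueError("last_frame requires first_frame")
    if first_frame:
        # First-frame image: image field.
        body["image"] = first_frame
    if last_frame:
        # First and last frames: image + metadata.last_frame_url.
        body["metadata"] = {"last_frame_url": last_frame}
    return body


if __name__ == "__main__":
    print(build_vidu_request("Vidu-q3-pro", "A fox in snow",
                             first_frame="https://example.com/a.png",
                             last_frame="https://example.com/b.png"))
```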
Kling
| Item | Description |
|---|---|
| Model examples | Kling-3.0, Kling-3.0-Omni |
| Recommended entry point | POST /v1/videos or the official compatible route /kling/v1/videos/* |
| Documentation page | Domestic Video Model Overview |
| Request style | JSON |
| Common fields | prompt, seconds, metadata.output_config |
| Reference image | image |
| Audio | Can be controlled via metadata.output_config.audio_generation |
Jimeng Video
| Item | Description |
|---|---|
| Model examples | jimeng-video-3.0, jimeng-video-2.0 |
| Recommended entry point | POST /v1/videos (OpenAI format), POST /v1/video/create (unified video) |
| Documentation page | Domestic Video Model Overview, Jimeng Video Overview |
| Request style | JSON / multipart/form-data |
| Common fields | model, prompt, seconds, size, input_reference (OpenAI format); images, aspect_ratio, size (unified video) |
| Reference image | OpenAI format: input_reference file upload; unified video: images array |
| Integration modes | OpenAI format, unified video, Doubao channel |
| Typical scenarios | Text-to-video, image-to-video, first and last frames to video |
Hailuo
| Item | Description |
|---|---|
| Model examples | Hailuo-2.3, Hailuo-2.3-fast |
| Recommended entry point | POST /v1/videos |
| Documentation page | Domestic Video Model Overview |
| Request style | JSON |
| Common fields | prompt, seconds, metadata.output_config.resolution |
| Reference image | image |
| Notes | Do not rely on aspect_ratio; it is currently more suitable for text-to-video and first-frame-to-video |
Supported Generation Modes
| Model Family | Text-to-video | First-frame-to-video | First and last frames | Reference-to-video | Audio toggle |
|---|---|---|---|---|---|
| Sora | Supported | Supported | Some scenarios depend on upstream | Some scenarios are implemented through multi-image reference | Supported |
| Veo | Supported | Can be implemented via reference image | Supported | Supported | Depends on upstream |
| Grok Video | Supported | Supported | Supported | Supported | Depends on upstream |
| Doubao Seedance | Supported | Supported | Supported | Not recommended | Depends on upstream |
| Alibaba wan2.6 | Supported | i2v supported | Depends on upstream | Depends on upstream | Depends on upstream |
| Jimeng Video | Supported | Supported | Supported | Supported | Depends on upstream |
| Vidu | Supported | Supported | Supported | Supported | Depends on upstream |
| Kling | Supported | Supported | Not currently guaranteed as a standard capability | Not recommended | Supported |
| Hailuo | Supported | Supported | Not recommended | Not recommended | Depends on upstream |
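
For client-side routing, the matrix above can be encoded as a small lookup table. The sketch below collapses the cells into three levels, which is an interpretation rather than part of the documentation: "yes" for Supported (including "i2v supported"), "conditional" for cells that depend on upstream or are achieved indirectly, and "avoid" for cells marked not recommended. The names `MATRIX`, `status`, and `families_with` are hypothetical.

```python
MODES = ("text", "first_frame", "first_last", "reference", "audio")

# One row per family; columns follow MODES order.
MATRIX = {
    "Sora":            ("yes", "yes", "conditional", "conditional", "yes"),
    "Veo":             ("yes", "conditional", "yes", "yes", "conditional"),
    "Grok Video":      ("yes", "yes", "yes", "yes", "conditional"),
    "Doubao Seedance": ("yes", "yes", "yes", "avoid", "conditional"),
    "Alibaba wan2.6":  ("yes", "yes", "conditional", "conditional", "conditional"),
    "Jimeng Video":    ("yes", "yes", "yes", "yes", "conditional"),
    "Vidu":            ("yes", "yes", "yes", "yes", "conditional"),
    "Kling":           ("yes", "yes", "avoid", "avoid", "yes"),
    "Hailuo":          ("yes", "yes", "avoid", "avoid", "conditional"),
}


def status(family, mode):
    """Look up one cell of the matrix."""
    return MATRIX[family][MODES.index(mode)]


def families_with(mode, level="yes"):
    """List families whose cell for `mode` matches `level`, in table order."""
    return [family for family in MATRIX if status(family, mode) == level]


if __name__ == "__main__":
    print(families_with("reference"))
    print(families_with("first_last", level="avoid"))
```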
Recommended Reading
- Grok Video Overview
- Sora Video Overview
- Veo Video Overview
- If you want to implement image-to-video, first confirm whether the target model expects image, images, input_reference, or first_frame_image / last_frame_image.
- If you are explicitly integrating in Kling’s official format, refer to the /kling/v1/videos/* routes.
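
One way to follow the image-to-video advice above is to keep a per-family record of how reference images are passed and consult it before building a request. The mapping below is transcribed from the family notes on this page; the names `REFERENCE_INPUT`, `reference_input`, and `content_type` are hypothetical, and families with several integration modes (such as Jimeng Video) are left out to keep the sketch unambiguous.

```python
# family -> (request style, accepted reference image fields)
REFERENCE_INPUT = {
    "Sora":            ("json", ("input_reference",)),
    "Veo":             ("json", ("input_reference",)),
    "Grok Video":      ("multipart", ("input_reference",)),
    "Doubao Seedance": ("multipart", ("first_frame_image", "last_frame_image")),
    "Vidu":            ("json", ("image", "images")),
    "Kling":           ("json", ("image",)),
    "Hailuo":          ("json", ("image",)),
}


def reference_input(family):
    """Return (style, fields) for a family, or raise if it is not listed."""
    try:
        return REFERENCE_INPUT[family]
    except KeyError:
        raise ValueError(f"no reference image rule recorded for {family!r}") from None


def content_type(family):
    """Map the request style to the Content-Type a client would send."""
    style, _ = reference_input(family)
    return "application/json" if style == "json" else "multipart/form-data"


if __name__ == "__main__":
    for name in ("Sora", "Grok Video", "Doubao Seedance"):
        print(name, content_type(name), reference_input(name)[1])
```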