Video Model Support Matrix

The tables below summarize, for each current video model family, the recommended entry point, the common generation modes, and how reference images are passed.

Model Summary Table

Model Family	Representative Models	Documentation Page	Recommended Entry Point	Common Modes	Reference Image Passing
Sora	`sora-2`	Overview	`/v1/videos`	Text-to-video, first-frame-to-video	JSON `input_reference`
Veo	`veo_3_1`, `veo_3_1-fast`	Overview	`/v1/videos`	Text-to-video, first and last frames, reference-to-video	JSON `input_reference`
Grok Video	`grok-video-3`, `grok-video-3-pro`, `grok-video-3-max`	Overview	`/v1/videos`	Text-to-video, first-frame-to-video, first and last frames, reference-to-video	multipart `input_reference`
Domestic Video Models (AIGC)	`Vidu-`, `Kling-`, `jimeng-video-`, `GV-`, `OS-`, `Hunyuan-`, `Mingmou-`, `Hailuo-`, `SV-`, `JV-`	Overview	`/v1/videos`	Text-to-video, image-to-video, reference images, reference videos, first and last frames, motion control, digital human, lip sync, template effects	JSON `image` / `images` / `input_reference` / `metadata`
Seedance-2	`doubao-seedance-2-0-260128`, `doubao-seedance-2-0-fast-260128`	Overview	`/v1/video/generations`, assets `/v1/seedance/asset/*`	Text-to-video, first and last frames, multimodal reference, asset library `asset://`	JSON `content` + `metadata`

Family-by-Family Notes

Sora

Item	Description
Model example	`sora-2`
Recommended entry point	`POST /v1/videos`
Documentation page	Sora Video Overview
Common fields	`prompt`, `size`, `seconds`, `input_reference`, `metadata`
Reference image	JSON `input_reference`; the gateway also still accepts multipart uploads
Aspect ratio	Commonly `16:9`, `9:16`
Duration	Submitted via `seconds`; the specific available values depend on the current upstream and channel configuration
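
As a rough illustration of the fields above, the following Python sketch builds (but does not send) a JSON request for `POST /v1/videos`. The base URL, the bearer-token header, the `model` field, and the concrete values used for `size`, `seconds`, and `input_reference` are assumptions made for illustration; the Sora Video Overview remains the reference for what your channel accepts.

```python
import json
import urllib.request

# Hypothetical gateway address and key; replace with your own.
BASE_URL = "https://gateway.example.com"
API_KEY = "sk-your-key"


def build_sora_request(prompt, seconds, size, input_reference=None):
    """Build a JSON POST /v1/videos request for sora-2 without sending it."""
    body = {
        "model": "sora-2",
        "prompt": prompt,
        "size": size,        # assumed format, e.g. "1280x720" for 16:9
        "seconds": seconds,  # allowed values depend on upstream and channel
    }
    if input_reference is not None:
        # Assumption: the JSON form carries the reference image as a URL string.
        body["input_reference"] = input_reference
    return urllib.request.Request(
        BASE_URL + "/v1/videos",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,
        },
        method="POST",
    )


req = build_sora_request(
    prompt="A paper boat drifting down a rainy street",
    seconds="8",
    size="1280x720",
    input_reference="https://example.com/first-frame.png",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would submit the task; that step is left out so the sketch can be run without a live gateway.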

Veo

Item	Description
Model examples	`veo_3_1`, `veo_3_1-fast`
Recommended entry point	`POST /v1/videos`
Documentation page	Veo Video Overview
Common modes	Text-to-video, first and last frames, reference-to-video
Reference image	JSON `input_reference`, converted by the server into the Veo request structure
Notes	In reference-to-video mode, the gateway favors landscape orientation for the request to avoid upstream incompatibility

Domestic Video Models (AIGC)

Item	Description
Model examples	`Vidu-`, `Kling-`, `jimeng-video-`, `GV-`, `OS-`, `Hunyuan-`, `Mingmou-`, `Hailuo-`, `SV-`, `JV-`
Recommended entry point	`POST /v1/videos`
Documentation page	Domestic Video Model Overview
Common fields	`model`, `prompt`, `seconds`, `duration`, `size`, `image`, `images`, `metadata`
Reference image	Supports `image`, `images`, `input_reference`; for advanced scenarios, you can also pass it via `metadata.file_infos`
Typical scenarios	Text-to-video, image-to-video, reference images, reference videos, first and last frames, motion control, digital human, lip sync, template effects

Grok Video

Item	Description
Model examples	`grok-video-3`, `grok-video-3-pro`, `grok-video-3-max`
Recommended entry point	`POST /v1/videos`
Documentation page	Grok Video Overview
Common fields	`prompt`, `seconds`, `aspect_ratio`, `size`
Reference image	Sent via multipart `input_reference`, supports multiple images
Duration rules	`grok-video-3-pro` is fixed at `10s`; `grok-video-3-max` is fixed at `15s`
Special mode	Also supports a combined mode of “first-frame-to-video + reference image”
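
The multipart convention and the fixed durations above can be sketched as follows. This is a minimal, standard-library-only encoder, not the gateway's own client. It assumes that multiple reference images are sent by repeating the `input_reference` part, that `seconds` is sent as a bare number string such as `10`, and that a `model` form field is accepted; all three should be checked against the Grok Video Overview.

```python
import uuid

# Fixed durations from the table above; other models keep the caller's value.
FIXED_SECONDS = {"grok-video-3-pro": "10", "grok-video-3-max": "15"}


def encode_multipart(fields, files):
    """Encode text fields and (name, filename, bytes, mime) files as form data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(head.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    for name, filename, content, mime in files:
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        )
        parts.append(head.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_grok_form(model, prompt, seconds, images):
    """Return (body, content_type) for a Grok Video multipart request."""
    fields = {
        "model": model,
        "prompt": prompt,
        # Override the requested duration when the model has a fixed one.
        "seconds": FIXED_SECONDS.get(model, seconds),
    }
    files = [
        ("input_reference", f"ref{i}.png", data, "image/png")
        for i, data in enumerate(images)
    ]
    return encode_multipart(fields, files)


body, content_type = build_grok_form(
    "grok-video-3-pro",
    "A fox running through fresh snow",
    "6",
    [b"placeholder-image-1", b"placeholder-image-2"],
)
print(content_type.split(";")[0], len(body))
```

The body and content type would be sent to `POST /v1/videos` with the `Content-Type` header set to the returned value.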

Seedance-2

Item	Description
Model examples	`doubao-seedance-2-0-260128`, `doubao-seedance-2-0-fast-260128`
Recommended entry point	`POST /v1/video/generations`; assets `POST /v1/seedance/asset/*`
Documentation page	Seedance-2 Overview
Common fields	`content[]` (`text` / `image_url` / `video_url` / `audio_url` + `role`), `metadata.duration`, `metadata.ratio`, `metadata.resolution`
Asset reference	Use `asset://{assetId}` after upload
Query	`GET /v1/video/generations/{task_id}`
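
To make the `content[]` and `metadata` layout more concrete, here is a hedged Python sketch that assembles a request body for `POST /v1/video/generations`. The table only names the item kinds and the `role` attribute, so the exact item shape used here (`type` plus a nested `url`), the role value `first_frame`, and the sample values for duration, ratio, and resolution are all assumptions; consult the Seedance-2 Overview for the authoritative schema.

```python
import json


def asset_ref(asset_id):
    """Return the asset:// reference for an asset that was already uploaded."""
    return f"asset://{asset_id}"


def build_seedance2_payload(prompt, first_frame_url, duration, ratio, resolution):
    """Assemble a Seedance-2 request body (item shape is an assumption)."""
    return {
        "model": "doubao-seedance-2-0-260128",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": first_frame_url},
                "role": "first_frame",  # hypothetical role value
            },
        ],
        "metadata": {
            "duration": duration,
            "ratio": ratio,
            "resolution": resolution,
        },
    }


def query_path(task_id):
    """Path used to poll a submitted task, as listed in the table above."""
    return f"/v1/video/generations/{task_id}"


payload = build_seedance2_payload(
    prompt="The camera slowly pulls back from the lighthouse",
    first_frame_url=asset_ref("abc123"),  # hypothetical asset ID
    duration=5,
    ratio="16:9",
    resolution="720p",
)
print(json.dumps(payload, indent=2))
print(query_path("task_42"))  # hypothetical task ID
```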

Doubao Seedance

Item	Description
Model examples	`doubao-seedance-1-5-pro_480p`, `doubao-seedance-1-5-pro_720p`, `doubao-seedance-1-5-pro_1080p`
Recommended entry point	`POST /v1/videos`
Documentation page	Domestic Video Model Overview
Common fields	`prompt`, `seconds`, `size`
Reference image	multipart `first_frame_image`, `last_frame_image`
Duration rules	The current duration limit is between `4` and `11` seconds
Notes	Not suitable for “reference-to-video” mode

Alibaba wan2.6

Item	Description
Model examples	`wan2.6-t2v:1280720`, `wan2.6-t2v:19201080`, `wan2.6-i2v:1280720`, `wan2.6-i2v:19201080`
Recommended entry point	`POST /v1/videos`
Documentation page	Domestic Video Model Overview
Common modes	`t2v` text-to-video, `i2v` image-to-video
Resolution	The model name already includes a fixed resolution tier
Reference image	`i2v` is commonly a single-image input
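
Because the mode and resolution tier are both encoded in the model name, a client can derive them instead of passing them separately. The sketch below assumes the digits after the colon are width and height written back to back (`1280720` as 1280 by 720), which is an inference from the listed names rather than something this page states; only the two tiers shown above are recognized.

```python
# Resolution tiers that appear in the model names listed above.
# Assumption: the digits after ":" are width and height written back to back.
KNOWN_TIERS = {"1280720": (1280, 720), "19201080": (1920, 1080)}
KNOWN_MODES = ("t2v", "i2v")


def parse_wan_model(name):
    """Split e.g. 'wan2.6-i2v:19201080' into (mode, width, height)."""
    base, sep, tier = name.partition(":")
    if not sep or tier not in KNOWN_TIERS:
        raise ValueError(f"unrecognized resolution tier in {name!r}")
    mode = base.rsplit("-", 1)[-1]
    if mode not in KNOWN_MODES:
        raise ValueError(f"unrecognized mode in {name!r}")
    width, height = KNOWN_TIERS[tier]
    return mode, width, height


for model in ("wan2.6-t2v:1280720", "wan2.6-i2v:19201080"):
    print(model, parse_wan_model(model))
```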

Vidu

Item	Description
Model examples	`Vidu-q3-pro`, `Vidu-q3-turbo`
Recommended entry point	`POST /v1/videos`
Documentation page	Domestic Video Model Overview
Request style	JSON
First-frame image	`image`
First and last frames	`image` + `metadata.last_frame_url`
Reference-to-video	`images`, commonly up to 3 images
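
The three image conventions above map onto a JSON body roughly as follows. This is a sketch under stated assumptions: image values are taken to be URL strings, the limit of 3 reference images reflects the "commonly up to 3" note rather than a confirmed hard cap, and the helper name and its parameters are hypothetical.

```python
def build_vidu_payload(model, prompt, first_frame=None, last_frame=None,
                       references=None, max_references=3):
    """Build a Vidu JSON body for POST /v1/videos from the chosen image inputs."""
    payload = {"model": model, "prompt": prompt}
    if references:
        if len(references) > max_references:
            raise ValueError(
                f"got {len(references)} reference images, limit is {max_references}"
            )
        payload["images"] = list(references)  # reference-to-video
    if last_frame and not first_frame:
        # The table pairs the last frame with a first-frame `image`.
        raise ValueError("last_frame requires first_frame")
    if first_frame:
        payload["image"] = first_frame  # first-frame image
    if last_frame:
        payload["metadata"] = {"last_frame_url": last_frame}
    return payload


first_last = build_vidu_payload(
    "Vidu-q3-pro",
    "A flower opening at sunrise",
    first_frame="https://example.com/bud.png",
    last_frame="https://example.com/bloom.png",
)
by_reference = build_vidu_payload(
    "Vidu-q3-turbo",
    "The same character walking through a market",
    references=["https://example.com/a.png", "https://example.com/b.png"],
)
print(first_last)
print(by_reference)
```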

Kling

Item	Description
Model examples	`Kling-3.0`, `Kling-3.0-Omni`
Recommended entry point	`POST /v1/videos` or the official compatible route `/kling/v1/videos/*`
Documentation page	Domestic Video Model Overview
Request style	JSON
Common fields	`prompt`, `seconds`, `metadata.output_config`
Reference image	`image`
Audio	Can be controlled via `metadata.output_config.audio_generation`

Jimeng Video

Item	Description
Model examples	`jimeng-video-3.0`, `jimeng-video-2.0`
Recommended entry point	`POST /v1/videos` (OpenAI format), `POST /v1/video/create` (unified video)
Documentation page	Domestic Video Model Overview, Jimeng Video Overview
Request style	JSON / multipart/form-data
Common fields	`model`, `prompt`, `seconds`, `size`, `input_reference` (OpenAI format); `images`, `aspect_ratio`, `size` (unified video)
Reference image	OpenAI format: `input_reference` file upload; unified video: `images` array
Integration modes	OpenAI format, unified video, Doubao channel
Typical scenarios	Text-to-video, image-to-video, first and last frames to video

Hailuo

Item	Description
Model examples	`Hailuo-2.3`, `Hailuo-2.3-fast`
Recommended entry point	`POST /v1/videos`
Documentation page	Domestic Video Model Overview
Request style	JSON
Common fields	`prompt`, `seconds`, `metadata.output_config.resolution`
Reference image	`image`
Notes	Do not rely on `aspect_ratio`; the model is currently best suited to text-to-video and first-frame-to-video

Supported Generation Modes

Model Family	Text-to-video	First-frame-to-video	First and last frames	Reference-to-video	Audio toggle
Sora	Supported	Supported	Some scenarios depend on upstream	Some scenarios are implemented through multi-image reference	Supported
Veo	Supported	Can be implemented via reference image	Supported	Supported	Depends on upstream
Grok Video	Supported	Supported	Supported	Supported	Depends on upstream
Doubao Seedance	Supported	Supported	Supported	Not recommended	Depends on upstream
Alibaba wan2.6	Supported	`i2v` supported	Depends on upstream	Depends on upstream	Depends on upstream
Jimeng Video	Supported	Supported	Supported	Supported	Depends on upstream
Vidu	Supported	Supported	Supported	Supported	Depends on upstream
Kling	Supported	Supported	Not currently guaranteed as a standard capability	Not recommended	Supported
Hailuo	Supported	Supported	Not recommended	Not recommended	Depends on upstream
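
For client-side routing, the matrix above can be carried as a small lookup table. The sketch below transcribes the table into Python (the Kling first-and-last-frames cell is shortened); the column keys and helper names are hypothetical, and only cells that read exactly "Supported" are treated as a firm yes.

```python
COLUMNS = ("text_to_video", "first_frame", "first_last_frames",
           "reference_to_video", "audio_toggle")

OK = "Supported"
NO = "Not recommended"
UP = "Depends on upstream"

# One row per model family, in the same order as the table above.
MATRIX = {
    "Sora": (OK, OK, "Some scenarios depend on upstream",
             "Some scenarios are implemented through multi-image reference", OK),
    "Veo": (OK, "Can be implemented via reference image", OK, OK, UP),
    "Grok Video": (OK, OK, OK, OK, UP),
    "Doubao Seedance": (OK, OK, OK, NO, UP),
    "Alibaba wan2.6": (OK, "`i2v` supported", UP, UP, UP),
    "Jimeng Video": (OK, OK, OK, OK, UP),
    "Vidu": (OK, OK, OK, OK, UP),
    "Kling": (OK, OK, "Not a standard capability", NO, OK),  # cell shortened
    "Hailuo": (OK, OK, NO, NO, UP),
}


def status(family, mode):
    """Return the table cell for a model family and generation mode."""
    return MATRIX[family][COLUMNS.index(mode)]


def families_with(mode):
    """List families whose cell for this mode is exactly 'Supported'."""
    return [family for family in MATRIX if status(family, mode) == OK]


print(status("Hailuo", "reference_to_video"))
print(families_with("reference_to_video"))
```

Cells such as "Depends on upstream" are deliberately excluded from `families_with`, so a caller has to opt in to conditional support rather than assume it.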

Recommended Reading

Grok Video Overview
Sora Video Overview
Veo Video Overview
If you want to implement image-to-video, first confirm whether the target model expects `image`, `images`, `input_reference`, or `first_frame_image` / `last_frame_image`.
If you are integrating explicitly in Kling’s official format, use the `/kling/v1/videos/*` routes.
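
The image-to-video check described above can be encoded as a lookup from model family to request style and field names. The mapping below is collected from the per-family notes on this page; the helper name is hypothetical, and Jimeng Video is left out because its field depends on the integration mode (OpenAI format or unified video).

```python
# (request style, reference-image fields) per family, from the notes on this page.
REFERENCE_FIELDS = {
    "Sora": ("json", ("input_reference",)),
    "Veo": ("json", ("input_reference",)),
    "Grok Video": ("multipart", ("input_reference",)),
    "Doubao Seedance": ("multipart", ("first_frame_image", "last_frame_image")),
    "Vidu": ("json", ("image", "images")),
    "Kling": ("json", ("image",)),
    "Hailuo": ("json", ("image",)),
    "Seedance-2": ("json", ("content",)),
}


def reference_fields(family):
    """Return (request_style, field_names) for passing reference images."""
    try:
        return REFERENCE_FIELDS[family]
    except KeyError:
        raise KeyError(
            f"no reference-image convention recorded for {family!r}"
        ) from None


for family in ("Grok Video", "Doubao Seedance", "Vidu"):
    style, fields = reference_fields(family)
    print(f"{family}: {style} via {', '.join(fields)}")
```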
