Video Model Support Matrix

This page summarizes the primary entry points, common modes, and reference-image passing methods for the currently supported video model families.

Model Summary Table

| Model Family | Representative Models | Documentation Page | Recommended Entry Point | Common Modes | Reference Image Passing |
| --- | --- | --- | --- | --- | --- |
| Sora | sora-2 | Overview | /v1/videos | Text-to-video, first-frame-to-video | JSON input_reference |
| Veo | veo_3_1, veo_3_1-fast | Overview | /v1/videos | Text-to-video, first and last frames, reference-to-video | JSON input_reference |
| Grok Video | grok-video-3, grok-video-3-pro, grok-video-3-max | Overview | /v1/videos | Text-to-video, first-frame-to-video, first and last frames, reference-to-video | multipart input_reference |
| Domestic Video Models (AIGC) | Vidu-*, Kling-*, jimeng-video-*, GV-*, OS-*, Hunyuan-*, Mingmou-*, Hailuo-*, SV-*, JV-* | Overview | /v1/videos | Text-to-video, image-to-video, reference images, reference videos, first and last frames, motion control, digital human, lip sync, template effects | JSON image / images / input_reference / metadata |
| Seedance-2 | doubao-seedance-2-0-260128, doubao-seedance-2-0-fast-260128 | Overview | /v1/video/generations, assets /v1/seedance/asset/* | Text-to-video, first and last frames, multimodal reference, asset library asset:// | JSON content + metadata |
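
The entry-point column above can be sketched as a small routing helper. This is only an illustration: the function name and the prefix-matching rule are assumptions (the table lists two Seedance-2 model names, not a naming rule), so treat it as a starting point rather than the gateway's actual logic.

```python
# Hypothetical helper: pick the submit path for a model name.
# Assumption: every Seedance-2 model shares the "doubao-seedance-2-0" prefix,
# as the two names listed in the summary table do.
SEEDANCE_2_PREFIX = "doubao-seedance-2-0"


def entry_point(model: str) -> str:
    """Return the recommended submit path from the summary table."""
    if model.startswith(SEEDANCE_2_PREFIX):
        return "/v1/video/generations"
    # All other families in the table are submitted to /v1/videos.
    return "/v1/videos"


if __name__ == "__main__":
    for name in ("sora-2", "doubao-seedance-2-0-fast-260128"):
        print(name, "->", entry_point(name))
```
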

Family-by-Family Notes

Sora

| Item | Description |
| --- | --- |
| Model example | sora-2 |
| Recommended entry point | POST /v1/videos |
| Documentation page | Sora Video Overview |
| Common fields | prompt, size, seconds, input_reference, metadata |
| Reference image | JSON input_reference; the gateway still supports multipart |
| Aspect ratio | Commonly 16:9, 9:16 |
| Duration | Submitted via seconds; the available values depend on the current upstream and channel configuration |
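
As a rough sketch of how the Sora fields above might be assembled into a JSON body for POST /v1/videos: the helper name is hypothetical, the presence of a top-level model field is an assumption (it is listed for other families but not in the Sora row), and passing input_reference as a URL string is also an assumption. Check the Sora Video Overview for the accepted value formats.

```python
import json


def build_sora_body(prompt, seconds, size=None, input_reference=None, metadata=None):
    """Assemble a JSON body from the common Sora fields (hypothetical helper)."""
    # Assumption: the request carries a top-level "model" field.
    body = {"model": "sora-2", "prompt": prompt, "seconds": seconds}
    # Optional fields are left out entirely when not provided.
    if size is not None:
        body["size"] = size
    if input_reference is not None:
        body["input_reference"] = input_reference
    if metadata is not None:
        body["metadata"] = metadata
    return body


if __name__ == "__main__":
    body = build_sora_body(
        "A paper boat drifting down a rainy street",
        seconds=8,  # placeholder; allowed durations depend on upstream and channel
        input_reference="https://example.com/first-frame.png",  # placeholder URL
    )
    print(json.dumps(body, indent=2))
```
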

Veo

| Item | Description |
| --- | --- |
| Model examples | veo_3_1, veo_3_1-fast |
| Recommended entry point | POST /v1/videos |
| Documentation page | Veo Video Overview |
| Common modes | Text-to-video, first and last frames, reference-to-video |
| Reference image | JSON input_reference, converted by the server into the Veo request structure |
| Notes | In reference-to-video mode, the request favors a landscape orientation to avoid upstream incompatibility |

Domestic Video Models (AIGC)

| Item | Description |
| --- | --- |
| Model examples | Vidu-*, Kling-*, jimeng-video-*, GV-*, OS-*, Hunyuan-*, Mingmou-*, Hailuo-*, SV-*, JV-* |
| Recommended entry point | POST /v1/videos |
| Documentation page | Domestic Video Model Overview |
| Common fields | model, prompt, seconds, duration, size, image, images, metadata |
| Reference image | Supports image, images, input_reference; for advanced scenarios, you can also pass it via metadata.file_infos |
| Typical scenarios | Text-to-video, image-to-video, reference images, reference videos, first and last frames, motion control, digital human, lip sync, template effects |

Grok Video

| Item | Description |
| --- | --- |
| Model examples | grok-video-3, grok-video-3-pro, grok-video-3-max |
| Recommended entry point | POST /v1/videos |
| Documentation page | Grok Video Overview |
| Common fields | prompt, seconds, aspect_ratio, size |
| Reference image | Sent via multipart input_reference; supports multiple images |
| Duration rules | grok-video-3-pro is fixed at 10s; grok-video-3-max is fixed at 15s |
| Special mode | Also supports a combined mode of “first-frame-to-video + reference image” |
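
The duration rules above can be checked on the client before submitting. The sketch below is one possible approach; the function name is hypothetical, and rejecting a conflicting value (rather than silently overriding it) is a design choice of this example, not documented gateway behavior.

```python
# Fixed durations taken from the "Duration rules" row above.
FIXED_SECONDS = {"grok-video-3-pro": 10, "grok-video-3-max": 15}


def resolve_grok_seconds(model, requested=None):
    """Return the seconds value to submit for a Grok Video model."""
    fixed = FIXED_SECONDS.get(model)
    if fixed is None:
        # No fixed duration is documented for this model; pass the request through.
        return requested
    if requested is not None and requested != fixed:
        raise ValueError(f"{model} is fixed at {fixed}s, got {requested}s")
    return fixed


if __name__ == "__main__":
    print(resolve_grok_seconds("grok-video-3-pro"))
    print(resolve_grok_seconds("grok-video-3-max", 15))
```

Failing early keeps a mismatched duration from reaching the upstream, where the error may be less specific.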

Seedance-2

| Item | Description |
| --- | --- |
| Model examples | doubao-seedance-2-0-260128, doubao-seedance-2-0-fast-260128 |
| Recommended entry point | POST /v1/video/generations; assets POST /v1/seedance/asset/* |
| Documentation page | Seedance-2 Overview |
| Common fields | content[] (text / image_url / video_url / audio_url + role), metadata.duration, metadata.ratio, metadata.resolution |
| Asset reference | Use asset://{assetId} after upload |
| Query | GET /v1/video/generations/{task_id} |
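
A minimal sketch of a Seedance-2 request body built from the fields above. The exact nesting of each content[] item (for example an image_url object with a url key), the role value used in the usage example, and all helper names are assumptions made for illustration; the Seedance-2 Overview is the reference for the real shapes.

```python
def asset_ref(asset_id):
    """Format an uploaded asset id as an asset:// reference."""
    return f"asset://{asset_id}"


def build_seedance2_body(model, prompt, images=(), duration=None, ratio=None, resolution=None):
    """Build a JSON body for POST /v1/video/generations (hypothetical helper).

    images is an iterable of (url_or_asset_ref, role) pairs.
    """
    content = [{"type": "text", "text": prompt}]
    for url, role in images:
        # Assumed item shape: type + nested url object + role.
        content.append({"type": "image_url", "image_url": {"url": url}, "role": role})
    body = {"model": model, "content": content}
    metadata = {
        key: value
        for key, value in (("duration", duration), ("ratio", ratio), ("resolution", resolution))
        if value is not None
    }
    if metadata:
        body["metadata"] = metadata
    return body


def query_path(task_id):
    """Path used to poll a submitted task."""
    return f"/v1/video/generations/{task_id}"


if __name__ == "__main__":
    body = build_seedance2_body(
        "doubao-seedance-2-0-260128",
        "A fox running through snow",
        images=[(asset_ref("my-asset-id"), "reference_image")],  # placeholder id and role
        duration=5,  # placeholder value
    )
    print(body)
    print(query_path("task-123"))  # placeholder task id
```
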

Doubao Seedance

| Item | Description |
| --- | --- |
| Model examples | doubao-seedance-1-5-pro_480p, doubao-seedance-1-5-pro_720p, doubao-seedance-1-5-pro_1080p |
| Recommended entry point | POST /v1/videos |
| Documentation page | Domestic Video Model Overview |
| Common fields | prompt, seconds, size |
| Reference image | multipart first_frame_image, last_frame_image |
| Duration rules | The current duration limit is between 4 and 11 seconds |
| Notes | Not suitable for “reference-to-video” mode |
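
The duration limit above can be enforced with a small client-side check. This sketch assumes the 4 to 11 second range is inclusive at both ends, which the table does not state explicitly; the function name is hypothetical.

```python
# Bounds from the "Duration rules" row; inclusiveness is an assumption.
MIN_SECONDS = 4
MAX_SECONDS = 11


def check_seedance_seconds(seconds):
    """Return seconds unchanged if it falls within the documented range."""
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(
            f"seconds must be between {MIN_SECONDS} and {MAX_SECONDS}, got {seconds}"
        )
    return seconds


if __name__ == "__main__":
    print(check_seedance_seconds(5))
```
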

Alibaba wan2.6

| Item | Description |
| --- | --- |
| Model examples | wan2.6-t2v:1280*720, wan2.6-t2v:1920*1080, wan2.6-i2v:1280*720, wan2.6-i2v:1920*1080 |
| Recommended entry point | POST /v1/videos |
| Documentation page | Domestic Video Model Overview |
| Common modes | t2v (text-to-video), i2v (image-to-video) |
| Resolution | The model name already includes a fixed resolution tier |
| Reference image | i2v commonly takes a single input image |
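
Because the mode and resolution tier are encoded in the model name, a client can derive both from the string. The parser below is a sketch based only on the four example names above; the function name and the returned dictionary layout are hypothetical.

```python
def parse_wan_model(name):
    """Split a name like 'wan2.6-t2v:1280*720' into family, mode, and resolution."""
    base, sep, resolution = name.partition(":")
    if not sep:
        raise ValueError(f"expected '<family>-<mode>:<width>*<height>', got {name!r}")
    family, _, mode = base.rpartition("-")
    if mode not in ("t2v", "i2v"):
        raise ValueError(f"unknown mode {mode!r} in {name!r}")
    width, star, height = resolution.partition("*")
    if not star:
        raise ValueError(f"expected '<width>*<height>', got {resolution!r}")
    return {"family": family, "mode": mode, "width": int(width), "height": int(height)}


if __name__ == "__main__":
    print(parse_wan_model("wan2.6-i2v:1920*1080"))
```
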

Vidu

| Item | Description |
| --- | --- |
| Model examples | Vidu-q3-pro, Vidu-q3-turbo |
| Recommended entry point | POST /v1/videos |
| Documentation page | Domestic Video Model Overview |
| Request style | JSON |
| First-frame image | image |
| First and last frames | image + metadata.last_frame_url |
| Reference-to-video | images, commonly up to 3 images |
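
The three Vidu image modes above map onto different fields, which the following sketch makes explicit. The helper name is hypothetical, image values are assumed to be URL strings, and treating reference images as mutually exclusive with first/last frames is an assumption of this example. The limit of 3 is exposed as a parameter because the table only describes it as common.

```python
def build_vidu_body(model, prompt, first_frame=None, last_frame=None,
                    references=None, max_references=3):
    """Build a JSON body for POST /v1/videos for a Vidu model (hypothetical helper)."""
    body = {"model": model, "prompt": prompt}
    if references:
        # Assumption: reference-to-video is not combined with frame inputs.
        if first_frame or last_frame:
            raise ValueError("use either reference images or first/last frames")
        if len(references) > max_references:
            raise ValueError(f"at most {max_references} reference images expected")
        body["images"] = list(references)
    elif first_frame:
        body["image"] = first_frame
        if last_frame:
            body["metadata"] = {"last_frame_url": last_frame}
    elif last_frame:
        raise ValueError("a last frame requires a first frame")
    return body


if __name__ == "__main__":
    print(build_vidu_body(
        "Vidu-q3-pro",
        "Sunrise over a harbor",
        first_frame="https://example.com/first.png",  # placeholder URLs
        last_frame="https://example.com/last.png",
    ))
```
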

Kling

| Item | Description |
| --- | --- |
| Model examples | Kling-3.0, Kling-3.0-Omni |
| Recommended entry point | POST /v1/videos or the official compatible route /kling/v1/videos/* |
| Documentation page | Domestic Video Model Overview |
| Request style | JSON |
| Common fields | prompt, seconds, metadata.output_config |
| Reference image | image |
| Audio | Can be controlled via metadata.output_config.audio_generation |
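
Dotted names such as metadata.output_config.audio_generation refer to nested JSON objects. A small generic helper, sketched below, can set such a path without overwriting sibling keys. The helper name is hypothetical, and the value passed for audio_generation in the usage example is a placeholder, since the accepted values are not listed on this page.

```python
def set_path(body, dotted_path, value):
    """Set a nested key such as 'metadata.output_config.audio_generation'."""
    *parents, leaf = dotted_path.split(".")
    node = body
    for key in parents:
        # Create intermediate objects on demand, keeping any existing ones.
        node = node.setdefault(key, {})
    node[leaf] = value
    return body


if __name__ == "__main__":
    body = {"model": "Kling-3.0", "prompt": "Rain on a tin roof", "seconds": 5}
    # "enabled" is a placeholder value, not a documented setting.
    set_path(body, "metadata.output_config.audio_generation", "enabled")
    print(body)
```
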

Jimeng Video

| Item | Description |
| --- | --- |
| Model examples | jimeng-video-3.0, jimeng-video-2.0 |
| Recommended entry point | POST /v1/videos (OpenAI format), POST /v1/video/create (unified video) |
| Documentation page | Domestic Video Model Overview, Jimeng Video Overview |
| Request style | JSON / multipart/form-data |
| Common fields | model, prompt, seconds, size, input_reference (OpenAI format); images, aspect_ratio, size (unified video) |
| Reference image | OpenAI format: input_reference file upload; unified video: images array |
| Integration modes | OpenAI format, unified video, Doubao channel |
| Typical scenarios | Text-to-video, image-to-video, first and last frames to video |

Hailuo

| Item | Description |
| --- | --- |
| Model examples | Hailuo-2.3, Hailuo-2.3-fast |
| Recommended entry point | POST /v1/videos |
| Documentation page | Domestic Video Model Overview |
| Request style | JSON |
| Common fields | prompt, seconds, metadata.output_config.resolution |
| Reference image | image |
| Notes | Do not rely on aspect_ratio; it is currently more suitable for text-to-video and first-frame-to-video |

Supported Generation Modes

| Model Family | Text-to-video | First-frame-to-video | First and last frames | Reference-to-video | Audio toggle |
| --- | --- | --- | --- | --- | --- |
| Sora | Supported | Supported | Some scenarios depend on upstream | Some scenarios are implemented through multi-image reference | Supported |
| Veo | Supported | Can be implemented via reference image | Supported | Supported | Depends on upstream |
| Grok Video | Supported | Supported | Supported | Supported | Depends on upstream |
| Doubao Seedance | Supported | Supported | Supported | Not recommended | Depends on upstream |
| Alibaba wan2.6 | Supported | i2v supported | Depends on upstream | Depends on upstream | Depends on upstream |
| Jimeng Video | Supported | Supported | Supported | Supported | Depends on upstream |
| Vidu | Supported | Supported | Supported | Supported | Depends on upstream |
| Kling | Supported | Supported | Not currently recommended as a standard capability commitment | Not recommended | Supported |
| Hailuo | Supported | Supported | Not recommended | Not recommended | Depends on upstream |
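
For programmatic checks, the matrix above can be mirrored as a lookup table. The sketch below copies the cell text verbatim; the mode keys (text_to_video and so on) and function names are hypothetical labels chosen for this example, and only an exact "Supported" cell counts as supported.

```python
MODES = ("text_to_video", "first_frame", "first_last_frames",
         "reference_to_video", "audio_toggle")

S = "Supported"
U = "Depends on upstream"
N = "Not recommended"

# One row per family, columns in MODES order, copied from the table above.
MATRIX = {
    "Sora": (S, S, "Some scenarios depend on upstream",
             "Some scenarios are implemented through multi-image reference", S),
    "Veo": (S, "Can be implemented via reference image", S, S, U),
    "Grok Video": (S, S, S, S, U),
    "Doubao Seedance": (S, S, S, N, U),
    "Alibaba wan2.6": (S, "i2v supported", U, U, U),
    "Jimeng Video": (S, S, S, S, U),
    "Vidu": (S, S, S, S, U),
    "Kling": (S, S, "Not currently recommended as a standard capability commitment", N, S),
    "Hailuo": (S, S, N, N, U),
}


def support_status(family, mode):
    """Return the matrix cell for a family and mode."""
    return MATRIX[family][MODES.index(mode)]


def families_supporting(mode):
    """List families whose cell for the mode is exactly 'Supported'."""
    column = MODES.index(mode)
    return [family for family, row in MATRIX.items() if row[column] == S]


if __name__ == "__main__":
    print(families_supporting("first_last_frames"))
    print(support_status("Kling", "reference_to_video"))
```
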

Related documentation:

  1. Grok Video Overview
  2. Sora Video Overview
  3. Veo Video Overview

Integration notes:

  1. To implement image-to-video, first confirm whether the target model expects image, images, input_reference, or first_frame_image / last_frame_image.
  2. If you are explicitly integrating with Kling’s official format, use this set of routes: /kling/v1/videos/*.
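
To make the image-to-video check above concrete, the per-family tables on this page can be condensed into a lookup of request style and image field names. This is a sketch: the dictionary and function names are hypothetical, it covers only families whose field names are stated explicitly above, and the per-family overview pages remain the reference.

```python
# (request style, image field names), condensed from the per-family tables.
IMAGE_FIELDS = {
    "Sora": ("json", ("input_reference",)),
    "Veo": ("json", ("input_reference",)),
    "Grok Video": ("multipart", ("input_reference",)),
    "Doubao Seedance": ("multipart", ("first_frame_image", "last_frame_image")),
    "Vidu": ("json", ("image", "images")),
    "Kling": ("json", ("image",)),
    "Hailuo": ("json", ("image",)),
}


def image_fields(family):
    """Return (request_style, field_names) for a model family."""
    try:
        return IMAGE_FIELDS[family]
    except KeyError:
        raise ValueError(
            f"no image field recorded for {family!r}; check its overview page"
        ) from None


if __name__ == "__main__":
    style, fields = image_fields("Doubao Seedance")
    print(style, fields)
```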