Domestic Video Models

Domestic video models are exposed through a unified OpenAI-compatible interface that supports multiple generation modes: text-to-video, image-to-video, first-and-last-frame video generation, motion control, digital humans, and lip sync.

Quick Start

Video Generation

POST /v1/videos: submit a domestic video generation task. Multiple generation scenarios are supported.

Task Query

GET /v1/videos/{task_id}: query task status and results.

Model Family

Model Series | Representative Model | Key Capabilities
Jimeng | jimeng-video-3.0 | Text-to-video, image-to-video, first-and-last-frame video generation
Vidu | viduq3-pro | Direct audio-video output, background music, off-peak mode
Kling | Kling-3.0-Omni | Motion control, digital humans, lip sync, template effects

Full Model List

Base Models:
  • Vidu-*, Kling-*, GV-*, OS-*
  • Hunyuan-*, Mingmou-*, Hailuo-*
  • SV-*, JV-*, jimeng-video-*
Combined Billing Models:
  • vidu-q2-pro-reference-1080p-offpeak
  • kling-3.0-omni-1080p-ref-audio
  • kling-2.6-motion-pro-1080p
  • kling-avatar-720p
  • sv-1.5-pro-1080p-audio

Integration Methods

All domestic video models share the following endpoints:
Operation | Endpoint | Description
Create Task | POST /v1/videos | Submit a video generation task
Query Status | GET /v1/videos/{task_id} | Query task progress and status
Get Content | GET /v1/videos/{task_id}/content | Get the video download URL
Note: Different model series may provide multiple integration methods.
  • Vidu: Official format (/vidu/ent/v2/*), OpenAI format (/v1/videos), unified video (/v1/video/create)
  • Jimeng: OpenAI format (/v1/videos), unified video (/v1/video/create), Doubao channel
  • Kling: OpenAI format (/v1/videos)
This page covers the general specification of the OpenAI-compatible interface. For model-specific interfaces, see the model documentation below.
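
The create, query, and content endpoints listed above can be combined into a submit-and-poll flow. The following is a minimal sketch using only the Python standard library, not a definitive client. The host in BASE_URL, the Bearer-token header, the "status" response field, and the terminal status values are all assumptions that this page does not specify; check them against the actual service before use.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.example.com"  # assumption: placeholder host
API_KEY = "YOUR_API_KEY"              # assumption: Bearer-token authentication


def _auth_headers() -> dict:
    return {"Authorization": f"Bearer {API_KEY}"}


def build_create_request(payload: dict) -> urllib.request.Request:
    """Build POST /v1/videos with a JSON body."""
    headers = {**_auth_headers(), "Content-Type": "application/json"}
    return urllib.request.Request(
        f"{BASE_URL}/v1/videos",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def build_query_request(task_id: str) -> urllib.request.Request:
    """Build GET /v1/videos/{task_id}."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/videos/{task_id}", headers=_auth_headers(), method="GET"
    )


def build_content_request(task_id: str) -> urllib.request.Request:
    """Build GET /v1/videos/{task_id}/content."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/videos/{task_id}/content", headers=_auth_headers(), method="GET"
    )


def send(req: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_task(task_id, fetch=send, interval=5.0, max_polls=120,
                  done_states=("completed", "failed")):
    """Poll the query endpoint until the task reaches a terminal state.

    The "status" field name and the values in done_states are assumptions.
    """
    for _ in range(max_polls):
        task = fetch(build_query_request(task_id))
        if task.get("status") in done_states:
            return task
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

A typical call sequence would be send(build_create_request(payload)), then wait_for_task with the returned task ID, then send(build_content_request(task_id)). The fetch parameter exists so the polling loop can be exercised without network access.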

Request Parameters

Image Input Description

Different integration methods use different image fields:
Integration Method | Field Name | Format | Description
OpenAI format (multipart) | input_reference | file | File upload
OpenAI format (JSON) | image | string | Image URL
Unified video | images | array | Array of image URLs
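
To illustrate the field mapping above, here is a small sketch that attaches an image URL to a request body under the field name each URL-based integration method expects. The function name and the method labels "openai_json" and "unified" are hypothetical; the multipart case is omitted because input_reference is a file upload rather than a JSON value.

```python
def with_image(payload: dict, method: str, image_url: str) -> dict:
    """Return a copy of payload with the image placed in the right field.

    method is a hypothetical label: "openai_json" or "unified".
    """
    body = dict(payload)
    if method == "openai_json":
        body["image"] = image_url        # OpenAI format (JSON): a single URL string
    elif method == "unified":
        body["images"] = [image_url]     # unified video: an array of URLs
    else:
        raise ValueError(f"unsupported integration method: {method!r}")
    return body
```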

Basic Fields

Field | Type | Required | Description
model | string |  | Model name; accepts base models or combined billing models
prompt | string |  | Prompt: the text description used for video generation
seconds | string |  | Video duration, for example 5, 10, 15
size | string |  | Shorthand output size, for example 720x1280
image | string |  | Reference image URL (image-to-video scenario)
images | array |  | Reference image array (first-and-last-frame scenario)
input_reference | file |  | Reference file (OpenAI format upload)
metadata | object |  | Extended parameters; recommended for passing through the upstream provider's native configuration

metadata Extended Parameters

Scenario Type (metadata.scene_type):
Scenario | Value | Description
Motion Control | motion_control | Precisely control video motion
Digital Human Generation | avatar_i2v | Generate digital human videos
Lip Sync | lip_sync | Synchronize lip movement with audio
Template Effects | template_effect | Apply template effects
Output Configuration (metadata.output_config):
Field | Type | Description
resolution | string | Resolution: 720P, 1080P
aspect_ratio | string | Aspect ratio: 16:9, 9:16, 1:1
duration | integer | Duration (seconds)
audio_generation | string | Audio generation: Enabled, Disabled
Other Common Fields:
  • motion_level: motion level (std/pro)
  • offpeak: whether off-peak billing is enabled
  • last_frame_url: the last frame in first-and-last-frame generation
  • video_url: reference video URL
  • file_infos: native FileInfos passthrough
  • ext_info: native ExtInfo string passthrough
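
As an illustration of how these metadata fields fit together, the sketch below assembles a metadata object and rejects values that do not appear in the tables above. The helper name build_metadata is hypothetical, the scene_type key is taken from the Motion Control example on this page, and the allowed-value sets only mirror what this page lists; the service may accept more, so treat the validation as a client-side convenience.

```python
from typing import Any, Optional

# Values copied from this page; the upstream service may accept others.
SCENE_TYPES = {"motion_control", "avatar_i2v", "lip_sync", "template_effect"}
RESOLUTIONS = {"720P", "1080P"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
AUDIO_MODES = {"Enabled", "Disabled"}


def _check(name: str, value: str, allowed: set) -> None:
    if value not in allowed:
        raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")


def build_metadata(
    resolution: str,
    aspect_ratio: str,
    scene_type: Optional[str] = None,
    duration: Optional[int] = None,
    audio_generation: Optional[str] = None,
    **extra: Any,
) -> dict:
    """Assemble a metadata object with an output_config block."""
    _check("resolution", resolution, RESOLUTIONS)
    _check("aspect_ratio", aspect_ratio, ASPECT_RATIOS)
    output_config: dict = {"resolution": resolution, "aspect_ratio": aspect_ratio}
    if duration is not None:
        output_config["duration"] = int(duration)
    if audio_generation is not None:
        _check("audio_generation", audio_generation, AUDIO_MODES)
        output_config["audio_generation"] = audio_generation

    metadata: dict = {"output_config": output_config}
    if scene_type is not None:
        _check("scene_type", scene_type, SCENE_TYPES)
        metadata["scene_type"] = scene_type
    # Remaining fields (motion_level, offpeak, last_frame_url, ...) pass through as-is.
    metadata.update(extra)
    return metadata
```

For example, build_metadata("1080P", "16:9", scene_type="motion_control", motion_level="pro") produces the same metadata object as the Motion Control example on this page.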

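Parameter Precedence

When the same setting is supplied in more than one place, the sources are checked in the order listed below and the first one present wins.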

Duration Parameters

  1. Top-level seconds
  2. Top-level duration
  3. metadata.seconds / metadata.duration / metadata.video_duration
  4. Default 5
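
The duration lookup order above can be sketched as a small function. This is an illustration of the documented precedence, not the service's actual implementation; the function name is hypothetical, and the relative order of the three metadata keys is assumed to follow the order in which they are listed.

```python
def resolve_duration(body: dict) -> int:
    """Return the effective duration in seconds for a request body."""
    meta = body.get("metadata") or {}
    candidates = [
        body.get("seconds"),         # 1. top-level seconds
        body.get("duration"),        # 2. top-level duration
        meta.get("seconds"),         # 3. metadata.* (order among these is assumed)
        meta.get("duration"),
        meta.get("video_duration"),
    ]
    for value in candidates:
        if value not in (None, ""):
            return int(value)        # seconds may arrive as a string such as "5"
    return 5                         # 4. documented default
```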

Resolution Parameters

  1. metadata.output_config.resolution
  2. Top-level size
  3. Model default value
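
The resolution lookup order can be sketched the same way. The function name and the model_default parameter are hypothetical, since this page does not list per-model defaults. Note that the two sources use different notations (720P versus 720x1280); this sketch returns whichever value it finds without converting between them.

```python
from typing import Optional


def resolve_resolution(body: dict, model_default: Optional[str] = None) -> Optional[str]:
    """Return the effective resolution setting for a request body."""
    meta = body.get("metadata") or {}
    output_config = meta.get("output_config") or {}
    return (
        output_config.get("resolution")  # 1. metadata.output_config.resolution
        or body.get("size")              # 2. top-level size
        or model_default                 # 3. model default (supplied by the caller)
    )
```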

Scenario Examples

Text-to-Video

{
  "model": "Kling-3.0-Omni",
  "prompt": "Cyberpunk city night scene, the camera slowly pushes in",
  "seconds": "5",
  "metadata": {
    "output_config": {
      "resolution": "720P",
      "aspect_ratio": "16:9"
    }
  }
}

Image-to-Video

{
  "model": "viduq3-pro",
  "prompt": "Make the character walk forward and smile",
  "image": "https://example.com/character.png",
  "seconds": "5",
  "metadata": {
    "output_config": {
      "resolution": "1080P",
      "aspect_ratio": "9:16"
    }
  }
}

First-and-Last-Frame Video Generation

{
  "model": "Kling-3.0-Omni",
  "prompt": "In a dead-quiet system space, the character is illuminated by a blue panel",
  "seconds": "15",
  "size": "720x1280",
  "metadata": {
    "output_config": {
      "duration": 15,
      "resolution": "720P",
      "aspect_ratio": "9:16",
      "audio_generation": "Enabled"
    },
    "last_frame_url": "https://example.com/last-frame.png"
  }
}

Motion Control

{
  "model": "Kling-3.0-Omni",
  "prompt": "The character waves hello",
  "image": "https://example.com/character.png",
  "seconds": "5",
  "metadata": {
    "scene_type": "motion_control",
    "motion_level": "pro",
    "output_config": {
      "resolution": "1080P",
      "aspect_ratio": "16:9"
    }
  }
}

Model Documentation

Detailed documentation is currently available for the following mainstream models:

Jimeng Video

Documentation for generating and querying jimeng-video-3.0 and jimeng-video-2.0.

Vidu Video

The three integration methods for the viduq3-pro, viduq2, and viduq1 series.

Kling Video

Multiple generation modes for Kling-3.0-Omni, Kling-2.6, and Kling-2.5.

Detailed documentation for other models (GV-*, OS-*, Hunyuan-*, Mingmou-*, Hailuo-*, SV-*, JV-*) is still in preparation. In the meantime, see Domestic Video Model Generation for general parameter specifications and usage.

Authentication

Channel key format (fields separated by |; the trailing Region field is optional):
  • SubAppId|SecretId|SecretKey
  • SubAppId|SecretId|SecretKey|Region
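
A channel key in either form can be split into its fields as sketched below. The ChannelKey type and parse_channel_key function are hypothetical names for illustration; only the field order comes from this page.

```python
from typing import NamedTuple, Optional


class ChannelKey(NamedTuple):
    sub_app_id: str
    secret_id: str
    secret_key: str
    region: Optional[str] = None


def parse_channel_key(raw: str) -> ChannelKey:
    """Split 'SubAppId|SecretId|SecretKey[|Region]' into named fields."""
    parts = raw.strip().split("|")
    if len(parts) not in (3, 4) or not all(parts):
        raise ValueError("expected SubAppId|SecretId|SecretKey or SubAppId|SecretId|SecretKey|Region")
    return ChannelKey(*parts)
```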