Create Video

POST https://octopusx.ai/v1/video/generations

curl --request POST \
  --url https://octopusx.ai/v1/video/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "content": [
    {
      "type": "<string>",
      "text": "<string>",
      "image_url": { "url": "<string>" },
      "video_url": { "url": "<string>" },
      "audio_url": { "url": "<string>" },
      "draft_task": { "id": "<string>" },
      "role": "<string>"
    }
  ],
  "metadata": {
    "duration": 123,
    "resolution": "<string>",
    "ratio": "<string>",
    "frames": 123,
    "seed": 123,
    "camera_fixed": true,
    "watermark": true,
    "generate_audio": true,
    "return_last_frame": true,
    "draft": true,
    "service_tier": "<string>",
    "execution_expires_after": 123,
    "callback_url": "<string>"
  }
}
'

Submit a Seedance 2.0 video generation task. Supported modes are text-to-video, first-frame and first-and-last-frame image-to-video, reference image/video/audio, video continuation, video editing, and multimodal composition. To use assets from the media library, reference them in content as asset://{assetId} (see Upload Assets).

Method and Path

POST /v1/video/generations

Request Examples

{
  "model": "doubao-seedance-2-0-260128",
  "content": [
    {
      "type": "text",
      "text": "First-person view fruit tea ad: 0-2s picking apples by hand; 2-4s cut to pouring into a shaker cup and shaking; 4-6s close-up of pouring into a clear cup; 6-8s raise the cup toward the camera"
    },
    {
      "type": "image_url",
      "image_url": { "url": "https://example.com/apple.jpg" },
      "role": "reference_image"
    },
    {
      "type": "image_url",
      "image_url": { "url": "https://example.com/cup.jpg" },
      "role": "reference_image"
    },
    {
      "type": "video_url",
      "video_url": { "url": "https://example.com/pov_reference.mp4" },
      "role": "reference_video"
    },
    {
      "type": "audio_url",
      "audio_url": { "url": "https://example.com/bgm.mp3" },
      "role": "reference_audio"
    }
  ],
  "metadata": {
    "duration": 8,
    "resolution": "720p",
    "ratio": "16:9",
    "generate_audio": true,
    "watermark": false
  }
}

Response Examples

{
  "task_id": "cgt-20260412163502-x8k2m"
}

After submission, pass task_id to Query Task and poll until the task reaches a final state.
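The polling step can be sketched as a generic loop. This page does not document the Query Task endpoint or its status values, so the sketch takes a hypothetical `fetch_status` callable (your own wrapper around Query Task) and the set of terminal statuses as parameters; the status names in the usage lines ("queued", "running", "succeeded", "failed") are assumptions for illustration only.

```python
import time
from typing import Callable, Iterable


def poll_task(
    task_id: str,
    fetch_status: Callable[[str], str],
    terminal_statuses: Iterable[str],
    interval_s: float = 10.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status(task_id) until it returns a terminal status.

    fetch_status is a hypothetical wrapper around Query Task; the endpoint
    and its status values are not documented on this page.
    """
    terminal = set(terminal_statuses)
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status(task_id)
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout_s}s")
        sleep(interval_s)


# Usage with a fake status source; these status names are assumed, not documented.
fake_statuses = iter(["queued", "running", "succeeded"])
final_status = poll_task(
    "cgt-20260412163502-x8k2m",
    fetch_status=lambda _task_id: next(fake_statuses),
    terminal_statuses={"succeeded", "failed"},
    sleep=lambda _seconds: None,  # skip real waiting in this demo
)
```

If you would rather not poll, metadata.callback_url lets the service notify you when the task completes.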

Authentication

Authorization: Bearer YOUR_API_KEY
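A minimal sketch, using only the Python standard library, of how the endpoint, bearer token, and JSON body fit together. It prepares a `urllib.request.Request` without sending it; the helper name `build_create_video_request`, the prompt text, and the `YOUR_API_KEY` placeholder are illustrative, not part of the API.

```python
import json
import urllib.request

# Endpoint from "Method and Path".
CREATE_VIDEO_URL = "https://octopusx.ai/v1/video/generations"


def build_create_video_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Hypothetical helper: prepare (but do not send) a Create Video request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        CREATE_VIDEO_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# A plain text-to-video payload for the fast model.
payload = {
    "model": "doubao-seedance-2-0-fast-260128",
    "content": [{"type": "text", "text": "A paper boat drifting down a rainy street"}],
    "metadata": {"duration": 5, "resolution": "720p", "ratio": "16:9"},
}
req = build_create_video_request("YOUR_API_KEY", payload)
# To submit for real: urllib.request.urlopen(req), then read "task_id" from the JSON response.
```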

Body

model
string
required
Model name:
  • doubao-seedance-2-0-260128: Standard version, optimized for the best visual quality and complex shot planning
  • doubao-seedance-2-0-fast-260128: Fast version, optimized for low latency and cost-sensitive scenarios
content
array<object>
required
Multimodal input array; the order affects role assignment.
content[].type
string
required
Content type: text, image_url, video_url, audio_url, draft_task.
content[].text
string
Prompt text. Required when type=text.
content[].image_url
object
Used when type=image_url; must include url.
content[].image_url.url
string
required
Public image URL or asset reference asset://{assetId}.
content[].video_url
object
Used when type=video_url; must include url.
content[].video_url.url
string
required
Public video URL or asset://{assetId}.
content[].audio_url
object
Used when type=audio_url; must include url.
content[].audio_url.url
string
required
Public audio URL or asset://{assetId}.
content[].draft_task
object
Used when type=draft_task; must include id, and it must be the only element in content.
content[].draft_task.id
string
required
Draft task ID, used to continue generation from a draft.
content[].role
string
Media role:
  • first_frame: first frame (image)
  • last_frame: last frame (image)
  • reference_image: reference image
  • reference_video: reference/source video (continuation, editing)
  • reference_audio: reference audio (requires metadata.generate_audio=true)
metadata
object
Video generation parameters; all are optional.
metadata.duration
integer
Video duration in seconds: an integer in [4, 15], or -1 to let the model decide, default 5.
metadata.resolution
string
Resolution: 480p, 720p, 1080p, default 720p.
metadata.ratio
string
Aspect ratio: 16:9, 9:16, 1:1, 4:3, adaptive, default 16:9.
metadata.frames
integer
Total number of video frames. An alternative to duration: set one or the other. If both are provided, frames takes precedence.
metadata.seed
integer
Random seed. Reusing the same seed with the same input tends to produce similar results.
metadata.camera_fixed
boolean
Whether to keep the camera fixed (suppress camera movement), default false.
metadata.watermark
boolean
Whether to add a watermark in the bottom-right corner of the video, default true.
metadata.generate_audio
boolean
Whether to generate an audio track for the video. Must be true when content includes an element with role reference_audio, default false.
metadata.return_last_frame
boolean
Whether to return the URL of the last frame image, which can be used for subsequent continuation, default false.
metadata.draft
boolean
Draft mode: faster generation with slightly lower quality, suitable for previews, default false.
metadata.service_tier
string
Service tier, default "default".
metadata.execution_expires_after
integer
Maximum task execution time in seconds, range [3600, 259200] (1 hour to 3 days), default 172800 (2 days).
metadata.callback_url
string
URL to notify when the task completes.
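The metadata defaults and ranges listed above can be captured in a small client-side check. This is a sketch under stated assumptions: the function names are hypothetical, it validates only the fields whose ranges or allowed values are documented here (frames has no documented range, so it is not checked), and the server remains the authority on what it accepts.

```python
from typing import Any, Dict, List

# Defaults as listed in the metadata field descriptions.
DEFAULTS: Dict[str, Any] = {
    "duration": 5,
    "resolution": "720p",
    "ratio": "16:9",
    "camera_fixed": False,
    "watermark": True,
    "generate_audio": False,
    "return_last_frame": False,
    "draft": False,
    "service_tier": "default",
    "execution_expires_after": 172800,  # 172800 s = 48 h = 2 days
}
RESOLUTIONS = {"480p", "720p", "1080p"}
RATIOS = {"16:9", "9:16", "1:1", "4:3", "adaptive"}


def with_defaults(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Merge caller-supplied metadata over the documented defaults."""
    return {**DEFAULTS, **metadata}


def length_source(metadata: Dict[str, Any]) -> str:
    """Which field controls video length: frames wins over duration when both are set."""
    return "frames" if "frames" in metadata else "duration"


def check_metadata(metadata: Dict[str, Any]) -> List[str]:
    """Client-side range checks; an empty list means no problems were found."""
    m = with_defaults(metadata)
    errors: List[str] = []
    if m["duration"] != -1 and not 4 <= m["duration"] <= 15:
        errors.append("duration must be in [4, 15] or -1")
    if m["resolution"] not in RESOLUTIONS:
        errors.append(f"unsupported resolution {m['resolution']!r}")
    if m["ratio"] not in RATIOS:
        errors.append(f"unsupported ratio {m['ratio']!r}")
    if not 3600 <= m["execution_expires_after"] <= 259200:
        errors.append("execution_expires_after must be in [3600, 259200] seconds")
    return errors
```

For example, check_metadata({"duration": 20, "ratio": "21:9"}) reports two problems, while {"duration": -1} passes because -1 hands the choice of duration to the model.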

content Mixing Rules

Requests that violate any of the following rules may be rejected with HTTP 400:
  • reference_image cannot appear together with first_frame / last_frame
  • audio_url cannot be the only input in content; it must be accompanied by at least one image or video
  • draft_task must be the only element in the content array
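The mixing rules above can be checked before a request is sent. In this sketch the function name `check_content` is hypothetical; the first three checks mirror the listed rules, and the fourth encodes the reference_audio / generate_audio requirement stated in the content[].role and metadata.generate_audio descriptions (this page does not say which status code that violation returns).

```python
from typing import Any, Dict, List, Optional

FRAME_ROLES = {"first_frame", "last_frame"}
VISUAL_TYPES = {"image_url", "video_url"}


def check_content(
    content: List[Dict[str, Any]], metadata: Optional[Dict[str, Any]] = None
) -> List[str]:
    """Return the mixing-rule violations found in a content array (empty = OK)."""
    metadata = metadata or {}
    types = {item.get("type") for item in content}
    roles = {item.get("role") for item in content}
    errors: List[str] = []

    # Rule: draft_task must be the only element in content.
    if "draft_task" in types and len(content) != 1:
        errors.append("draft_task must be the only element in content")
    # Rule: reference_image cannot be mixed with first_frame / last_frame.
    if "reference_image" in roles and roles & FRAME_ROLES:
        errors.append("reference_image cannot be combined with first_frame/last_frame")
    # Rule: audio needs at least one image or video alongside it.
    if "audio_url" in types and not types & VISUAL_TYPES:
        errors.append("audio_url must be paired with at least one image or video")
    # From the role/metadata docs: reference_audio requires generate_audio=true.
    if "reference_audio" in roles and metadata.get("generate_audio") is not True:
        errors.append("reference_audio requires metadata.generate_audio=true")
    return errors
```

Running it on the request example's content with {"generate_audio": true} should return an empty list; dropping generate_audio from that metadata would flag the reference_audio element.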

Generation Mode Comparison

| Mode | Request Example Label | content Key Points |
| --- | --- | --- |
| Text to Video | Text to Video | text + optional reference image/video/audio |
| First-Frame Image to Video | First-Frame Image to Video | text + first_frame |
| First-and-Last-Frame Image to Video | First-and-Last-Frame Image to Video | text + first_frame + last_frame |
| Reference Image to Video | Reference Image to Video | text + reference_image |
| Video Continuation | Video Continuation | text + reference_video |
| Video Editing | Video Editing | text + reference_video + reference_image |
| Multimodal Composition | Multimodal Composition | text + multiple types of references |
| Reference Assets | Reference Assets | each URL uses asset://{assetId} |
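The content layouts in the table can be produced by small builder functions. This is a sketch: every function name here is hypothetical, the element shapes follow the Body field descriptions (type, the matching *_url object with url, and role), and "img-123" in the usage line is a made-up asset ID.

```python
from typing import Any, Dict, List, Optional


def asset(asset_id: str) -> str:
    """Media-library reference in the asset://{assetId} form."""
    return f"asset://{asset_id}"


def text_item(prompt: str) -> Dict[str, Any]:
    return {"type": "text", "text": prompt}


def media_item(kind: str, url: str, role: str) -> Dict[str, Any]:
    """kind is "image", "video", or "audio"; the element type becomes e.g. "image_url"."""
    key = f"{kind}_url"
    return {"type": key, key: {"url": url}, "role": role}


def frames_to_video(prompt: str, first: str, last: Optional[str] = None) -> List[Dict[str, Any]]:
    """First-frame mode, or first-and-last-frame mode when last is given."""
    content = [text_item(prompt), media_item("image", first, "first_frame")]
    if last is not None:
        content.append(media_item("image", last, "last_frame"))
    return content


def edit_video(prompt: str, source_video: str, *reference_images: str) -> List[Dict[str, Any]]:
    """Video editing: text + reference_video + reference_image(s)."""
    content = [text_item(prompt), media_item("video", source_video, "reference_video")]
    content += [media_item("image", url, "reference_image") for url in reference_images]
    return content


def from_draft(draft_id: str) -> List[Dict[str, Any]]:
    """Continue from a draft; draft_task must be the only element."""
    return [{"type": "draft_task", "draft_task": {"id": draft_id}}]


# Usage: first-and-last-frame mode, mixing a media-library asset with a public URL.
content = frames_to_video("The flower slowly blooms", asset("img-123"), "https://example.com/bloom_end.jpg")
```

Each builder returns a plain list that can be placed directly in the request's content field; asset() and public URLs can be mixed freely because both are accepted wherever a url is expected.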

Response

task_id
string
Task ID. Pass it to Query Task to check the task status.