POST https://octopusx.ai/v1beta/models/{model}:generateContent
Gemini Image Generation API
curl --request POST \
  --url https://octopusx.ai/v1beta/models/{model}:generateContent \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "contents": [
    {
      "role": "<string>",
      "parts": [
        { "text": "<string>" },
        {
          "inlineData": {
            "mimeType": "<string>",
            "data": "<string>"
          }
        }
      ]
    }
  ],
  "generationConfig": {
    "responseModalities": [
      "<string>"
    ],
    "imageConfig": {
      "aspectRatio": "<string>",
      "imageSize": "<string>"
    }
  },
  "safetySettings": [
    {}
  ]
}
'

Gemini image models are called through the official generateContent format, so the standard contents / parts request structure can be reused directly.
  • The path is POST /v1beta/models/{model}:generateContent.
  • Both text-to-image and image-to-image use the same contents[].parts[] structure.
  • Reference images are passed in through parts[].inlineData.
  • In the plugin, responseModalities requests both IMAGE and TEXT by default.
  • In the response, inlineData.data contains either Base64 image data or an image URL.

Supported models

  • gemini-3-pro-image-preview
  • gemini-2.5-flash-image-preview
  • gemini-3.1-flash-image-preview

Model differences

Model | imageSize behavior | Description
--- | --- | ---
gemini-3-pro-image-preview | Supports 1K and 2K | Delivers the requested imageSize
gemini-2.5-flash-image-preview | Falls back to 1K | Even if the UI allows selecting 2K, the plugin ultimately sends 1K
gemini-3.1-flash-image-preview | Falls back to 1K | Even if the UI allows selecting 2K, the plugin ultimately sends 1K
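The fallback rule in the table above can be written as a small lookup. This is a minimal sketch that only mirrors the documented behavior; `effective_image_size` and the constant names are hypothetical and are not taken from the plugin's main.py.

```python
# Hypothetical helper mirroring the documented imageSize behavior;
# not the plugin's actual code.

# Models that, per this page, always end up being sent 1K.
FALLBACK_TO_1K = {
    "gemini-2.5-flash-image-preview",
    "gemini-3.1-flash-image-preview",
}

# imageSize values that gemini-3-pro-image-preview accepts per this page.
SUPPORTED_SIZES = {"1K", "2K"}


def effective_image_size(model: str, requested: str) -> str:
    """Return the imageSize value that would actually be sent for `model`."""
    if model in FALLBACK_TO_1K:
        # Even if the UI offered 2K, these models are sent 1K.
        return "1K"
    if requested not in SUPPORTED_SIZES:
        raise ValueError(f"unsupported imageSize {requested!r} for {model}")
    # Other models (e.g. gemini-3-pro-image-preview) keep the requested value.
    return requested


print(effective_image_size("gemini-3-pro-image-preview", "2K"))      # 2K
print(effective_image_size("gemini-2.5-flash-image-preview", "2K"))  # 1K
```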

Understanding image-to-image

Gemini does not provide a separate “edit API” here. Image-to-image simply means placing the reference image in contents[].parts[] as input, next to the prompt. In short:
  • Text-to-image: parts[] contains only text
  • Image-to-image: parts[] contains both text and images
In other words, the prompt and reference images are sent to the model together.

Text-to-image vs image-to-image

Scenario | parts[] content | Description
--- | --- | ---
Text-to-image | text only | Pure prompt-based generation
Image-to-image | text + one or more inlineData | Lets the model refer to an existing image’s style, subject, or composition

How to pass image-to-image

Remember one rule: the prompt and each reference image are separate entries in the same parts[] array.
  • The prompt is a text part: { "text": "..." }
  • Each reference image is an inlineData part: { "inlineData": { "mimeType": "...", "data": "BASE64..." } }
The plugin builds the request the same way:
  • Put the prompt first
  • Convert each reference image to Base64
  • Append each image, in order, to the same parts[]
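The steps above can be sketched in Python as follows. The helper names (`image_part`, `load_image`, `build_contents`) are illustrative and not the plugin's actual functions, and the sketch assumes the MIME type can be inferred from the file extension for the common formats listed on this page.

```python
import base64
from pathlib import Path

# Assumed extension-to-MIME mapping for the common formats on this page.
MIME_BY_SUFFIX = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
}


def image_part(raw: bytes, mime_type: str) -> dict:
    """Wrap raw image bytes as an inlineData part with Base64-encoded data."""
    return {
        "inlineData": {
            "mimeType": mime_type,
            "data": base64.b64encode(raw).decode("ascii"),
        }
    }


def load_image(path: str) -> tuple[bytes, str]:
    """Read an image file and infer its MIME type from the extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in MIME_BY_SUFFIX:
        raise ValueError(f"unsupported image type: {path}")
    return Path(path).read_bytes(), MIME_BY_SUFFIX[suffix]


def build_contents(prompt: str, images: list[tuple[bytes, str]]) -> list[dict]:
    """Prompt first, then one inlineData part per reference image, in order."""
    parts = [{"text": prompt}]
    for raw, mime_type in images:
        parts.append(image_part(raw, mime_type))
    return [{"role": "user", "parts": parts}]
```

With an empty image list the same function yields a text-to-image request; passing images, for example `build_contents(prompt, [load_image("ref1.jpg"), load_image("ref2.png")])`, yields an image-to-image request with one inlineData entry per file.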

Minimal image-to-image example

{
  "contents": [
    {
      "role": "user",
      "parts": [
        { "text": "Keep the subject and atmosphere of the reference image, and remake it into a more premium poster" },
        {
          "inlineData": {
            "mimeType": "image/jpeg",
            "data": "BASE64_IMAGE"
          }
        }
      ]
    }
  ],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"],
    "imageConfig": {
      "aspectRatio": "16:9",
      "imageSize": "2K"
    }
  }
}

Method and path

POST /v1beta/models/{model}:generateContent

Request example

curl -X POST https://octopusx.ai/v1beta/models/gemini-3-pro-image-preview:generateContent \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          { "text": "A futuristic AI workstation scene, cool-toned atmosphere, cinematic lighting" }
        ]
      }
    ],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"],
      "temperature": 1.0,
      "topP": 0.95,
      "maxOutputTokens": 8192,
      "imageConfig": {
        "aspectRatio": "16:9",
        "imageSize": "2K"
      }
    }
  }'
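For callers not using curl, the same request can be sent from Python with only the standard library. This is a sketch: the helper names, the 300-second timeout, and the use of `urllib.request` are assumptions rather than part of this API's documentation. `build_request` only constructs the request; `generate` actually sends it.

```python
import json
import urllib.request

BASE_URL = "https://octopusx.ai/v1beta/models"


def build_request(api_key: str, model: str, body: dict) -> urllib.request.Request:
    """Build the same POST request as the curl example (not sent yet)."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(api_key: str, model: str, body: dict, timeout: float = 300.0) -> dict:
    """Send the request and return the decoded JSON response."""
    req = build_request(api_key, model, body)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Same body as the curl example above.
body = {
    "contents": [
        {
            "role": "user",
            "parts": [{"text": "A futuristic AI workstation scene, cool-toned atmosphere, cinematic lighting"}],
        }
    ],
    "generationConfig": {
        "responseModalities": ["TEXT", "IMAGE"],
        "temperature": 1.0,
        "topP": 0.95,
        "maxOutputTokens": 8192,
        "imageConfig": {"aspectRatio": "16:9", "imageSize": "2K"},
    },
}
req = build_request("YOUR_API_KEY", "gemini-3-pro-image-preview", body)
# response = generate("YOUR_API_KEY", "gemini-3-pro-image-preview", body)
```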

Image-to-image example

{
  "contents": [
    {
      "role": "user",
      "parts": [
        { "text": "Blend the style of the reference images and output a high-resolution landscape poster" },
        {
          "inlineData": {
            "mimeType": "image/jpeg",
            "data": "BASE64_IMAGE_1"
          }
        },
        {
          "inlineData": {
            "mimeType": "image/jpeg",
            "data": "BASE64_IMAGE_2"
          }
        }
      ]
    }
  ],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"],
    "imageConfig": {
      "aspectRatio": "16:9",
      "imageSize": "2K"
    }
  }
}

Image-to-image field explanations

Field | Role in image-to-image
--- | ---
parts[].text | Tells the model how to modify, what to keep, and what style to output
parts[].inlineData.mimeType | Declares the reference image format
parts[].inlineData.data | Base64 content of the reference image
imageConfig.aspectRatio | Constrains the final image ratio
imageConfig.imageSize | Constrains the final image resolution

How many reference images can be passed

The plugin’s main.py iterates over reference_images and appends each image to parts[], so:
  • A single reference image is supported
  • Multiple reference images are also supported
  • Multiple images are appended consecutively, in order, as separate inlineData entries

How to inspect the result after the response

The image returned by Gemini image generation is usually located at:
  • candidates[0].content.parts[].inlineData.data
This field contains one of two things:
  1. Base64-encoded image data
  2. An image URL
The plugin handles both formats, which is why the examples on this page use the placeholder BASE64_OR_URL.
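A minimal sketch for handling both formats, assuming that a value starting with `http://` or `https://` is a URL and anything else is Base64. The prefix check and the name `extract_images` are assumptions for illustration; the plugin’s own detection logic may differ.

```python
import base64


def extract_images(response: dict) -> list[dict]:
    """Collect images from candidates[0].content.parts[].inlineData.

    Each result has "mimeType" plus either "url" (data looks like an
    http(s) URL) or "bytes" (data decoded from Base64).
    """
    images = []
    candidates = response.get("candidates") or []
    if not candidates:
        return images
    parts = candidates[0].get("content", {}).get("parts", [])
    for part in parts:
        inline = part.get("inlineData")
        if not inline:
            continue  # e.g. a text part, since TEXT is also requested
        data = inline.get("data", "")
        mime = inline.get("mimeType", "")
        if data.startswith(("http://", "https://")):
            images.append({"mimeType": mime, "url": data})
        else:
            images.append({"mimeType": mime, "bytes": base64.b64decode(data)})
    return images
```

Decoded bytes can be written straight to disk, e.g. `Path("out.png").write_bytes(img["bytes"])`; URL results need a separate download step.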

Body

contents
array<object>
required
Input content list. Each item usually contains role and parts.
contents[].role
string
Role field. The plugin always uses user.
contents[].parts
array<object>
required
Array of content fragments. Both the prompt and reference images are built here.
contents[].parts[].text
string
Text prompt.
contents[].parts[].inlineData
object
Reference image input. Contains two fields: mimeType and data.
contents[].parts[].inlineData.mimeType
string
Image MIME type. Common values are image/jpeg, image/png, and image/webp.
contents[].parts[].inlineData.data
string
Base64-encoded image content.
generationConfig
object
required
Generation configuration object.
generationConfig.responseModalities
array<string>
Return modality list. The plugin uses ["IMAGE", "TEXT"] by default.
generationConfig.imageConfig
object
required
Image configuration object.
generationConfig.imageConfig.aspectRatio
string
Image ratio. The plugin supports 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, and 21:9.
generationConfig.imageConfig.imageSize
string
Image resolution. gemini-3-pro-image-preview supports the 1K and 2K values exposed in the plugin; gemini-2.5-flash-image-preview and gemini-3.1-flash-image-preview will actually fall back to 1K.
safetySettings
array<object>
Safety settings. Optional.
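Before sending a request, a client could check the body against the fields marked required above and the plugin’s aspect-ratio list. This is a hypothetical client-side check (`validate_body` is not part of the API or the plugin); the server may accept aspect ratios the plugin does not expose, so treat an aspect-ratio finding as a plugin-compatibility hint rather than a guaranteed API error.

```python
# Aspect ratios the plugin exposes, per the aspectRatio field description.
ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3", "21:9"}


def validate_body(body: dict) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    contents = body.get("contents")
    if not isinstance(contents, list) or not contents:
        problems.append("contents is required and must be a non-empty array")
    else:
        for i, item in enumerate(contents):
            parts = item.get("parts")
            if not isinstance(parts, list) or not parts:
                problems.append(f"contents[{i}].parts is required")
                continue
            for j, part in enumerate(parts):
                inline = part.get("inlineData")
                # A reference image needs both its MIME type and Base64 data.
                if inline is not None and not (inline.get("mimeType") and inline.get("data")):
                    problems.append(f"contents[{i}].parts[{j}].inlineData needs mimeType and data")
    config = body.get("generationConfig")
    if not isinstance(config, dict):
        problems.append("generationConfig is required")
        return problems
    image_config = config.get("imageConfig")
    if not isinstance(image_config, dict):
        problems.append("generationConfig.imageConfig is required")
        return problems
    ratio = image_config.get("aspectRatio")
    if ratio is not None and ratio not in ASPECT_RATIOS:
        problems.append(f"aspectRatio {ratio!r} is not one of the plugin's values")
    return problems
```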

Response example

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "inlineData": {
              "mimeType": "image/png",
              "data": "BASE64_OR_URL"
            }
          }
        ]
      },
      "finishReason": "STOP"
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 123,
    "candidatesTokenCount": 456,
    "totalTokenCount": 579,
    "trafficType": "ON_DEMAND"
  },
  "modelVersion": "gemini-3-pro-image-preview",
  "createTime": "2026-05-21T00:00:00Z"
}

Response

candidates[].content.parts[].inlineData.mimeType
string
Returned image MIME type.
candidates[].content.parts[].inlineData.data
string
Image data: either Base64-encoded image content or an image URL.
usageMetadata
object
Token usage statistics.
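A small sketch that reads the response fields listed above, plus finishReason, from a parsed response. `summarize_response` is a hypothetical helper; it assumes the first candidate is the one of interest and defaults missing token counts to 0.

```python
def summarize_response(response: dict) -> dict:
    """Pull finishReason, image MIME types, and token counts from a response."""
    # Use the first candidate; fall back to an empty dict if there is none.
    candidate = (response.get("candidates") or [{}])[0]
    parts = candidate.get("content", {}).get("parts", [])
    usage = response.get("usageMetadata", {})
    return {
        "finishReason": candidate.get("finishReason"),
        "imageMimeTypes": [p["inlineData"].get("mimeType") for p in parts if "inlineData" in p],
        "promptTokens": usage.get("promptTokenCount", 0),
        "outputTokens": usage.get("candidatesTokenCount", 0),
        "totalTokens": usage.get("totalTokenCount", 0),
    }
```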