Gemini Image Generation API

curl --request POST \
  --url https://octopusx.ai/v1beta/models/{model}:generateContent \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "contents": [
    {
      "role": "<string>",
      "parts": [
        { "text": "<string>" },
        { "inlineData": { "mimeType": "<string>", "data": "<string>" } }
      ]
    }
  ],
  "generationConfig": {
    "responseModalities": ["<string>"],
    "imageConfig": {
      "aspectRatio": "<string>",
      "imageSize": "<string>"
    }
  },
  "safetySettings": [
    {}
  ]
}
'

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "inlineData": {
              "mimeType": "image/png",
              "data": "BASE64_OR_URL"
            }
          }
        ]
      },
      "finishReason": "STOP"
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 123,
    "candidatesTokenCount": 456,
    "totalTokenCount": 579,
    "trafficType": "ON_DEMAND"
  },
  "modelVersion": "gemini-3-pro-image-preview",
  "createTime": "2026-05-21T00:00:00Z"
}

Gemini image models generate images through the official generateContent format, so code that already builds the contents / parts structure can be reused directly.

The path is POST /v1beta/models/{model}:generateContent.
Both text-to-image and image-to-image use the same contents[].parts[] structure.
Reference images are passed in through parts[].inlineData.
In the plugin, responseModalities requests both IMAGE and TEXT by default.
The response returns inlineData.data either as Base64 image data or as an image URL.

Supported models

gemini-3-pro-image-preview
gemini-2.5-flash-image-preview
gemini-3.1-flash-image-preview

Model differences

| Model | `imageSize` behavior | Description |
| --- | --- | --- |
| `gemini-3-pro-image-preview` | Supports `1K` and `2K` | Delivers according to the requested `imageSize` |
| `gemini-2.5-flash-image-preview` | Falls back to `1K` | Even if the UI allows selecting `2K`, the plugin ultimately sends `1K` |
| `gemini-3.1-flash-image-preview` | Falls back to `1K` | Even if the UI allows selecting `2K`, the plugin ultimately sends `1K` |
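
The fallback described in the table can be expressed as a small lookup. This is a minimal sketch rather than the plugin's actual code; the names `effective_image_size`, `MODELS_WITH_2K`, and `SUPPORTED_SIZES` are hypothetical and chosen only for illustration.

```python
# Hypothetical helper that mirrors the table above; not the plugin's real code.
MODELS_WITH_2K = {"gemini-3-pro-image-preview"}
SUPPORTED_SIZES = {"1K", "2K"}


def effective_image_size(model: str, requested: str) -> str:
    """Return the imageSize value that would actually be sent for `model`."""
    if requested not in SUPPORTED_SIZES:
        raise ValueError(f"unsupported imageSize: {requested!r}")
    # Only the pro model honors 2K; the flash models fall back to 1K.
    if requested == "2K" and model not in MODELS_WITH_2K:
        return "1K"
    return requested
```

Keeping the rule in one place makes it easier to update if a flash model later gains `2K` support.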

First understand image-to-image

Gemini does not provide a separate “edit API” here. Image-to-image simply means placing the reference image into contents[].parts[] alongside the prompt. The difference comes down to what parts[] contains:

Text-to-image: parts[] contains only text
Image-to-image: parts[] contains both text and images

In other words, the prompt and reference images are sent to the model together.

Text-to-image vs image-to-image

| Scenario | `parts[]` content | Description |
| --- | --- | --- |
| Text-to-image | Only `text` | Pure prompt-based generation |
| Image-to-image | `text` + one or more `inlineData` | Let the model refer to an existing image’s style, subject, or composition |

How to pass image-to-image

Each entry in parts[] is one of two kinds:

The prompt: { "text": "..." }
A reference image: { "inlineData": { "mimeType": "...", "data": "BASE64..." } }

This is also how the plugin works in practice:

Put the prompt first
Then convert each reference image to Base64
Append them one by one to the same parts[]
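
The three steps above can be sketched in Python. This is an illustrative sketch, not the plugin's code; the function name `build_parts` and the `(raw_bytes, mime_type)` pair used for each reference image are assumptions.

```python
import base64


def build_parts(prompt: str, reference_images: list[tuple[bytes, str]]) -> list[dict]:
    """Build parts[]: the prompt first, then one inlineData entry per image.

    Each reference image is a hypothetical (raw_bytes, mime_type) pair.
    """
    parts: list[dict] = [{"text": prompt}]
    for raw_bytes, mime_type in reference_images:
        parts.append(
            {
                "inlineData": {
                    "mimeType": mime_type,
                    # The field carries Base64 text, not raw bytes.
                    "data": base64.b64encode(raw_bytes).decode("ascii"),
                }
            }
        )
    return parts
```

Passing an empty list of reference images yields a text-to-image `parts[]`, so the same helper covers both scenarios.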

Minimal image-to-image example

{
  "contents": [
    {
      "role": "user",
      "parts": [
        { "text": "Keep the subject and atmosphere of the reference image, and remake it into a more premium poster" },
        {
          "inlineData": {
            "mimeType": "image/jpeg",
            "data": "BASE64_IMAGE"
          }
        }
      ]
    }
  ],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"],
    "imageConfig": {
      "aspectRatio": "16:9",
      "imageSize": "2K"
    }
  }
}

Method and path

POST /v1beta/models/{model}:generateContent

Request example

curl -X POST https://octopusx.ai/v1beta/models/gemini-3-pro-image-preview:generateContent \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          { "text": "A futuristic AI workstation scene, cool-toned atmosphere, cinematic lighting" }
        ]
      }
    ],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"],
      "temperature": 1.0,
      "topP": 0.95,
      "maxOutputTokens": 8192,
      "imageConfig": {
        "aspectRatio": "16:9",
        "imageSize": "2K"
      }
    }
  }'
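
The same request can be built with Python's standard library. This is a sketch under assumptions: the helper names `build_request` and `send_request` are hypothetical, and the optional sampling fields (temperature, topP, maxOutputTokens) from the curl example are left out for brevity.

```python
import json
import urllib.request

BASE_URL = "https://octopusx.ai"


def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-image generateContent request."""
    body = {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["TEXT", "IMAGE"],
            "imageConfig": {"aspectRatio": "16:9", "imageSize": "2K"},
        },
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1beta/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_request(req: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

A call would look like `send_request(build_request("gemini-3-pro-image-preview", "A futuristic AI workstation scene", "YOUR_API_KEY"))`. The 120-second timeout is an arbitrary choice, not a documented limit.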

Image-to-image example

{
  "contents": [
    {
      "role": "user",
      "parts": [
        { "text": "Blend the style of the reference images and output a high-resolution landscape poster" },
        {
          "inlineData": {
            "mimeType": "image/jpeg",
            "data": "BASE64_IMAGE_1"
          }
        },
        {
          "inlineData": {
            "mimeType": "image/jpeg",
            "data": "BASE64_IMAGE_2"
          }
        }
      ]
    }
  ],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"],
    "imageConfig": {
      "aspectRatio": "16:9",
      "imageSize": "2K"
    }
  }
}

Image-to-image field explanations

| Field | Role in image-to-image |
| --- | --- |
| `parts[].text` | Tells the model how to modify, what to keep, and what style to output |
| `parts[].inlineData.mimeType` | Declares the reference image format |
| `parts[].inlineData.data` | Base64 content of the reference image |
| `imageConfig.aspectRatio` | Constrains the final image ratio |
| `imageConfig.imageSize` | Constrains the final image resolution |
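
One way to fill `parts[].inlineData.mimeType` is to derive it from the file extension. The sketch below covers only the MIME types this document lists as common (image/jpeg, image/png, image/webp); the name `mime_type_for` and the mapping itself are assumptions, not part of the API.

```python
from pathlib import Path

# Assumed mapping, limited to the common MIME types named in this document.
MIME_BY_SUFFIX = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
}


def mime_type_for(filename: str) -> str:
    """Return the mimeType for a reference image based on its extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return MIME_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unrecognized image extension: {suffix!r}") from None
```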

How many reference images can be passed

The plugin iterates over reference_images and appends each image to parts[], so:

Supports 1 reference image
Also supports multiple reference images
When there are multiple images, they are appended consecutively as multiple inlineData entries

How to inspect the result after the response

The image returned by Gemini image generation is usually located at:

candidates[0].content.parts[].inlineData.data

This field holds one of two things:

Base64-encoded image data
An image URL

The plugin handles both response formats, which is why the examples in this document use the placeholder BASE64_OR_URL.
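
Handling both forms can be sketched as follows. The function name `extract_images` is hypothetical, and the rule for telling the two forms apart (a value starting with http:// or https:// is treated as a URL, anything else as Base64) is an assumption, not documented behavior.

```python
import base64


def extract_images(response: dict) -> list[dict]:
    """Collect image parts from a generateContent response.

    Each result is {"mimeType": ..., "url": ...} or {"mimeType": ..., "bytes": ...}.
    """
    images = []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            inline = part.get("inlineData")
            if not inline:
                continue  # for example, a text part
            data = inline.get("data", "")
            item = {"mimeType": inline.get("mimeType")}
            # Assumption: URLs start with http:// or https://; the rest is Base64.
            if data.startswith(("http://", "https://")):
                item["url"] = data
            else:
                item["bytes"] = base64.b64decode(data)
            images.append(item)
    return images
```

Iterating over every part, rather than reading only the first, also covers responses that mix text and image parts.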

Body

contents

array<object>

required

Input content list. Each item usually contains role and parts.

contents[].role

string

Role field. The plugin always uses user.

contents[].parts

array<object>

required

Array of content fragments. Both the prompt and reference images are built here.

contents[].parts[].text

string

Text prompt.

contents[].parts[].inlineData

object

Reference image input. Contains two fields: mimeType and data.

contents[].parts[].inlineData.mimeType

string

Image MIME type. Common values are image/jpeg, image/png, and image/webp.

contents[].parts[].inlineData.data

string

Base64-encoded image content.

generationConfig

object

required

Generation configuration object.

generationConfig.responseModalities

array<string>

Return modality list. The plugin uses ["IMAGE", "TEXT"] by default.

generationConfig.imageConfig

object

required

Image configuration object.

generationConfig.imageConfig.aspectRatio

string

Image ratio. The plugin supports 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, and 21:9.

generationConfig.imageConfig.imageSize

string

Image resolution. gemini-3-pro-image-preview supports the 1K and 2K values exposed in the plugin; gemini-2.5-flash-image-preview and gemini-3.1-flash-image-preview fall back to 1K regardless of the requested value.

safetySettings

array<object>

Safety settings. Optional.
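
The generationConfig fields described above can be assembled and checked before a request is sent. This is a minimal sketch: the name `build_generation_config` and the default imageSize of 1K are assumptions, while the aspect ratio list is the one given for the plugin in this section.

```python
# Aspect ratios this document lists as supported by the plugin.
SUPPORTED_ASPECT_RATIOS = ("1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3", "21:9")


def build_generation_config(aspect_ratio: str, image_size: str = "1K") -> dict:
    """Return a generationConfig object, rejecting unknown aspect ratios."""
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspectRatio: {aspect_ratio!r}")
    return {
        # Same modalities the plugin requests by default.
        "responseModalities": ["IMAGE", "TEXT"],
        "imageConfig": {"aspectRatio": aspect_ratio, "imageSize": image_size},
    }
```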

Response example

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "inlineData": {
              "mimeType": "image/png",
              "data": "BASE64_OR_URL"
            }
          }
        ]
      },
      "finishReason": "STOP"
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 123,
    "candidatesTokenCount": 456,
    "totalTokenCount": 579,
    "trafficType": "ON_DEMAND"
  },
  "modelVersion": "gemini-3-pro-image-preview",
  "createTime": "2026-05-21T00:00:00Z"
}

Response

candidates[].content.parts[].inlineData.mimeType

string

Returned image MIME type.

candidates[].content.parts[].inlineData.data

string

Image data. May be Base64, or may directly be an image URL.

usageMetadata

object

Token usage statistics.
