Gemini Image Generation API
Gemini Image
Gemini Image Generation API
Use POST /v1beta/models/{model}:generateContent to call the Gemini image generation API, covering gemini-3-pro-image-preview, gemini-2.5-flash-image-preview, and gemini-3.1-flash-image-preview in a unified way.
POST
Gemini Image Generation API
Gemini Image Generation API
Gemini image models use the officialgenerateContent format for image generation, which is suitable for directly reusing the contents / parts structure.
- The path is
POST /v1beta/models/{model}:generateContent. - Both text-to-image and image-to-image use the same
contents[].parts[]structure. - Reference images are passed in through
parts[].inlineData. - In the plugin,
responseModalitiesrequests bothIMAGEandTEXTby default. - The response may either return
inlineData.dataas Base64 directly, or return an image URL.
Supported models
gemini-3-pro-image-previewgemini-2.5-flash-image-previewgemini-3.1-flash-image-preview
Model differences
| Model | imageSize behavior | Description |
|---|---|---|
gemini-3-pro-image-preview | Supports 1K and 2K | Delivers according to the requested imageSize |
gemini-2.5-flash-image-preview | Actually falls back to 1K | Even if the UI allows selecting 2K, the plugin will ultimately send 1K |
gemini-3.1-flash-image-preview | Actually falls back to 1K | Even if the UI allows selecting 2K, the plugin will ultimately send 1K |
First understand image-to-image
Gemini does not provide a separate “edit API” here. Image-to-image means directly placing the reference image intocontents[].parts[] as input.
The simplest way to understand it is:
- Text-to-image:
parts[]contains only text - Image-to-image:
parts[]contains both text and images
Text-to-image vs image-to-image
| Scenario | parts[] content | Description |
|---|---|---|
| Text-to-image | Only text | Pure prompt-based generation |
| Image-to-image | text + one or more inlineData | Let the model refer to an existing image’s style, subject, or composition |
How to pass image-to-image
You only need to remember one rule:- The first type of content is the prompt:
{ "text": "..." } - The second type of content is the reference image:
{ "inlineData": { "mimeType": "...", "data": "BASE64..." } }
- Put the
promptfirst - Then convert each reference image to Base64
- Append them one by one to the same
parts[]
Minimal image-to-image example
Method and path
Request example
Image-to-image example
Image-to-image field explanations
| Field | Role in image-to-image |
|---|---|
parts[].text | Tells the model how to modify, what to keep, and what style to output |
parts[].inlineData.mimeType | Declares the reference image format |
parts[].inlineData.data | Base64 content of the reference image |
imageConfig.aspectRatio | Constrains the final image ratio |
imageConfig.imageSize | Constrains the final image resolution |
How many reference images can be passed
From your existingmain.py implementation, the plugin iterates over reference_images and appends each image to parts[], so the documentation can be understood as:
- Supports 1 reference image
- Also supports multiple reference images
- When there are multiple images, they are appended consecutively as multiple
inlineDataentries
How to inspect the result after the response
The image returned by Gemini image generation is usually located at:candidates[0].content.parts[].inlineData.data
- Base64 image data directly
- An image URL directly
BASE64_OR_URL is written intentionally in the documentation.
Body
Input content list. Each item usually contains
role and parts.Role field. The plugin always uses
user.Array of content fragments. Both the prompt and reference images are built here.
Text prompt.
Reference image input. Contains two fields:
mimeType and data.Image MIME type. Common values are
image/jpeg, image/png, and image/webp.Base64-encoded image content.
Generation configuration object.
Return modality list. The plugin uses
["IMAGE", "TEXT"] by default.Image configuration object.
Image ratio. The plugin supports
1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, and 21:9.Image resolution.
gemini-3-pro-image-preview supports the 1K and 2K values exposed in the plugin; gemini-2.5-flash-image-preview and gemini-3.1-flash-image-preview will actually fall back to 1K.Safety settings. Optional.
Response example
Response
Returned image MIME type.
Image data. May be Base64, or may directly be an image URL.
Token usage statistics.