Skip to content

Gemini 3

Gemini 3 is Google’s higher-tier image family — the same multimodal lineage as Nano Banana, but with selectable resolution up to 4K and stronger fidelity on complex prompts. Two variants are exposed through Prodia: Gemini 3 Pro for highest-quality output and Gemini 3.1 Flash for cost-efficient generation with optional Google Search grounding. Both accept text prompts, support image-to-image editing, and complete in 10–35 seconds depending on resolution.

Gemini 3 is a multimodal model: text and images share a single context window, and image generation runs through the same reasoning path as text generation. In practice this is what makes editing behave like instruction-following rather than diffusion noise blending — the model reads the input image as visual tokens, reasons about which regions the prompt describes, and rewrites only those pixels.

The Pro variant is the higher-capability tier, biased toward complex compositions, accurate text rendering, and detail retention at high resolutions. Flash is the smaller, faster sibling — same model family, lower cost per image, and uniquely able to ground generation in real-time Google Search results when google_search is enabled.

VariantBest forGeneration timeResolutionsPrice
ProHigh-fidelity output, hardest prompts, 4K detail~10–12s1K, 2K, 4K$0.15 (1K/2K), $0.30 (4K)
FlashCost-efficient generation, search grounding, batch workloads~30–35s1K, 2K, 4K$0.08 (1K), $0.12 (2K), $0.16 (4K)

Gemini 3 Pro — flagship image model. Accurate text rendering, photorealistic detail, and consistent multi-object composition. Img2img accepts up to 3 reference images.

Gemini 3.1 Flash — smaller, cheaper, and supports google_search grounding for prompts that require real-world facts (logos, product designs, current events). Img2img accepts up to 14 reference images, useful for multi-image composition or character-consistency workflows across larger image sets.

  • 4K-resolution output — neither Nano Banana nor FLUX.2 generates above 2K natively. Use Gemini 3 when you need print-quality detail
  • Accurate text rendering at high resolution — both variants render text reliably; for native vector text output, see Recraft V4
  • Multi-image composition with up to 14 inputs (Flash) — combining a subject across many reference frames, or fusing several scenes into one
  • Search-grounded generation (Flash only) — set google_search: true to pull in real-world references during generation. Useful for logos, products, locations, or anything that benefits from current factual context
  • Predictable resolution-tier pricing — the only billing parameter is resolution. No per-step or per-pixel cost surprises

For natural-language editing at flat per-job pricing, Nano Banana is a strong alternative. For instruction-guided edits with arbitrary aspect ratios, see FLUX.1 Kontext.

Job typeDescriptionETA
inference.gemini-3-pro.txt2img.v1Gemini 3 Pro text-to-image~10s
inference.gemini-3-pro.img2img.v1Gemini 3 Pro image-to-image (up to 3 inputs)~12s
inference.gemini-3-1-flash.txt2img.v1Gemini 3.1 Flash text-to-image, optional Google Search grounding~30s
inference.gemini-3-1-flash.img2img.v1Gemini 3.1 Flash image-to-image (up to 14 inputs)~35s

Common to all four job types:

  • prompt (required) — text description of the desired output, 1–5,000 characters. For img2img this is the editing instruction
  • aspect_ratio — one of 1:1 (default), 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
  • resolution1K (default), 2K, or 4K. Use uppercase K. Pricing tier is determined by this field
  • include_messages — boolean, default false. When true, the multipart response also returns message.txt parts containing the model’s natural-language reasoning

gemini-3-pro.img2img.v1:

  • images — optional array of 1–3 input image filenames sent as multipart input parts. When omitted, a single attached input is used implicitly

gemini-3-1-flash.txt2img.v1 and img2img.v1:

  • google_search — boolean, default false. Enables Google Search grounding so the model can incorporate real-world references (logos, products, locations) into the output
  • imagesimg2img only, optional array of 1–14 input image filenames
  • Describe the change, not the whole scene. For img2img write the diff — “add a small wooden tag with a leather string” — rather than re-describing the input. The model already sees it
  • Anchor preservation explicitly. “Keep everything else exactly the same” reduces drift in untouched regions, especially for subtle edits at 1K
  • Use 4K only when you need it. 4K Pro is 2x the price of 1K/2K, and Flash 4K takes 30s+. For thumbnails and web use, 1K is usually enough
  • Enable google_search for fact-bound prompts — brand logos, product replicas, real-world architecture. Skip it for purely creative prompts where grounding adds latency without value
  • Pass multiple images deliberately. When using multi-image inputs, refer to them by position in the prompt — “the subject from the first image, in the setting from the second” — and make sure the multipart filenames match the order in images
  • Pick aspect_ratio up front. The default is 1:1. For social, web, or phone, set 9:16 or 16:9 rather than upscaling or cropping after the fact

Gemini 3 Pro text-to-image at 16:9, 1K resolution:

{
"type": "inference.gemini-3-pro.txt2img.v1",
"config": {
"prompt": "a cute red panda eating bamboo on a mossy log, soft natural light, photorealistic",
"aspect_ratio": "16:9",
"resolution": "1K"
}
}

Gemini 3 Pro txt2img — red panda eating bamboo on a mossy log

Same image edited with img2img.v1 — adds a wooden name tag while preserving the pose, fur, lighting, and background:

{
"type": "inference.gemini-3-pro.img2img.v1",
"config": {
"prompt": "Add a small wooden tag with a leather string around the red panda's neck. Keep everything else exactly the same.",
"aspect_ratio": "16:9",
"resolution": "1K"
}
}

Gemini 3 Pro img2img — same panda with a wooden name tag added

Gemini 3.1 Flash text-to-image at 2K, generating a multi-panel infographic with rendered text:

{
"type": "inference.gemini-3-1-flash.txt2img.v1",
"config": {
"prompt": "a hyper-realistic infographic poster about coffee brewing methods, clean modern design, soft pastel background",
"aspect_ratio": "3:4",
"resolution": "2K"
}
}

Gemini 3.1 Flash txt2img — coffee brewing methods infographic

Gemini 3.1 Flash img2img — same red panda transformed into a winter scene:

{
"type": "inference.gemini-3-1-flash.img2img.v1",
"config": {
"prompt": "Make this look like a winter scene with snow falling and snow covering the moss and ferns. Keep the red panda in the same pose.",
"aspect_ratio": "16:9",
"resolution": "1K"
}
}

Gemini 3.1 Flash img2img — same panda in a snowy winter scene

Search-grounded generation with Flash — pull a real-world reference into the output:

{
"type": "inference.gemini-3-1-flash.txt2img.v1",
"config": {
"prompt": "a product render of the original 1984 Macintosh 128K on a clean studio backdrop, accurate proportions, soft top lighting",
"aspect_ratio": "1:1",
"resolution": "2K",
"google_search": true
}
}