Combining Multiple Images

Several Prodia models accept more than one input image in a single job. Reach for them when you need to combine a subject from one photo with a setting from another, swap an element across images, or carry style and identity from a reference into a new scene, all without writing custom compositing code.

This guide walks through the multipart shape used to send multiple inputs and shows it end-to-end with Nano Banana and FLUX.2 [flex]. The same pattern works with every job type listed under Models that support multiple inputs below.

We’ll combine these two inputs — a product shot of a ceramic mug and an empty kitchen scene:

multi-input-product.jpg

product.jpg — a white ceramic mug on a neutral grey background

multi-input-scene.jpg

scene.jpg — an empty wooden kitchen table in warm morning light

Terminal window
# Create a project directory.
mkdir prodia-combining-images
cd prodia-combining-images

Install Node (if not already installed):

Terminal window
brew install node
# Close the current terminal and open a new one so that node is available.

Create project skeleton:

Terminal window
# Requires node --version >= 18
# Initialize the project with npm.
npm init -y
# Install the prodia-js library.
npm install prodia --save

Terminal window
# Export your token so it can be used by the main code.
export PRODIA_TOKEN=your-token-here

The token lives in an environment variable for the current shell session only. If you close or switch shells, run export PRODIA_TOKEN=your-token-here again.

Create a main file for your project:

main.js
const { createProdia } = require("prodia/v2");

const prodia = createProdia({
  token: process.env.PRODIA_TOKEN, // read from the environment
});

You’re now ready to make some API calls!

A multi-image job has two parts:

  1. The config.images array lists the filenames of the inputs in the order your prompt refers to them — for example ["product.jpg", "scene.jpg"].
  2. Each filename must be sent as a separate input part in the multipart POST /v2/job request, with the same name the config refers to.
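
To make the two parts concrete, here is a minimal sketch of how such a multipart body could be assembled with Node's built-in FormData. The part names job and input, the job.json filename, and the application/json content type are assumptions for illustration only, and buildJobForm is a hypothetical helper. The SDK builds the real request for you.

```javascript
// Sketch only: the part names "job" and "input" are assumptions,
// not confirmed by this guide. buildJobForm is a hypothetical helper.
function buildJobForm(job, inputs) {
  const form = new FormData();
  // One JSON part describing the job; config.images lists the filenames.
  form.append(
    "job",
    new Blob([JSON.stringify(job)], { type: "application/json" }),
    "job.json",
  );
  // One part per image, each carrying the filename the config refers to.
  for (const { name, bytes } of inputs) {
    form.append("input", new Blob([bytes], { type: "image/jpeg" }), name);
  }
  return form;
}

const form = buildJobForm(
  {
    type: "inference.nano-banana.img2img.v2",
    config: { prompt: "...", images: ["product.jpg", "scene.jpg"] },
  },
  [
    // Placeholder bytes stand in for real image data.
    { name: "product.jpg", bytes: new Uint8Array([0xff, 0xd8]) },
    { name: "scene.jpg", bytes: new Uint8Array([0xff, 0xd8]) },
  ],
);

// The filenames on the input parts are what config.images must match.
const partNames = form.getAll("input").map((part) => part.name);
console.log(partNames);
```

The point of the sketch is the pairing: every entry in config.images has exactly one input part with the same filename.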

The server matches each filename in config.images to an input part with the same name. If you send too few parts, or a part’s filename differs from the one the config references, the request fails with a 400 Bad Request such as filename 'product.jpg' not found in request.
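
Because a mismatch only surfaces as a 400 from the server, it can be convenient to check the names locally first. The following is a minimal sketch of such a pre-flight check; findMissingInputs is a hypothetical helper, not part of the prodia SDK, and it assumes each input exposes a name property (as File objects do).

```javascript
// Hypothetical helper (not part of the prodia SDK): returns the names in
// config.images that have no input part with exactly the same name.
function findMissingInputs(config, inputs) {
  const sent = new Set(inputs.map((input) => input.name));
  return config.images.filter((name) => !sent.has(name));
}

const config = { images: ["product.jpg", "scene.jpg"] };
// The second name differs in case, so it does not match.
const inputs = [{ name: "product.jpg" }, { name: "Scene.jpg" }];

const missing = findMissingInputs(config, inputs);
console.log(missing); // [ 'scene.jpg' ]
```

An empty result means every filename the config references is present among the inputs.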

inference.nano-banana.img2img.v2 accepts up to 3 input images for $0.039 per job, regardless of resolution.

The JS SDK uses File objects to preserve the filename — the config’s images array must match these names exactly.
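
One way to keep the filename and the config in sync is to derive the File name from the path on disk. The sketch below is a hypothetical helper, not part of the prodia SDK. The extension-to-MIME map is illustrative and says nothing about which formats a given model accepts. File is imported from node:buffer so the snippet also runs on Node 18.13+, where File is not yet a global.

```javascript
const fs = require("node:fs");
const path = require("node:path");
// File is a global in Node 20+; node:buffer exports it from Node 18.13.
const { File } = require("node:buffer");

// Illustrative map only; supported formats depend on the model.
const MIME_TYPES = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
};

// Hypothetical helper: reads a file and names the File after its basename,
// so the same string can be used in config.images.
function fileFromPath(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  const type = MIME_TYPES[ext];
  if (!type) {
    throw new Error(`Unsupported image extension: ${ext || "(none)"}`);
  }
  const bytes = fs.readFileSync(filePath);
  return new File([bytes], path.basename(filePath), { type });
}
```

With this helper, fileFromPath("product.jpg").name is "product.jpg", which is the string to list in config.images.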

main.js
const { createProdia } = require("prodia/v2");
const fs = require("node:fs/promises");

// Note: the global File class used below requires Node 20 or later.
const prodia = createProdia({
  token: process.env.PRODIA_TOKEN, // read from the environment
});

(async () => {
  // Download the two reference images on first run.
  for (const name of ["product.jpg", "scene.jpg"]) {
    try {
      await fs.access(name);
    } catch {
      const res = await fetch(`https://docs.prodia.com/multi-input-${name}`);
      if (!res.ok) {
        throw new Error(`Failed to download ${name}: HTTP ${res.status}`);
      }
      await fs.writeFile(name, new Uint8Array(await res.arrayBuffer()));
    }
  }

  // Wrap each image in a File so its filename travels with the upload.
  const product = new File(
    [await fs.readFile("product.jpg")],
    "product.jpg",
    { type: "image/jpeg" },
  );
  const scene = new File(
    [await fs.readFile("scene.jpg")],
    "scene.jpg",
    { type: "image/jpeg" },
  );

  const job = await prodia.job({
    type: "inference.nano-banana.img2img.v2",
    config: {
      prompt: "Place the white ceramic mug from the first image onto the wooden table in the second image. Match the warm morning lighting and the shallow depth of field of the kitchen scene. Keep the mug's matte finish and proportions exactly the same.",
      // Names must match the File names above, in the order the prompt uses.
      images: ["product.jpg", "scene.jpg"],
      aspect_ratio: "1:1",
    },
  }, {
    inputs: [product, scene],
  });

  // Save the composed image to disk.
  const composed = await job.arrayBuffer();
  await fs.writeFile("composed.jpg", new Uint8Array(composed));
})();

Terminal window
node main.js

Terminal window
open composed.jpg

The mug is placed on the wooden table with the warm window light wrapping around it, and the depth of field from the kitchen scene is preserved:

multi-input-output-nano-banana.jpg

The same shape works with inference.flux-2.flex.img2img.v1, which accepts up to 10 input images and exposes width, height, steps, and guidance knobs. Only two things change from the Nano Banana request: the type and the FLUX-specific config fields.

main.js
// Replaces the prodia.job(...) call in main.js; product and scene are the
// File objects built there.
const job = await prodia.job({
  type: "inference.flux-2.flex.img2img.v1",
  config: {
    prompt: "Place the white ceramic mug from the first image onto the wooden kitchen table in the second image. Match the warm morning lighting, scale the mug realistically for a kitchen table, and preserve the matte finish. Photorealistic.",
    images: ["product.jpg", "scene.jpg"],
    // FLUX-specific fields replace Nano Banana's aspect_ratio.
    width: 1024,
    height: 1024,
    steps: 50,
  },
}, {
  inputs: [product, scene],
});

FLUX.2 [flex] returns a similar composite — the diffusion path adds slightly more variance to the mug’s silhouette but resolves the lighting on the wood with sharper highlights:

multi-input-output-flux-2.jpg

Models that support multiple inputs

| Job type | Max inputs | Notes |
| --- | --- | --- |
| inference.nano-banana.img2img.v2 | 3 | Flat-rate, ~8s, natural-language editing |
| inference.gemini-3-pro.img2img.v1 | 3 | Up to 4K resolution, ~12s |
| inference.gemini-3-1-flash.img2img.v1 | 14 | Cheaper Gemini variant, optional Google Search grounding |
| inference.flux-2.dev.img2img.v1 | 8 | Open-weight variant with style presets |
| inference.flux-2.pro.img2img.v1 | 8 | Up to 4096px, 9MP combined input limit |
| inference.flux-2.flex.img2img.v1 | 10 | Highest input count in the FLUX.2 family |
| inference.flux-2.max.img2img.v1 | 8 | Highest single-image quality at up to 2048px |
| inference.seedream-5-0.lite.img2img.v1 | 14 | Multi-image blending |

Single-input editing models — FLUX.1 Kontext, SDXL inpainting, Recraft V4, and the SeedEdit/Seedance img2img endpoints — accept only one input part. Sending more than one will be rejected at validation.
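
The per-model limits above lend themselves to a local check before a job is submitted. This is a minimal sketch with the limits copied from the table. checkInputCount is a hypothetical helper, not part of the prodia SDK, and treating any unlisted job type as single-input is an assumption made here for safety.

```javascript
// Limits copied from the table above; update them if the table changes.
const MAX_INPUTS = {
  "inference.nano-banana.img2img.v2": 3,
  "inference.gemini-3-pro.img2img.v1": 3,
  "inference.gemini-3-1-flash.img2img.v1": 14,
  "inference.flux-2.dev.img2img.v1": 8,
  "inference.flux-2.pro.img2img.v1": 8,
  "inference.flux-2.flex.img2img.v1": 10,
  "inference.flux-2.max.img2img.v1": 8,
  "inference.seedream-5-0.lite.img2img.v1": 14,
};

// Hypothetical pre-flight check: throws if too many images are listed,
// otherwise returns how many more inputs the job type would accept.
function checkInputCount(type, images) {
  // Assumption: job types missing from the table accept one input.
  const max = MAX_INPUTS[type] ?? 1;
  if (images.length > max) {
    throw new RangeError(
      `${type} accepts at most ${max} input image(s), got ${images.length}`,
    );
  }
  return max - images.length;
}

const remaining = checkInputCount(
  "inference.flux-2.flex.img2img.v1",
  ["product.jpg", "scene.jpg"],
);
console.log(remaining); // 8
```

Failing locally avoids spending a round trip on a request that validation would reject anyway.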

Prompting tips:

  • Anchor each input by position. Models read the images array in order. Phrase your prompt as “the <subject> from the first image, on the <background> in the second image” rather than naming files.
  • Describe the relationship, not each image. The model already sees both; what it needs from you is what to do with them (“place onto”, “match the lighting of”, “blend the styles of”).
  • Be explicit about what to preserve. Phrases like “keep the matte finish exactly the same” reduce drift on the subject you care about.
  • Match aspect ratios deliberately. Nano Banana defaults to auto (the first input’s aspect ratio); FLUX.2 takes explicit width and height. Choose the framing the scene image was shot for, because your subject will be recomposed into it.

Common errors:

  • filename 'X' not found in request: the filename in config.images does not match any input part. With the JS SDK, Uint8Array and Blob inputs are sent as image.jpg regardless of the variable name; use a File object with the desired name (as shown above) when the config references specific filenames.
  • config: too many images: the request exceeded the per-model input limit (see the table above).
  • 413 Payload Too Large: the total upload exceeded the per-model size limit (FLUX.2 Pro caps the combined inputs at 9MP, for example). Resize inputs before sending.
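
For the combined-size limit, the target dimensions can be worked out before any resizing happens. The sketch below computes one uniform scale factor so the summed pixel count of all inputs fits a megapixel budget. It assumes “9MP” means 9,000,000 pixels summed across inputs, and scaleToFitMegapixels is a hypothetical helper. It only computes dimensions; the actual resize would need an image library of your choice.

```javascript
// Hypothetical helper: returns new { width, height } pairs whose summed
// pixel count fits within maxMegapixels (assumed: 1 MP = 1,000,000 pixels).
function scaleToFitMegapixels(sizes, maxMegapixels) {
  const budget = maxMegapixels * 1_000_000;
  const total = sizes.reduce(
    (sum, { width, height }) => sum + width * height,
    0,
  );
  // Already within budget: return copies unchanged.
  if (total <= budget) return sizes.map((size) => ({ ...size }));
  // Pixel count scales with the square of the linear factor.
  const factor = Math.sqrt(budget / total);
  // Rounding down guarantees the result stays within the budget.
  return sizes.map(({ width, height }) => ({
    width: Math.floor(width * factor),
    height: Math.floor(height * factor),
  }));
}

// Two 12 MP photos (24 MP combined) scaled to fit a 9 MP budget.
const scaled = scaleToFitMegapixels(
  [
    { width: 4000, height: 3000 },
    { width: 4000, height: 3000 },
  ],
  9,
);
console.log(scaled);
```

Scaling every input by the same factor keeps each image's aspect ratio (up to rounding) and the relative sizes between inputs.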