Combining Multiple Images

Several Prodia models accept more than one input image in a single job. These are the models to reach for when you need to combine a subject from one photo with a setting from another, swap an element across images, or carry style and identity from a reference into a new scene, all without writing custom compositing code.

This guide walks through the multipart shape used to send multiple inputs and shows it end-to-end with Nano Banana and FLUX.2 [flex]. The same pattern works with every job type listed under Models that support multiple inputs below.

We’ll combine these two inputs — a product shot of a ceramic mug and an empty kitchen scene:

*product.jpg* — a white ceramic mug on a neutral grey background

*scene.jpg* — an empty wooden kitchen table in warm morning light

Project Setup

# Create a project directory.
mkdir prodia-combining-images
cd prodia-combining-images

Install Node (if not already installed):

brew install node
# Close the current terminal and open a new one so that node is available.

apt install nodejs npm
# Close the current terminal and open a new one so that node is available.

winget install -e --id OpenJS.NodeJS.LTS
# Close the current terminal and open a new one so that node is available.

Create project skeleton:

# Requires node --version >= 18
# Initialize the project with npm.
npm init -y

# Install the prodia-js library.
npm install prodia --save

Install Python (if not already installed):

brew install python
# Close the current terminal and open a new one so that python is available.

apt install python3 python3-venv python-is-python3
# Close the current terminal and open a new one so that python is available.

winget install -e --id Python.Python.3.12
# Close the current terminal and open a new one so that python is available.

# Requires python --version >= 3.12
python -m venv venv
source venv/bin/activate
pip install requests

Install curl (if not already installed):

brew install curl
# Close the current terminal and open a new one so that curl is available.

apt install curl
# Close the current terminal and open a new one so that curl is available.

# NOTE: Windows 10 and up have curl installed by default and this can be
# skipped.
winget install -e --id cURL.cURL
# Close the current terminal and open a new one so that curl is available.

# Export your token so it can be used by the main code.
export PRODIA_TOKEN=your-token-here

Your token is exported to an environment variable. If you close or switch your shell you’ll need to run export PRODIA_TOKEN=your-token-here again.

Create a main file for your project:

const { createProdia } = require("prodia/v2");

const prodia = createProdia({
    token: process.env.PRODIA_TOKEN // get it from environment
});

Create the following main.py:

from requests.adapters import HTTPAdapter, Retry
import os
import requests
import sys


prodia_token = os.getenv('PRODIA_TOKEN')
prodia_url = 'https://inference.prodia.com/v2/job'

session = requests.Session()
retries = Retry(allowed_methods=None, status_forcelist=Retry.RETRY_AFTER_STATUS_CODES)
session.mount('http://', HTTPAdapter(max_retries=retries))
session.mount('https://', HTTPAdapter(max_retries=retries))
session.headers.update({'Authorization': f"Bearer {prodia_token}"})

set -euo pipefail

You’re now ready to make some API calls!

How multi-image inputs work

A multi-image job has two parts:

The config.images array lists the filenames of the inputs in the order your prompt refers to them — for example ["product.jpg", "scene.jpg"].
Each filename must be sent as a separate input part in the multipart POST /v2/job request, with the same name the config refers to.

The server matches each filename in config.images to the input part with the same name. Send too few parts, or name a part differently from what the config references, and you’ll get a 400 Bad Request such as filename 'product.jpg' not found in request.
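The matching rule can be checked locally before a request is sent. The following is a minimal sketch, not part of any Prodia SDK: `build_multipart` is a hypothetical helper that takes a job dictionary and a mapping of filename to bytes, and returns a `files` list in the shape the `requests` library accepts for multipart uploads. It assumes the part names `job` and `input` described in this guide.

```python
import json
import mimetypes


def build_multipart(job, inputs):
    """Build a requests-style `files` list for a multi-image job.

    `inputs` maps filename -> bytes. Raises ValueError when config.images
    references a filename that has no matching entry in `inputs`.
    (Hypothetical helper for illustration only.)
    """
    wanted = job["config"]["images"]

    # Catch the mismatch locally instead of waiting for a 400 from the server.
    missing = [name for name in wanted if name not in inputs]
    if missing:
        raise ValueError(f"config.images references files not supplied: {missing}")

    # The job description travels as its own JSON part.
    files = [("job", ("job.json", json.dumps(job).encode("utf-8"), "application/json"))]

    # One `input` part per image, in the order the config lists them.
    for name in wanted:
        mime = mimetypes.guess_type(name)[0] or "application/octet-stream"
        files.append(("input", (name, inputs[name], mime)))
    return files
```

The returned list can be passed as `files=` to `session.post`. Building the parts from `config.images` rather than from a separate list keeps the two from drifting apart.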

Compose with Nano Banana

inference.nano-banana.img2img.v2 accepts up to 3 input images for $0.039 per job, regardless of resolution.

The JS SDK uses File objects to preserve the filename — the config’s images array must match these names exactly.

const { createProdia } = require("prodia/v2");
const fs = require("node:fs/promises");

const prodia = createProdia({
  token: process.env.PRODIA_TOKEN,
});

(async () => {
  // download the two reference images on first run
  for (const name of ["product.jpg", "scene.jpg"]) {
    try {
      await fs.access(name);
    } catch {
      const res = await fetch(`https://docs.prodia.com/multi-input-${name}`);
      await fs.writeFile(name, new Uint8Array(await res.arrayBuffer()));
    }
  }

  const product = new File(
    [await fs.readFile("product.jpg")],
    "product.jpg",
    { type: "image/jpeg" },
  );
  const scene = new File(
    [await fs.readFile("scene.jpg")],
    "scene.jpg",
    { type: "image/jpeg" },
  );

  const job = await prodia.job({
    type: "inference.nano-banana.img2img.v2",
    config: {
      prompt: "Place the white ceramic mug from the first image onto the wooden table in the second image. Match the warm morning lighting and the shallow depth of field of the kitchen scene. Keep the mug's matte finish and proportions exactly the same.",
      images: ["product.jpg", "scene.jpg"],
      aspect_ratio: "1:1",
    },
  }, {
    inputs: [product, scene],
  });

  const composed = await job.arrayBuffer();
  await fs.writeFile("composed.jpg", new Uint8Array(composed));
})();

node main.js

Send each input as its own ('input', (filename, bytes, mime)) tuple in the files list. The filename in the tuple must match the entry in config.images.

from requests.adapters import HTTPAdapter, Retry
from io import BytesIO
import json
import os
import requests
import sys


prodia_token = os.getenv('PRODIA_TOKEN')
prodia_url = 'https://inference.prodia.com/v2/job'

session = requests.Session()
retries = Retry(allowed_methods=None, status_forcelist=Retry.RETRY_AFTER_STATUS_CODES)
session.mount('http://', HTTPAdapter(max_retries=retries))
session.mount('https://', HTTPAdapter(max_retries=retries))
session.headers.update({'Authorization': f"Bearer {prodia_token}"})

inputs = {}
for name in ('product.jpg', 'scene.jpg'):
    try:
        with open(name, 'rb') as f:
            inputs[name] = f.read()
    except FileNotFoundError:
        res = requests.get(f'https://docs.prodia.com/multi-input-{name}')
        inputs[name] = res.content
        with open(name, 'wb') as f:
            f.write(res.content)

headers = {
    'Accept': 'image/jpeg',
}

job = {
    'type': 'inference.nano-banana.img2img.v2',
    'config': {
        'prompt': "Place the white ceramic mug from the first image onto the wooden table in the second image. Match the warm morning lighting and the shallow depth of field of the kitchen scene. Keep the mug's matte finish and proportions exactly the same.",
        'images': ['product.jpg', 'scene.jpg'],
        'aspect_ratio': '1:1',
    },
}

files = [
    ('job', ('job.json', BytesIO(json.dumps(job).encode('utf-8')), 'application/json')),
    ('input', ('product.jpg', inputs['product.jpg'], 'image/jpeg')),
    ('input', ('scene.jpg', inputs['scene.jpg'], 'image/jpeg')),
]

res = session.post(prodia_url, headers=headers, files=files)
print(f"Request ID: {res.headers['x-request-id']}")
print(f"Status: {res.status_code}")

if res.status_code != 200:
    print(res.text)
    sys.exit(1)

with open('composed.jpg', 'wb') as f:
    f.write(res.content)

python main.py

Repeat -F input=@<filename> once per image. curl uses each file’s basename as the multipart filename, so the images array in job.json should reference those basenames.

set -euo pipefail

for name in product scene; do
  if [[ ! -f $name.jpg ]]; then
    curl -Lo $name.jpg "https://docs.prodia.com/multi-input-$name.jpg"
  fi
done

cat <<EOF > job.json
{
  "type": "inference.nano-banana.img2img.v2",
  "config": {
    "prompt": "Place the white ceramic mug from the first image onto the wooden table in the second image. Match the warm morning lighting and the shallow depth of field of the kitchen scene. Keep the mug's matte finish and proportions exactly the same.",
    "images": ["product.jpg", "scene.jpg"],
    "aspect_ratio": "1:1"
  }
}
EOF

curl -sSf --retry 3 \
  -H "Authorization: Bearer $PRODIA_TOKEN" \
  -H 'Accept: image/jpeg' \
  -F job=@job.json \
  -F input=@product.jpg \
  -F input=@scene.jpg \
  --output composed.jpg \
  https://inference.prodia.com/v2/job

bash main.sh

open composed.jpg

xdg-open composed.jpg

start composed.jpg

The mug is placed on the wooden table with the warm window light wrapping around it, and the depth of field from the kitchen scene is preserved.

Compose with FLUX.2 [flex]

The same shape works with inference.flux-2.flex.img2img.v1, which accepts up to 10 input images and exposes width, height, steps, and guidance knobs. Only two things change from the Nano Banana request: the type and the FLUX-specific config fields.

const job = await prodia.job({
  type: "inference.flux-2.flex.img2img.v1",
  config: {
    prompt: "Place the white ceramic mug from the first image onto the wooden kitchen table in the second image. Match the warm morning lighting, scale the mug realistically for a kitchen table, and preserve the matte finish. Photorealistic.",
    images: ["product.jpg", "scene.jpg"],
    width: 1024,
    height: 1024,
    steps: 50,
  },
}, {
  inputs: [product, scene],
});

job = {
    'type': 'inference.flux-2.flex.img2img.v1',
    'config': {
        'prompt': "Place the white ceramic mug from the first image onto the wooden kitchen table in the second image. Match the warm morning lighting, scale the mug realistically for a kitchen table, and preserve the matte finish. Photorealistic.",
        'images': ['product.jpg', 'scene.jpg'],
        'width': 1024,
        'height': 1024,
        'steps': 50,
    },
}

cat <<EOF > job.json
{
  "type": "inference.flux-2.flex.img2img.v1",
  "config": {
    "prompt": "Place the white ceramic mug from the first image onto the wooden kitchen table in the second image. Match the warm morning lighting, scale the mug realistically for a kitchen table, and preserve the matte finish. Photorealistic.",
    "images": ["product.jpg", "scene.jpg"],
    "width": 1024,
    "height": 1024,
    "steps": 50
  }
}
EOF

FLUX.2 [flex] returns a similar composite. The diffusion path adds slightly more variance to the mug’s silhouette but resolves the lighting on the wood with sharper highlights.

Models that support multiple inputs

Job type	Max inputs	Notes
`inference.nano-banana.img2img.v2`	3	Flat-rate, ~8s, natural-language editing
`inference.gemini-3-pro.img2img.v1`	3	Up to 4K resolution, ~12s
`inference.gemini-3-1-flash.img2img.v1`	14	Cheaper Gemini variant, optional Google Search grounding
`inference.flux-2.dev.img2img.v1`	8	Open-weight variant with style presets
`inference.flux-2.pro.img2img.v1`	8	Up to 4096px, 9MP combined input limit
`inference.flux-2.flex.img2img.v1`	10	Highest input count in the FLUX.2 family
`inference.flux-2.max.img2img.v1`	8	Highest single-image quality at up to 2048px
`inference.seedream-5-0.lite.img2img.v1`	14	Multi-image blending

Single-input editing models — FLUX.1 Kontext, SDXL inpainting, Recraft V4, and the SeedEdit/Seedance img2img endpoints — accept only one input part. Sending more than one will be rejected at validation.
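The limits in the table can be mirrored in client code so that an oversized request fails before it is uploaded. This is a sketch under the assumption that the table is current. `MAX_INPUTS` and `check_input_count` are hypothetical names, and the numbers are copied from the table rather than fetched from the API.

```python
# Per-model input limits, copied by hand from the table in this guide.
MAX_INPUTS = {
    "inference.nano-banana.img2img.v2": 3,
    "inference.gemini-3-pro.img2img.v1": 3,
    "inference.gemini-3-1-flash.img2img.v1": 14,
    "inference.flux-2.dev.img2img.v1": 8,
    "inference.flux-2.pro.img2img.v1": 8,
    "inference.flux-2.flex.img2img.v1": 10,
    "inference.flux-2.max.img2img.v1": 8,
    "inference.seedream-5-0.lite.img2img.v1": 14,
}


def check_input_count(job_type, images):
    """Return the model's limit, or raise ValueError if `images` exceeds it.

    Job types missing from the local table are rejected rather than guessed at.
    (Hypothetical helper for illustration only.)
    """
    if job_type not in MAX_INPUTS:
        raise ValueError(f"{job_type} is not in the local multi-input table")
    limit = MAX_INPUTS[job_type]
    if len(images) > limit:
        raise ValueError(f"{job_type} accepts at most {limit} images, got {len(images)}")
    return limit
```

For example, `check_input_count("inference.flux-2.flex.img2img.v1", ["product.jpg", "scene.jpg"])` returns 10, while four filenames with the Nano Banana job type raise a `ValueError`.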

Prompting tips for multi-image jobs

Anchor each input by position. Models read the images array in order. Phrase your prompt as “the <subject> from the first image, on the <background> in the second image” rather than naming files
Describe the relationship, not each image. The model already sees both — what it needs from you is what to do with them (“place onto”, “match the lighting of”, “blend the styles of”)
Be explicit about what to preserve. Phrases like “keep the matte finish exactly the same” reduce drift on the subject you care about
Match aspect ratios deliberately. Nano Banana defaults to auto (the first input’s aspect ratio); FLUX.2 takes explicit width and height. Choose the framing the scene image was shot for — your subject will be re-composed into it
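Positional anchoring is easy to get wrong when the images array is reordered. One way to keep the prompt and the array in step is to derive the ordinal from the array itself. The sketch below is illustrative only: `image_ref` is a hypothetical helper, and the ordinal list stops at fourteen because that is the largest input count in the table above.

```python
ORDINALS = [
    "first", "second", "third", "fourth", "fifth", "sixth", "seventh",
    "eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth", "fourteenth",
]


def image_ref(images, filename):
    """Return a phrase such as 'the second image' for `filename`.

    The ordinal comes from the file's position in `images`, so reordering
    the array updates the prompt wording automatically.
    (Hypothetical helper for illustration only.)
    """
    return f"the {ORDINALS[images.index(filename)]} image"


images = ["product.jpg", "scene.jpg"]
prompt = (
    f"Place the white ceramic mug from {image_ref(images, 'product.jpg')} "
    f"onto the wooden table in {image_ref(images, 'scene.jpg')}. "
    "Keep the mug's matte finish and proportions exactly the same."
)
```

A filename that is not in the array raises `ValueError` from `list.index`, which surfaces the mismatch before the job is submitted.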

Common errors

filename 'X' not found in request — the filename in config.images does not match any input part. With the JS SDK, Uint8Array and Blob inputs are sent as image.jpg regardless of the variable name; use a File object with the desired name (as shown above) when the config references specific filenames
config: too many images — exceeded the per-model input limit (see the table above)
413 Payload Too Large — total upload exceeded the per-model size limit (FLUX.2 Pro caps the combined inputs at 9MP, for example). Resize inputs before sending
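For the combined-size case, the resize factor can be worked out from the pixel dimensions alone. The following sketch assumes the limit is measured as the sum of width times height across all inputs and that one megapixel is 1,000,000 pixels. Neither detail is confirmed by this guide. `fit_megapixels` is a hypothetical helper name.

```python
import math


def fit_megapixels(sizes, limit_mp=9.0):
    """Return a scale factor (at most 1.0) for every input's width and height.

    `sizes` is a list of (width, height) tuples in pixels. Scaling both
    dimensions of every image by the returned factor brings the combined
    pixel count down to `limit_mp` megapixels.
    Assumes 1 MP = 1,000,000 pixels and a limit on the summed pixel count.
    (Hypothetical helper for illustration only.)
    """
    total = sum(width * height for width, height in sizes)
    limit = limit_mp * 1_000_000
    if total <= limit:
        return 1.0
    # Area scales with the square of the linear factor, hence the square root.
    return math.sqrt(limit / total)
```

A single 6000 by 6000 input is 36 MP, so the factor is 0.5 and the image should be resized to 3000 by 3000 before upload. Inputs that already fit return a factor of 1.0.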