
FLUX.1 Kontext

FLUX.1 Kontext is Black Forest Labs’ instruction-guided editing model. You provide an input image and a natural-language description of the change you want — “make the sky stormy”, “swap the dress for a leather jacket”, “add a bowler hat” — and the model rewrites only the regions the prompt describes while keeping the rest of the scene, characters, and lighting intact. It is the model to reach for when you need surgical, in-place edits without the artifacts that mask-based inpainting tends to introduce.

FLUX.1 Kontext extends the FLUX.1 rectified-flow transformer with a context-conditioning path that takes the reference image as a sequence of latent tokens alongside the text prompt. The model is trained to attend jointly to those visual tokens and the instruction, which is why it preserves identity, framing, and unedited regions much better than a vanilla img2img loop. Two characteristics of the architecture matter in practice:

  • Token-level conditioning, not noise blending — the input image is fed in as latent context rather than as a noised latent the model gradually denoises. Untouched regions stay near pixel-perfect, with no strength knob to tune
  • Single-pass editing — one prompt, one forward pass. There is no separate mask, no two-stage inpaint+blend. The model decides which pixels the instruction describes
Variant | Best for                                       | Generation time | Price
Pro     | Production editing at scale                    | ~6s             | $0.04
Max     | Complex multi-step instructions, hardest edits | ~7s             | $0.08
Dev     | Style-preset workflows, lowest cost            | ~7s             | $0.025
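The per-image prices above make batch cost easy to estimate client-side. The helper below is purely illustrative (the variant keys and function are not part of any API); it just encodes the table:

```python
# Per-image prices (USD) from the variant table above. Illustrative only --
# these keys and this helper are not part of the Prodia API.
PRICES = {"pro": 0.04, "max": 0.08, "dev": 0.025}

def estimate_cost(variant: str, num_images: int) -> float:
    """Rough batch cost for a given Kontext variant."""
    return round(PRICES[variant] * num_images, 6)
```

For example, 100 Pro edits cost about $4.00, while the same batch on Max would be about $8.00.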

FLUX.1 Kontext [pro] — the default. Hosted by Black Forest Labs, served through Prodia’s /v2/job endpoint with both image generation (txt2img) and image editing (img2img) modes. Supports the full set of aspect_ratio values and a safety_tolerance knob from 0 (strict) to 6 (permissive).

FLUX.1 Kontext [max] — same modes as Pro with stronger prompt adherence on harder edits — fine-grained instructions, multiple changes in a single prompt, edits that need to reason about lighting or geometry. Roughly 2x the cost and slightly slower. safety_tolerance caps at 2 on the editing endpoint.

FLUX.1 Kontext [dev] — open-weight distilled variant served through Prodia’s fast pipeline. Edit-only (img2img). Exposes 17 style_preset values and explicit width, height, steps, and guidance_scale knobs in exchange for dropping the aspect_ratio field.

  • Localized edits with identity preservation — character changes, prop swaps, outfit changes, expression edits where everything off-prompt should stay frozen
  • Edits that would need a mask elsewhere — Kontext figures out the region from the prompt, so you don’t have to author or compute a mask
  • Style transfer with structure preservation — “make this a watercolour painting” or “render in pixel art” while keeping the original composition
  • Wide and tall aspect ratios for txt2img — the v1 endpoints support nine fixed ratios from 9:21 to 21:9; v2 accepts arbitrary W:H strings
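Since v2 accepts arbitrary W:H strings, a ratio can be derived from pixel dimensions by reducing with the greatest common divisor. A small sketch (the helper is an illustration, not part of the API):

```python
from math import gcd

def aspect_ratio(width: int, height: int) -> str:
    """Reduce pixel dimensions to a W:H string, e.g. 1920x1080 -> '16:9'.
    Illustrative helper for building the v2 aspect_ratio field."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    d = gcd(width, height)
    return f"{width // d}:{height // d}"
```

For instance, `aspect_ratio(1920, 1080)` yields `"16:9"` and `aspect_ratio(1024, 1024)` yields `"1:1"`.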

For prompt-only generation without a reference image, FLUX.2 produces stronger photorealistic output. For natural-language editing on Google’s Gemini family, Nano Banana is a flat-rate alternative. For mask-based region replacement (rather than instruction-based editing), see the flux-fill.dev.v1 and SDXL inpainting job types.

FLUX.1 Kontext [pro]:

Job type                              | Description                       | ETA
inference.flux-kontext.pro.txt2img.v2 | Generate an image from text       | ~6s
inference.flux-kontext.pro.img2img.v2 | Edit an input image with a prompt | ~6s

FLUX.1 Kontext [max]:

Job type                              | Description                       | ETA
inference.flux-kontext.max.txt2img.v2 | Generate an image from text       | ~7s
inference.flux-kontext.max.img2img.v2 | Edit an input image with a prompt | ~7s

FLUX.1 Kontext [dev]:

Job type                                   | Description                                  | ETA
inference.flux-fast.dev-kontext.img2img.v1 | Edit an input image with style-preset support | ~7s

Pro and Max — txt2img.v2 and img2img.v2:

  • prompt (required) — text description, 3–4,096 characters. For img2img this is the editing instruction
  • aspect_ratio — any string of the form W:H (e.g. 16:9, 3:2, 21:9). Defaults to the input aspect ratio for img2img
  • prompt_upsampling — boolean, default false. When true the model rewrites your prompt into a richer description before generating; useful for short prompts
  • safety_tolerance — integer. txt2img: 0–6, default 4. img2img: 0–2, default 2. Lower values apply stricter content moderation
  • seed — integer for reproducible output

Pro and Max — img2img inputs:

  • A single input image attached as the multipart input part. Accepted: PNG, JPEG, or WebP, 256–1920 pixels per side, max 10 MB
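These limits can be pre-checked before upload. As one example, the sketch below reads the IHDR chunk of a PNG (JPEG and WebP would need their own header parsers) and applies the documented bounds; the function name and return shape are illustrative:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def check_png_input(data: bytes) -> tuple[int, int]:
    """Validate PNG bytes against the Kontext img2img input limits
    (256-1920 px per side, max 10 MB) and return (width, height).
    Illustrative pre-flight check, not part of the API."""
    if len(data) > 10 * 1024 * 1024:
        raise ValueError("input image exceeds 10 MB")
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # Width and height are big-endian uint32s at bytes 16-24 (inside IHDR).
    width, height = struct.unpack(">II", data[16:24])
    if not (256 <= width <= 1920 and 256 <= height <= 1920):
        raise ValueError(f"dimensions {width}x{height} outside 256-1920 px")
    return width, height
```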

Dev — flux-fast.dev-kontext.img2img.v1:

  • prompt (required) — 3–4,096 characters
  • style_preset — one of 3d-model, analog-film, anime, cinematic, comic-book, craft-clay, digital-art, enhance, fantasy-art, isometric, line-art, low-poly, neon-punk, origami, photographic, pixel-art, texture
  • width and height — output dimensions, 512–1040 in multiples of 32, default 1024
  • steps — diffusion steps, 1–50, default 30
  • guidance_scale — classifier-free guidance, default 2.5
  • seed — integer for reproducible output
  • progressive — boolean, default false. When the response is JPEG, return a progressive JPEG
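Dev's width/height grid (512-1040, multiples of 32) is easy to miss; a clamp-and-snap helper avoids rejected jobs. This is an illustrative sketch, not an API call:

```python
def snap_dev_dimension(value: int) -> int:
    """Clamp a requested width or height into the flux-fast.dev-kontext
    range (512-1040) and floor to a multiple of 32, so the value always
    lands on the documented grid. Illustrative helper."""
    clamped = max(512, min(1040, value))
    return (clamped // 32) * 32
```

For example, a requested 1000 snaps to 992, and anything above the range floors to 1024, the largest multiple of 32 within it.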

Prompting tips:

  • Describe the change, not the whole scene. For img2img write the diff — “replace the apple pie with a chocolate cake” — not the full description. The model already sees the input
  • Anchor preservation explicitly when a small change risks pulling the rest of the image with it: “…keep the wooden table, window, and lighting exactly the same”
  • Use prompt_upsampling for short prompts. It rewrites a five-word prompt into a richer description before generation. Skip it when you have already written the prompt you want
  • For txt2img, choose aspect_ratio deliberately. Defaults to 1:1. Set 9:16 or 16:9 rather than upscaling or cropping later
  • Reach for [max] on multi-step instructions. If a single prompt has two or three independent changes (“recolour the door, add a bicycle, and put leaves on the trees”), Max follows all three more reliably than Pro
  • Tighten safety_tolerance for user-facing apps. For txt2img driven by user input, 0 or 1 is a reasonable starting point

Apple pie generated with txt2img.v2:

FLUX.1 Kontext Pro txt2img — apple pie on a sunny kitchen table

The same image edited with img2img.v2 to swap the pie for a chocolate cake while preserving the kitchen, window, and lighting:

FLUX.1 Kontext Pro img2img — same scene with the pie replaced by a chocolate birthday cake

Text-to-image at a wide aspect ratio:

{
  "type": "inference.flux-kontext.pro.txt2img.v2",
  "config": {
    "prompt": "A small, freshly baked apple pie sitting on a wooden kitchen table by a sunny window, golden flaky crust, warm afternoon light, soft natural shadows, photorealistic",
    "aspect_ratio": "16:9",
    "seed": 42
  }
}

Instruction-based editing — replace the pie above with a chocolate cake while preserving the rest of the scene:

{
  "type": "inference.flux-kontext.pro.img2img.v2",
  "config": {
    "prompt": "replace the apple pie with a chocolate birthday cake with white frosting and rainbow sprinkles, keep the wooden table, window, and lighting the same"
  }
}
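For img2img, the image travels as a multipart part alongside the job JSON. The exact part names are not documented in this section, so the stdlib-only builder below uses assumed names (`job`, `input`) and an assumed filename purely to show the shape of such a request body:

```python
import json
import uuid

def build_img2img_body(job: dict, image_bytes: bytes) -> tuple[str, bytes]:
    """Assemble a multipart/form-data body carrying a Kontext img2img job and
    its input image. The part names ('job', 'input') and the filename are
    assumptions for illustration, not the documented API contract.
    Returns (content_type_header_value, body_bytes)."""
    boundary = uuid.uuid4().hex
    crlf = b"\r\n"
    body = b"".join([
        f"--{boundary}".encode(), crlf,
        b'Content-Disposition: form-data; name="job"', crlf,
        b"Content-Type: application/json", crlf, crlf,
        json.dumps(job).encode(), crlf,
        f"--{boundary}".encode(), crlf,
        b'Content-Disposition: form-data; name="input"; filename="input.png"', crlf,
        b"Content-Type: image/png", crlf, crlf,
        image_bytes, crlf,
        f"--{boundary}--".encode(), crlf,
    ])
    return f"multipart/form-data; boundary={boundary}", body
```

The returned header value and bytes could then be sent with any HTTP client; swap the image Content-Type for JPEG or WebP inputs as appropriate.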

Multi-step instruction with the Max variant:

{
  "type": "inference.flux-kontext.max.img2img.v2",
  "config": {
    "prompt": "change the season to winter, add fresh snow on the windowsill outside, dim the indoor lighting to dusk, and place a steaming mug of cocoa next to the cake",
    "safety_tolerance": 1
  }
}

Style-preset edit with the Dev variant:

{
  "type": "inference.flux-fast.dev-kontext.img2img.v1",
  "config": {
    "prompt": "the same scene rendered as a hand-painted illustration",
    "style_preset": "anime",
    "width": 1024,
    "height": 1024,
    "steps": 30
  }
}