
Segmenting Images

The segmentation endpoint uses Meta’s Segment Anything models to detect and segment objects in an image. Unlike background removal, which returns a single mask, segmentation returns multiple masks: one for each distinct object or region detected.

Two models are available:

  • SAM 2 (inference.segment.v1, also inference.sam2.segment.v1) — Automatic segmentation. Detects and masks all objects in the image without any prompt. Best for extracting every distinct region.
  • SAM 3 (inference.segment.v2, also inference.sam3.segment.v1) — Text-prompted segmentation. Describe what you want to segment in natural language, and only matching objects are returned. Best when you know what you’re looking for.

This is useful for:

  • Extracting individual objects from complex scenes
  • Creating object-level masks for further processing
  • Targeting specific objects by description (SAM 3)
  • Analyzing image composition

SAM 2 automatically detects and segments all objects in an image — no prompt needed.

main.js
import fs from "node:fs/promises";
import { createProdia } from "prodia/v2";

const prodia = createProdia({
  token: process.env.PRODIA_TOKEN,
});

// First generate an image to segment
console.log("Generating image...");
const imageJob = await prodia.job({
  type: "inference.flux-fast.schnell.txt2img.v1",
  config: {
    prompt: "a cute robot cat on a colorful background",
    resolution: "1024x1024",
  },
});
const imageBuffer = await imageJob.arrayBuffer();
await fs.writeFile("input.jpg", new Uint8Array(imageBuffer));
console.log("Saved input.jpg");

// Now segment it using SAM 2
console.log("Segmenting image...");
const segmentJob = await prodia.job(
  { type: "inference.segment.v1" },
  { accept: "multipart/form-data", inputs: [new Uint8Array(imageBuffer)] }
);

// Get all mask outputs
const formData = await segmentJob.formData();
const masks = formData.getAll("output");
for (const [i, mask] of masks.entries()) {
  const buffer = await mask.arrayBuffer();
  await fs.writeFile(`mask_${i}.png`, new Uint8Array(buffer));
}
console.log(`Saved ${masks.length} mask files`);
Terminal window
node main.js

SAM 3 lets you describe what to segment using a text prompt. Only objects matching the description are returned.

main.js
import fs from "node:fs/promises";
import { createProdia } from "prodia/v2";

const prodia = createProdia({
  token: process.env.PRODIA_TOKEN,
});

// Load an image to segment
const imageBuffer = await fs.readFile("input.jpg");

// Segment only the robot cat using SAM 3
console.log("Segmenting with prompt...");
const segmentJob = await prodia.job(
  {
    type: "inference.segment.v2",
    config: {
      prompt: "robot cat",
      confidence_threshold: 0.5,
    },
  },
  { accept: "multipart/form-data", inputs: [new Uint8Array(imageBuffer)] }
);

const formData = await segmentJob.formData();
const masks = formData.getAll("output");
for (const [i, mask] of masks.entries()) {
  const buffer = await mask.arrayBuffer();
  await fs.writeFile(`mask_${i}.png`, new Uint8Array(buffer));
}
console.log(`Saved ${masks.length} mask files`);
Terminal window
node main.js

Parameter            | Type   | Default    | Description
prompt               | string | (required) | Text describing what to segment (1–500 characters)
confidence_threshold | number | 0.5        | Confidence threshold (0.0–1.0). Lower values return more masks; higher values return only high-confidence matches.
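
Since the prompt and threshold have documented bounds, it can be convenient to validate them client-side before submitting a job. The sketch below is illustrative only — `validateSegmentConfig` is not part of the Prodia SDK, and the API enforces these constraints server-side regardless:

```javascript
// Hypothetical helper mirroring the documented SAM 3 config constraints.
// Not part of the Prodia SDK; the API validates inputs server-side as well.
function validateSegmentConfig({ prompt, confidence_threshold = 0.5 }) {
  if (typeof prompt !== "string" || prompt.length < 1 || prompt.length > 500) {
    throw new Error("prompt must be a string of 1-500 characters");
  }
  if (confidence_threshold < 0 || confidence_threshold > 1) {
    throw new Error("confidence_threshold must be between 0.0 and 1.0");
  }
  return { prompt, confidence_threshold };
}

console.log(validateSegmentConfig({ prompt: "robot cat" }));
// { prompt: "robot cat", confidence_threshold: 0.5 }
```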

The segmentation endpoint returns a multipart response containing multiple PNG mask images. Each mask corresponds to a distinct object or region detected in the image:

  • White pixels (255) indicate the segmented object
  • Black pixels (0) indicate everything else

For SAM 2, the number of masks varies based on image complexity. For SAM 3, masks correspond to objects matching your text prompt.
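
Because each mask is binary, a simple post-processing step is to measure how much of the frame each object occupies by counting white pixels. The sketch below assumes you have already decoded a mask PNG into a flat array of grayscale values (e.g. with a PNG decoding library); the `maskCoverage` helper is illustrative, not part of the SDK:

```javascript
// Fraction of pixels belonging to the segmented object, given a flat
// array of grayscale values (0 = background, 255 = object).
function maskCoverage(pixels) {
  let white = 0;
  for (const value of pixels) {
    // Treat anything above the midpoint as object, which also
    // tolerates slight anti-aliasing at mask edges.
    if (value >= 128) white++;
  }
  return white / pixels.length;
}

// Example: a 4-pixel mask where half the pixels are object.
console.log(maskCoverage([255, 255, 0, 0])); // 0.5
```

Sorting masks by coverage is one way to pick out the dominant object from a SAM 2 result, since automatic segmentation does not rank its outputs for you.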

Constraint         | SAM 2       | SAM 3
Accepted formats   | PNG, JPEG, WebP | PNG, JPEG, WebP
Minimum dimensions | 256 x 256   | 256 x 256
Maximum dimensions | 2048 x 2048 | 4096 x 4096
Maximum file size  | 10 MB       | 10 MB
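
These limits can also be checked before uploading. The sketch below is a hypothetical pre-flight helper, not part of the Prodia SDK; it assumes you have already obtained the image's width, height, and byte size (e.g. from file metadata):

```javascript
// Illustrative pre-flight check against the documented input constraints.
// Not part of the Prodia SDK -- the API enforces these limits server-side.
const LIMITS = {
  sam2: { minDim: 256, maxDim: 2048, maxBytes: 10 * 1024 * 1024 },
  sam3: { minDim: 256, maxDim: 4096, maxBytes: 10 * 1024 * 1024 },
};

function checkInput({ width, height, bytes }, model) {
  const { minDim, maxDim, maxBytes } = LIMITS[model];
  if (width < minDim || height < minDim) return "image too small";
  if (width > maxDim || height > maxDim) return "image too large";
  if (bytes > maxBytes) return "file exceeds 10 MB";
  return "ok";
}

// A 3000x3000 image is over SAM 2's limit but within SAM 3's.
console.log(checkInput({ width: 3000, height: 3000, bytes: 5_000_000 }, "sam2")); // "image too large"
console.log(checkInput({ width: 3000, height: 3000, bytes: 5_000_000 }, "sam3")); // "ok"
```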