
Segmenting Images

The segmentation endpoint uses Meta’s Segment Anything models to detect and segment objects in an image. Unlike background removal, which returns a single mask, segmentation returns multiple masks: one for each distinct object or region detected.

Two models are available:

  • SAM 2 (inference.segment.v1, also inference.sam2.segment.v1) — Automatic segmentation. Detects and masks all objects in the image without any prompt. Best for extracting every distinct region.
  • SAM 3 (inference.segment.v2, also inference.sam3.segment.v1) — Text-prompted segmentation. Describe what you want to segment in natural language, and only matching objects are returned. Best when you know what you’re looking for.

This is useful for:

  • Extracting individual objects from complex scenes
  • Creating object-level masks for further processing
  • Targeting specific objects by description (SAM 3)
  • Analyzing image composition

SAM 2 automatically detects and segments all objects in an image — no prompt needed.

main.js
import fs from "node:fs/promises";
import { createProdia } from "prodia/v2";

const prodia = createProdia({
  token: process.env.PRODIA_TOKEN,
});

// First generate an image to segment
console.log("Generating image...");
const imageJob = await prodia.job({
  type: "inference.flux-fast.schnell.txt2img.v1",
  config: {
    prompt: "a cute robot cat on a colorful background",
    resolution: "1024x1024",
  },
});
const imageBuffer = await imageJob.arrayBuffer();
await fs.writeFile("input.jpg", new Uint8Array(imageBuffer));
console.log("Saved input.jpg");

// Now segment it using SAM 2
console.log("Segmenting image...");
const segmentJob = await prodia.job(
  { type: "inference.segment.v1" },
  { accept: "multipart/form-data", inputs: [new Uint8Array(imageBuffer)] }
);

// Get all mask outputs
const formData = await segmentJob.formData();
const masks = formData.getAll("output");
for (const [i, mask] of masks.entries()) {
  const buffer = await mask.arrayBuffer();
  await fs.writeFile(`mask_${i}.png`, new Uint8Array(buffer));
}
console.log(`Saved ${masks.length} mask files`);
Terminal window
node main.js

SAM 3 lets you describe what to segment using a text prompt. Only objects matching the description are returned.

main.js
import fs from "node:fs/promises";
import { createProdia } from "prodia/v2";

const prodia = createProdia({
  token: process.env.PRODIA_TOKEN,
});

// Load an image to segment
const imageBuffer = await fs.readFile("input.jpg");

// Segment only the robot cat using SAM 3
console.log("Segmenting with prompt...");
const segmentJob = await prodia.job(
  {
    type: "inference.segment.v2",
    config: {
      prompt: "robot cat",
      confidence_threshold: 0.5,
    },
  },
  { accept: "multipart/form-data", inputs: [new Uint8Array(imageBuffer)] }
);

const formData = await segmentJob.formData();
const masks = formData.getAll("output");
for (const [i, mask] of masks.entries()) {
  const buffer = await mask.arrayBuffer();
  await fs.writeFile(`mask_${i}.png`, new Uint8Array(buffer));
}
console.log(`Saved ${masks.length} mask files`);
Terminal window
node main.js

Parameter            | Type   | Default    | Description
prompt               | string | (required) | Text describing what to segment (1–500 characters)
confidence_threshold | number | 0.5        | Confidence threshold (0.0–1.0). Lower values return more masks; higher values return only high-confidence matches.
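
Since the prompt and threshold have documented bounds, it can be convenient to validate them client-side before submitting a job. The sketch below is illustrative only — `validateSegmentConfig` is not part of the Prodia SDK, and the API enforces these constraints server-side regardless:

```javascript
// Hypothetical helper mirroring the documented SAM 3 config constraints.
// Not part of the Prodia SDK; the API validates inputs server-side as well.
function validateSegmentConfig({ prompt, confidence_threshold = 0.5 }) {
  if (typeof prompt !== "string" || prompt.length < 1 || prompt.length > 500) {
    throw new Error("prompt must be a string of 1-500 characters");
  }
  if (confidence_threshold < 0 || confidence_threshold > 1) {
    throw new Error("confidence_threshold must be between 0.0 and 1.0");
  }
  return { prompt, confidence_threshold };
}

console.log(validateSegmentConfig({ prompt: "robot cat" }));
// { prompt: "robot cat", confidence_threshold: 0.5 }
```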

The segmentation endpoint returns a multipart response containing multiple PNG mask images. Each mask corresponds to a distinct object or region detected in the image:

  • White pixels (255) indicate the segmented object
  • Black pixels (0) indicate everything else

For SAM 2, the number of masks varies based on image complexity. For SAM 3, masks correspond to objects matching your text prompt.
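
Because each mask is binary, a simple post-processing step is to measure how much of the frame each object occupies by counting white pixels. The sketch below assumes you have already decoded a mask PNG into a flat array of grayscale values (e.g. with a PNG decoding library); the `maskCoverage` helper is illustrative, not part of the SDK:

```javascript
// Fraction of pixels belonging to the segmented object, given a flat
// array of grayscale values (0 = background, 255 = object).
function maskCoverage(pixels) {
  let white = 0;
  for (const value of pixels) {
    // Treat anything above the midpoint as object, which also
    // tolerates slight anti-aliasing at mask edges.
    if (value >= 128) white++;
  }
  return white / pixels.length;
}

// Example: a 4-pixel mask where half the pixels are object.
console.log(maskCoverage([255, 255, 0, 0])); // 0.5
```

Sorting masks by coverage is one way to pick out the dominant object from a SAM 2 result, since automatic segmentation does not rank its outputs for you.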

Constraint         | SAM 2       | SAM 3
Accepted formats   | PNG, JPEG, WebP | PNG, JPEG, WebP
Minimum dimensions | 256 x 256   | 256 x 256
Maximum dimensions | 2048 x 2048 | 4096 x 4096
Maximum file size  | 10 MB       | 10 MB
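
These limits can also be checked before uploading. The sketch below is a hypothetical pre-flight helper, not part of the Prodia SDK; it assumes you have already obtained the image's width, height, and byte size (e.g. from file metadata):

```javascript
// Illustrative pre-flight check against the documented input constraints.
// Not part of the Prodia SDK -- the API enforces these limits server-side.
const LIMITS = {
  sam2: { minDim: 256, maxDim: 2048, maxBytes: 10 * 1024 * 1024 },
  sam3: { minDim: 256, maxDim: 4096, maxBytes: 10 * 1024 * 1024 },
};

function checkInput({ width, height, bytes }, model) {
  const { minDim, maxDim, maxBytes } = LIMITS[model];
  if (width < minDim || height < minDim) return "image too small";
  if (width > maxDim || height > maxDim) return "image too large";
  if (bytes > maxBytes) return "file exceeds 10 MB";
  return "ok";
}

// A 3000x3000 image is over SAM 2's limit but within SAM 3's.
console.log(checkInput({ width: 3000, height: 3000, bytes: 5_000_000 }, "sam2")); // "image too large"
console.log(checkInput({ width: 3000, height: 3000, bytes: 5_000_000 }, "sam3")); // "ok"
```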