Segmenting Images
The segmentation endpoint uses Meta’s Segment Anything models to detect and segment objects in an image. Unlike background removal, which returns a single mask, segmentation returns multiple masks, one for each distinct object or region detected.
Two models are available:
- SAM 2 (`inference.segment.v1`, also `inference.sam2.segment.v1`): Automatic segmentation. Detects and masks all objects in the image without any prompt. Best for extracting every distinct region.
- SAM 3 (`inference.segment.v2`, also `inference.sam3.segment.v1`): Text-prompted segmentation. Describe what you want to segment in natural language, and only matching objects are returned. Best when you know what you’re looking for.
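One way to route between the two models is to key off whether a text prompt is present. The sketch below builds job parameters that way. `buildSegmentJob` is a hypothetical helper, not part of the Prodia SDK; the job types and the `prompt` and `confidence_threshold` fields are the ones documented on this page.

```javascript
// Hypothetical helper (not part of the Prodia SDK): choose the job type
// based on whether a non-empty text prompt was supplied.
function buildSegmentJob(prompt, confidenceThreshold) {
  const trimmed = typeof prompt === "string" ? prompt.trim() : "";

  if (trimmed === "") {
    // No prompt: SAM 2 masks every region it detects.
    return { type: "inference.segment.v1" };
  }

  // Prompt given: SAM 3 returns only the matching objects.
  const config = { prompt: trimmed };
  if (confidenceThreshold !== undefined) {
    config.confidence_threshold = confidenceThreshold;
  }
  return { type: "inference.segment.v2", config };
}

console.log(buildSegmentJob());
console.log(buildSegmentJob("robot cat", 0.5));
```

The returned object is meant to be passed as the first argument to `prodia.job(...)`, in the same position as the job parameters in the examples on this page.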
This is useful for:
- Extracting individual objects from complex scenes
- Creating object-level masks for further processing
- Targeting specific objects by description (SAM 3)
- Analyzing image composition
Automatic segmentation (SAM 2)
SAM 2 automatically detects and segments all objects in an image, with no prompt needed.
    import fs from "node:fs/promises";
    import { createProdia } from "prodia/v2";

    const prodia = createProdia({
      token: process.env.PRODIA_TOKEN,
    });

    // First generate an image to segment
    console.log("Generating image...");
    const imageJob = await prodia.job({
      type: "inference.flux-fast.schnell.txt2img.v1",
      config: {
        prompt: "a cute robot cat on a colorful background",
        resolution: "1024x1024",
      },
    });

    const imageBuffer = await imageJob.arrayBuffer();
    await fs.writeFile("input.jpg", new Uint8Array(imageBuffer));
    console.log("Saved input.jpg");

    // Now segment it using SAM 2
    console.log("Segmenting image...");
    const segmentJob = await prodia.job(
      { type: "inference.segment.v1" },
      {
        accept: "multipart/form-data",
        inputs: [new Uint8Array(imageBuffer)],
      }
    );

    // Get all mask outputs
    const formData = await segmentJob.formData();
    const masks = formData.getAll("output");

    for (const [i, mask] of masks.entries()) {
      const buffer = await mask.arrayBuffer();
      await fs.writeFile(`mask_${i}.png`, new Uint8Array(buffer));
    }

    console.log(`Saved ${masks.length} mask files`);

Run it with:

    node main.js

Text-prompted segmentation (SAM 3)
SAM 3 lets you describe what to segment using a text prompt. Only objects matching the description are returned.
    import fs from "node:fs/promises";
    import { createProdia } from "prodia/v2";

    const prodia = createProdia({
      token: process.env.PRODIA_TOKEN,
    });

    // Load an image to segment
    const imageBuffer = await fs.readFile("input.jpg");

    // Segment only the robot cat using SAM 3
    console.log("Segmenting with prompt...");
    const segmentJob = await prodia.job(
      {
        type: "inference.segment.v2",
        config: {
          prompt: "robot cat",
          confidence_threshold: 0.5,
        },
      },
      {
        accept: "multipart/form-data",
        inputs: [new Uint8Array(imageBuffer)],
      }
    );

    // Get all mask outputs
    const formData = await segmentJob.formData();
    const masks = formData.getAll("output");

    for (const [i, mask] of masks.entries()) {
      const buffer = await mask.arrayBuffer();
      await fs.writeFile(`mask_${i}.png`, new Uint8Array(buffer));
    }

    console.log(`Saved ${masks.length} mask files`);

Run it with:

    node main.js

SAM 3 parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | string | (required) | Text describing what to segment (1–500 characters) |
| `confidence_threshold` | number | 0.5 | Confidence threshold (0.0–1.0). Lower values return more masks; higher values return only high-confidence matches. |
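Checking these constraints locally can catch a bad request before it is sent. The following is a minimal sketch: `validateSam3Config` is a hypothetical helper, and it assumes that the character limit can be approximated by JavaScript's string `length` (UTF-16 code units), which may not match how the API counts characters.

```javascript
// Hypothetical pre-flight check; the limits are copied from the parameter table.
// Assumption: string length in UTF-16 code units approximates the API's
// character count.
function validateSam3Config({ prompt, confidence_threshold = 0.5 } = {}) {
  const errors = [];

  if (typeof prompt !== "string" || prompt.length < 1 || prompt.length > 500) {
    errors.push("prompt must be a string of 1-500 characters");
  }

  if (
    typeof confidence_threshold !== "number" ||
    Number.isNaN(confidence_threshold) ||
    confidence_threshold < 0 ||
    confidence_threshold > 1
  ) {
    errors.push("confidence_threshold must be a number between 0.0 and 1.0");
  }

  // An empty array means the config passed every check.
  return errors;
}

console.log(validateSam3Config({ prompt: "robot cat" }));
console.log(validateSam3Config({ prompt: "", confidence_threshold: 1.5 }));
```

Returning a list of messages rather than throwing on the first problem lets a caller report every issue at once.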
Understanding the output
The segmentation endpoint returns a multipart response containing multiple PNG mask images. Each mask corresponds to a distinct object or region detected in the image:
- White pixels (255) indicate the segmented object
- Black pixels (0) indicate everything else
For SAM 2, the number of masks varies based on image complexity. For SAM 3, masks correspond to objects matching your text prompt.
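Because each mask is a plain white-on-black image, simple per-pixel arithmetic is enough to measure or combine masks. The sketch below assumes each PNG has already been decoded into a `Uint8Array` holding one 8-bit grayscale value per pixel (the decoder is left to you and is not shown). `maskCoverage` and `unionMasks` are hypothetical helpers, and the cutoff of 128 is an assumption chosen to tolerate any intermediate values at mask edges.

```javascript
// Assumption: each mask is a Uint8Array with one grayscale value per pixel,
// where 255 marks the object and 0 marks everything else.

// Fraction of pixels that belong to the object in one decoded mask.
function maskCoverage(pixels) {
  if (pixels.length === 0) return 0;
  let white = 0;
  for (const value of pixels) {
    if (value >= 128) white += 1;
  }
  return white / pixels.length;
}

// Pixel-wise union of several masks with identical dimensions.
function unionMasks(masks) {
  if (masks.length === 0) return new Uint8Array(0);
  const size = masks[0].length;
  const combined = new Uint8Array(size); // starts as all zeros (black)
  for (const mask of masks) {
    if (mask.length !== size) throw new Error("masks must have equal size");
    for (let i = 0; i < size; i++) {
      if (mask[i] >= 128) combined[i] = 255;
    }
  }
  return combined;
}

const demoMaskA = Uint8Array.from([255, 0, 0, 0]);
const demoMaskB = Uint8Array.from([0, 0, 255, 0]);
console.log(maskCoverage(demoMaskA)); // 0.25
console.log(maskCoverage(unionMasks([demoMaskA, demoMaskB]))); // 0.5
```

Coverage can help filter out tiny regions from SAM 2 output, and the union can merge several SAM 3 matches into a single mask.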
Input requirements
| Constraint | SAM 2 | SAM 3 |
|---|---|---|
| Accepted formats | PNG, JPEG, WebP | PNG, JPEG, WebP |
| Minimum dimensions | 256 x 256 | 256 x 256 |
| Maximum dimensions | 2048 x 2048 | 4096 x 4096 |
| Maximum file size | 10 MB | 10 MB |
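The limits in the table above can be checked before uploading. This is a rough sketch under stated assumptions: `detectFormat` and `checkInput` are hypothetical helpers, the image width and height are supplied by the caller rather than parsed from the file, the file contents are a `Uint8Array` (a Node.js `Buffer` also works), and "10 MB" is read as 10 × 1024 × 1024 bytes.

```javascript
// Limits copied from the table above. Reading "10 MB" as 10 * 1024 * 1024
// bytes is an assumption.
const LIMITS = {
  sam2: { min: 256, max: 2048, maxBytes: 10 * 1024 * 1024 },
  sam3: { min: 256, max: 4096, maxBytes: 10 * 1024 * 1024 },
};

// Identify PNG, JPEG, or WebP from the file's leading signature bytes.
function detectFormat(bytes) {
  const ascii = (start, end) =>
    String.fromCharCode(...bytes.subarray(start, end));
  if (bytes.length >= 4 && bytes[0] === 0x89 && ascii(1, 4) === "PNG") {
    return "png";
  }
  if (bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) {
    return "jpeg";
  }
  if (bytes.length >= 12 && ascii(0, 4) === "RIFF" && ascii(8, 12) === "WEBP") {
    return "webp";
  }
  return null;
}

// Returns a list of problems; an empty list means the input looks acceptable.
function checkInput(model, bytes, width, height) {
  const limits = LIMITS[model];
  if (!limits) return [`unknown model: ${model}`];

  const errors = [];
  if (detectFormat(bytes) === null) {
    errors.push("format must be PNG, JPEG, or WebP");
  }
  if (bytes.length > limits.maxBytes) {
    errors.push("file is larger than 10 MB");
  }
  if (width < limits.min || height < limits.min) {
    errors.push(`image is smaller than ${limits.min} x ${limits.min}`);
  }
  if (width > limits.max || height > limits.max) {
    errors.push(`image is larger than ${limits.max} x ${limits.max}`);
  }
  return errors;
}

const pngSignature = Uint8Array.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
console.log(checkInput("sam2", pngSignature, 3000, 3000)); // one error: too large for SAM 2
console.log(checkInput("sam3", pngSignature, 3000, 3000)); // []
```

The same 3000 x 3000 image is rejected for SAM 2 but accepted for SAM 3, which reflects the only row where the two models differ.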