Segmenting Videos
The video segmentation endpoint uses Meta’s SAM 3 Video Predictor to detect and track objects across an mp4. You provide a single text prompt describing what to segment — SAM 3 finds matching objects in the first frame and tracks them forwards and backwards through the video. The output is a single mp4 with the same resolution and fps as the input.
The job type is inference.sam3.segment.video.v1.
This is useful for:
- Creating per-frame object masks for VFX or rotoscoping
- Tracking subjects across a clip without manual keyframing
- Generating overlays that visualize what a model is “seeing” in a video
The examples below use a sample reef clip hosted at docs.prodia.com/fish.mp4. Swap in your own mp4 by changing the input path.
Mask mode (default)
The default mask mode returns a black-and-white mp4: white pixels mark any frame region covered by a detected object, and everything else is black.
```javascript
import fs from "node:fs/promises";
import { createProdia } from "prodia/v2";

const prodia = createProdia({
  token: process.env.PRODIA_TOKEN,
});

const inputBuffer = await (
  await fetch("https://docs.prodia.com/fish.mp4")
).arrayBuffer();

console.log("Segmenting video...");
const job = await prodia.job(
  {
    type: "inference.sam3.segment.video.v1",
    config: {
      prompt: "fish",
    },
  },
  { accept: "video/mp4", inputs: [new Uint8Array(inputBuffer)] },
);

const video = await job.arrayBuffer();
await fs.writeFile("mask.mp4", new Uint8Array(video));
console.log("Saved mask.mp4");
```

Run the script:

```shell
node main.js
```

Overlay mode
`overlay` mode composites the colored masks, bounding boxes, and `id=<N>`, `p=<score>` labels from the SAM 3 visualization over the original video. This matches the SAM 3 README example output and is useful for previewing what the model detected.
```javascript
import fs from "node:fs/promises";
import { createProdia } from "prodia/v2";

const prodia = createProdia({
  token: process.env.PRODIA_TOKEN,
});

const inputBuffer = await (
  await fetch("https://docs.prodia.com/fish.mp4")
).arrayBuffer();

const job = await prodia.job(
  {
    type: "inference.sam3.segment.video.v1",
    config: {
      prompt: "fish",
      mode: "overlay",
      alpha: 0.5,
    },
  },
  { accept: "video/mp4", inputs: [new Uint8Array(inputBuffer)] },
);

const video = await job.arrayBuffer();
await fs.writeFile("overlay.mp4", new Uint8Array(video));
```

Run the script:

```shell
node main.js
```

Filtering low-confidence detections
Raise `confidence_threshold` to suppress weakly matched objects. SAM 3 attaches a per-object score to every detection; objects below the threshold are dropped before the mask is merged or rendered.
```javascript
const job = await prodia.job(
  {
    type: "inference.sam3.segment.video.v1",
    config: {
      prompt: "fish",
      confidence_threshold: 0.9,
    },
  },
  { accept: "video/mp4", inputs: [new Uint8Array(inputBuffer)] },
);
```

Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | string | (required) | Text describing what to segment and track across the video (1–500 characters). |
| `confidence_threshold` | number | 0.5 | Minimum SAM 3 score an object must reach to be kept (0.0–1.0). Lower values keep more detections. |
| `mode` | enum | `mask` | `mask` returns a merged black-and-white mp4. `overlay` returns the colored SAM 3 visualization composited over the input. |
| `alpha` | number | 0.5 | Mask opacity used when `mode` is `overlay` (0.0–1.0). Ignored otherwise. |
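Since `prompt` length, `confidence_threshold`, and `alpha` all have documented ranges, you can catch invalid configs before submitting a job. The sketch below is a hypothetical client-side helper, not part of the Prodia SDK; it simply encodes the constraints from the table above.

```javascript
// Hypothetical client-side validator for the parameters above
// (not part of the Prodia SDK; mirrors the documented ranges).
function validateConfig({
  prompt,
  confidence_threshold = 0.5,
  mode = "mask",
  alpha = 0.5,
}) {
  if (typeof prompt !== "string" || prompt.length < 1 || prompt.length > 500) {
    throw new Error("prompt must be 1-500 characters");
  }
  if (confidence_threshold < 0 || confidence_threshold > 1) {
    throw new Error("confidence_threshold must be in [0.0, 1.0]");
  }
  if (mode !== "mask" && mode !== "overlay") {
    throw new Error('mode must be "mask" or "overlay"');
  }
  if (alpha < 0 || alpha > 1) {
    throw new Error("alpha must be in [0.0, 1.0]");
  }
  return { prompt, confidence_threshold, mode, alpha };
}
```

Calling `validateConfig({ prompt: "fish" })` fills in the documented defaults (`mask` mode, threshold 0.5, alpha 0.5), so the returned object can be passed directly as `config`.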
Input requirements
| Constraint | Value |
|---|---|
| Accepted formats | MP4 (video/mp4) |
| Maximum file size | 100 MB |
Input resolution and fps are preserved on the output. Common sizes (832×480, 1280×720) are warmed into the `torch.compile` cache at bootstrap, so they are the fastest to process.
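Because the endpoint rejects files over 100 MB, it can be worth checking the size locally before uploading. A minimal sketch, using only the buffer's byte length:

```javascript
// Pre-flight check against the documented 100 MB input limit.
// Works on any ArrayBuffer/Uint8Array before it is sent as a job input.
const MAX_INPUT_BYTES = 100 * 1024 * 1024; // 100 MB

function withinSizeLimit(byteLength) {
  return byteLength <= MAX_INPUT_BYTES;
}

console.log(withinSizeLimit(42 * 1024 * 1024));  // true
console.log(withinSizeLimit(150 * 1024 * 1024)); // false
```

For a fetched clip, pass `inputBuffer.byteLength`; for a local file, `(await fs.stat(path)).size` gives the same number without reading the file into memory.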