
Segmenting Videos

The video segmentation endpoint uses Meta’s SAM 3 Video Predictor to detect and track objects across an mp4. You provide a single text prompt describing what to segment — SAM 3 finds matching objects in the first frame and tracks them forwards and backwards through the video. The output is a single mp4 with the same resolution and fps as the input.

The job type is inference.sam3.segment.video.v1.

This is useful for:

  • Creating per-frame object masks for VFX or rotoscoping
  • Tracking subjects across a clip without manual keyframing
  • Generating overlays that visualize what a model is “seeing” in a video

The examples below use a sample reef clip hosted at docs.prodia.com/fish.mp4. Swap in your own mp4 by changing the input path.

The default mask mode returns a black-and-white mp4 — white pixels mark any frame region covered by a detected object, everything else is black.

main.js

```javascript
import fs from "node:fs/promises";
import { createProdia } from "prodia/v2";

const prodia = createProdia({
  token: process.env.PRODIA_TOKEN,
});

const inputBuffer = await (
  await fetch("https://docs.prodia.com/fish.mp4")
).arrayBuffer();

console.log("Segmenting video...");

const job = await prodia.job(
  {
    type: "inference.sam3.segment.video.v1",
    config: {
      prompt: "fish",
    },
  },
  { accept: "video/mp4", inputs: [new Uint8Array(inputBuffer)] }
);

const video = await job.arrayBuffer();
await fs.writeFile("mask.mp4", new Uint8Array(video));
console.log("Saved mask.mp4");
```
Terminal window

```sh
node main.js
```

overlay mode composites the colored masks, bounding boxes, and id=<N>, p=<score> labels from the SAM 3 visualization over the original video. This matches the SAM 3 README example output and is useful for previewing what the model detected.

main.js

```javascript
import fs from "node:fs/promises";
import { createProdia } from "prodia/v2";

const prodia = createProdia({
  token: process.env.PRODIA_TOKEN,
});

const inputBuffer = await (
  await fetch("https://docs.prodia.com/fish.mp4")
).arrayBuffer();

const job = await prodia.job(
  {
    type: "inference.sam3.segment.video.v1",
    config: {
      prompt: "fish",
      mode: "overlay",
      alpha: 0.5,
    },
  },
  { accept: "video/mp4", inputs: [new Uint8Array(inputBuffer)] }
);

const video = await job.arrayBuffer();
await fs.writeFile("overlay.mp4", new Uint8Array(video));
```
Terminal window

```sh
node main.js
```
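The alpha value controls how strongly the colored masks show through the original footage. Overlay compositing of this kind is standard per-pixel alpha blending; the sketch below is purely illustrative (the blendPixel helper is hypothetical, not part of the prodia SDK) and shows the math for a single RGB pixel:

```javascript
// Hypothetical per-pixel alpha blend, for illustration only:
// result = alpha * overlayColor + (1 - alpha) * inputColor
function blendPixel(inputRgb, overlayRgb, alpha) {
  return inputRgb.map((channel, i) =>
    Math.round(alpha * overlayRgb[i] + (1 - alpha) * channel)
  );
}

// A pure-red mask pixel over a mid-gray video pixel at alpha 0.5:
console.log(blendPixel([128, 128, 128], [255, 0, 0], 0.5)); // [192, 64, 64]
```

At alpha 1.0 the mask fully replaces the input pixel; at 0.0 the overlay is invisible.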

Raise confidence_threshold to suppress weakly-matched objects. SAM 3 attaches a per-object score to every detection; objects below the threshold are dropped before the mask is merged or rendered.

main.js

```javascript
const job = await prodia.job(
  {
    type: "inference.sam3.segment.video.v1",
    config: {
      prompt: "fish",
      confidence_threshold: 0.9,
    },
  },
  { accept: "video/mp4", inputs: [new Uint8Array(inputBuffer)] }
);
```
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | string | (required) | Text describing what to segment and track across the video (1–500 characters). |
| confidence_threshold | number | 0.5 | Minimum SAM 3 score an object must reach to be kept (0.0–1.0). Lower values keep more detections. |
| mode | enum | mask | mask returns a merged black-and-white mp4. overlay returns the colored SAM 3 visualization composited over the input. |
| alpha | number | 0.5 | Mask alpha used when mode is overlay; ignored otherwise (0.0–1.0). |
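Putting the parameters together, a request body using every documented option might look like this (the specific values are illustrative, not recommendations):

```javascript
// Illustrative request combining all documented config parameters.
const request = {
  type: "inference.sam3.segment.video.v1",
  config: {
    prompt: "fish",            // required, 1–500 characters
    confidence_threshold: 0.7, // drop detections scoring below 0.7
    mode: "overlay",           // "mask" (default) or "overlay"
    alpha: 0.4,                // overlay opacity; ignored in mask mode
  },
};

console.log(request.config.mode); // overlay
```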
| Constraint | Value |
| --- | --- |
| Accepted formats | MP4 (video/mp4) |
| Maximum file size | 100 MB |
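Since uploads are capped at 100 MB, it can be worth checking the buffer size client-side before submitting a job. This guard is a local convenience sketch, not part of the prodia SDK:

```javascript
// Hypothetical client-side guard against the documented 100 MB cap.
const MAX_INPUT_BYTES = 100 * 1024 * 1024;

function assertWithinSizeLimit(buffer) {
  if (buffer.byteLength > MAX_INPUT_BYTES) {
    throw new Error(
      `Input is ${buffer.byteLength} bytes; the endpoint accepts at most ${MAX_INPUT_BYTES}.`
    );
  }
  return buffer;
}

// A small buffer passes through unchanged; an oversized one would throw.
assertWithinSizeLimit(new ArrayBuffer(1024));
console.log("within limit");
```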

Input resolution and fps are preserved in the output. Common sizes (832×480, 1280×720) are warmed into the torch.compile cache at startup, so they encode fastest.