
inference.sam3.segment.v1

The inference.sam3.segment.v1 job performs text-prompted image segmentation using Meta’s SAM 3 (Segment Anything Model 3). Unlike SAM 2, which is prompted with points, boxes, or masks, SAM 3 accepts natural-language text prompts to identify and segment specific objects in an image.

```json
{
  "type": "inference.sam3.segment.v1",
  "config": {
    "prompt": "fish"
  }
}
```

This returns one mask per detected instance matching the prompt.

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | string | (required) | Text describing what to segment (e.g., “yellow school bus”, “person”, “cat”); 1 to 500 characters |
| confidence_threshold | number | 0.5 | Confidence threshold from 0.0 to 1.0. Lower values return more masks; higher values return only high-confidence masks |
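If you generate job definitions programmatically, a small helper can enforce the documented parameter constraints before submission. The Python sketch below does this. The function name `build_sam3_job` is hypothetical and not part of any official client. The 1 to 500 character limit on `prompt` comes from the JSON Schema on this page. The helper only builds the job dictionary; it does not submit the job.

```python
import json

# Job type string taken from this page; the helper itself is hypothetical.
JOB_TYPE = "inference.sam3.segment.v1"


def build_sam3_job(prompt, confidence_threshold=None):
    """Return a job definition dict for text-prompted SAM 3 segmentation.

    When confidence_threshold is None it is left out of the config,
    so the documented default (0.5) applies.
    """
    if not isinstance(prompt, str) or not 1 <= len(prompt) <= 500:
        raise ValueError("prompt must be a string of 1-500 characters")
    config = {"prompt": prompt}
    if confidence_threshold is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(confidence_threshold, bool) or not isinstance(
            confidence_threshold, (int, float)
        ):
            raise TypeError("confidence_threshold must be a number")
        if not 0.0 <= confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be between 0.0 and 1.0")
        config["confidence_threshold"] = confidence_threshold
    return {"type": JOB_TYPE, "config": config}


if __name__ == "__main__":
    # Serialize to the same JSON shape as the examples on this page.
    print(json.dumps(build_sam3_job("person", confidence_threshold=0.9), indent=2))
```

The helper omits `confidence_threshold` when no value is given, rather than always sending 0.5. This means the job relies on whatever default the service defines.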
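Return only high-confidence detections:

```json
{
  "type": "inference.sam3.segment.v1",
  "config": {
    "prompt": "person",
    "confidence_threshold": 0.9
  }
}
```

Use a low threshold to catch more instances (may include false positives):

```json
{
  "type": "inference.sam3.segment.v1",
  "config": {
    "prompt": "bird",
    "confidence_threshold": 0.3
  }
}
```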
Input image requirements:

  • Format: PNG, JPEG, or WebP
  • Size: 256x256 minimum, 4096x4096 maximum
  • Max file size: 10 MB
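A client-side pre-flight check can reject images that violate these limits before upload. This sketch rests on two assumptions the page does not settle:

  • the 256 and 4096 pixel limits apply to width and height independently;
  • "10 MB" means 10 * 1024 * 1024 bytes.

The helper name `input_problems` is hypothetical. The caller supplies the format, width, and height, for example from Pillow's `Image.open`, whose result exposes `.format` and `.size`.

```python
# Hypothetical pre-flight check mirroring the documented input limits.
# Assumptions: limits apply per side, and 10 MB = 10 * 1024 * 1024 bytes.
ALLOWED_FORMATS = {"PNG", "JPEG", "WEBP"}
MIN_SIDE = 256
MAX_SIDE = 4096
MAX_BYTES = 10 * 1024 * 1024


def input_problems(fmt, width, height, size_bytes):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if fmt.upper() not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    for name, side in (("width", width), ("height", height)):
        if side < MIN_SIDE:
            problems.append(f"{name} {side}px is below the {MIN_SIDE}px minimum")
        elif side > MAX_SIDE:
            problems.append(f"{name} {side}px exceeds the {MAX_SIDE}px maximum")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the {MAX_BYTES}-byte limit")
    return problems


if __name__ == "__main__":
    print(input_problems("jpeg", 1920, 1080, 2_500_000))
```

Returning a list instead of raising on the first problem lets a caller report every violation at once.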

The job returns one or more binary masks as PNG images, one for each detected instance of the prompted object. Each mask is a grayscale image in which:

  • White (255) = object pixels
  • Black (0) = background pixels
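Because each mask uses only the values 0 and 255, downstream processing reduces to simple pixel tests. The following Python sketch assumes the PNGs have already been decoded into rows of integer pixel values; decoding is outside its scope. The helper names are hypothetical. The sketch computes a mask's pixel area and bounding box, and merges several instance masks into one mask with a logical OR.

```python
# Per this page: 255 marks object pixels, 0 marks background pixels.
OBJECT, BACKGROUND = 255, 0


def mask_area_and_bbox(mask):
    """Return (area, bbox) for one decoded mask (a list of pixel rows).

    bbox is (x_min, y_min, x_max, y_max), inclusive, or None when the
    mask contains no object pixels.
    """
    area = 0
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value == OBJECT:
                area += 1
                xs.append(x)
                ys.append(y)
    if area == 0:
        return 0, None
    return area, (min(xs), min(ys), max(xs), max(ys))


def union_masks(masks):
    """Merge equally sized per-instance masks into one mask (logical OR)."""
    height, width = len(masks[0]), len(masks[0][0])
    merged = [[BACKGROUND] * width for _ in range(height)]
    for mask in masks:
        for y in range(height):
            for x in range(width):
                if mask[y][x] == OBJECT:
                    merged[y][x] = OBJECT
    return merged
```

With real images you would normally use an array library for speed. The pure-Python version keeps the logic visible.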

Performance, measured on an NVIDIA H100 80GB:

  • Model load time: ~8.3s
  • Average inference time: ~88ms per image
  • Memory usage: ~12GB VRAM
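These figures support a rough throughput estimate. At ~88 ms per image, a single loaded model handles about 11 images per second, so 1,000 images take roughly 8.3 s + 88 s ≈ 96 s. The sketch below encodes that arithmetic under these assumptions:

  • the model is loaded once;
  • images are processed sequentially, without batching;
  • the 88 ms figure excludes upload, decoding, and mask encoding.

Results on other hardware will differ.

```python
# Rough estimate from the H100 figures on this page.
# Assumes one model load, sequential processing (no batching), and that
# ~88 ms covers inference only (no upload, decode, or mask encoding).
MODEL_LOAD_S = 8.3
INFERENCE_S = 0.088


def estimated_runtime_s(num_images):
    """Model load time plus per-image inference time for num_images images."""
    return MODEL_LOAD_S + num_images * INFERENCE_S


def steady_state_images_per_s():
    """Images per second once the model is loaded."""
    return 1.0 / INFERENCE_S


if __name__ == "__main__":
    print(f"{estimated_runtime_s(1000):.1f} s for 1,000 images")
    print(f"{steady_state_images_per_s():.1f} images/s")
```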
Confidence threshold guide:

| confidence_threshold | Typical result |
|---|---|
| 0.3 | Many detections; may include false positives |
| 0.5 | Balanced detection (default) |
| 0.7 | Fewer, higher-quality detections |
| 0.9 | Only very confident detections |
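Conceptually, the threshold acts as a cutoff on per-detection confidence scores. The Python sketch below illustrates that effect using made-up scores. It is not how the service is implemented. This page does not say whether scores are returned to the caller, and the inclusive `>=` comparison is an assumption.

```python
# Conceptual illustration only: the job applies confidence_threshold itself.
# The scores are hypothetical, and ">=" (inclusive cutoff) is an assumption.
def keep_detections(scores, confidence_threshold=0.5):
    """Return indices of detections whose score meets the threshold."""
    if not 0.0 <= confidence_threshold <= 1.0:
        raise ValueError("confidence_threshold must be between 0.0 and 1.0")
    return [i for i, score in enumerate(scores) if score >= confidence_threshold]


if __name__ == "__main__":
    example_scores = [0.95, 0.72, 0.55, 0.34, 0.12]  # hypothetical values
    for threshold in (0.3, 0.5, 0.7, 0.9):
        print(threshold, keep_detections(example_scores, threshold))
```

With these example scores, raising the threshold from 0.3 to 0.9 shrinks the kept set from four detections to one. This mirrors the trend in the table above.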
JSON Schema for the job definition:

```json
{
  "type": "object",
  "required": [
    "type",
    "config"
  ],
  "additionalProperties": false,
  "properties": {
    "type": {
      "enum": [
        "inference.segment.v2",
        "inference.sam3.segment.v1"
      ]
    },
    "config": {
      "type": "object",
      "required": [
        "prompt"
      ],
      "additionalProperties": false,
      "properties": {
        "prompt": {
          "type": "string",
          "minLength": 1,
          "maxLength": 500,
          "description": "Text prompt describing what to segment (e.g., 'yellow school bus', 'person', 'cat')."
        },
        "confidence_threshold": {
          "type": "number",
          "default": 0.5,
          "minimum": 0,
          "maximum": 1,
          "description": "Confidence threshold for detections. Lower values return more masks, higher values only return high-confidence masks."
        }
      }
    }
  }
}
```
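Callers who want to avoid a schema-library dependency can check a job definition against these rules by hand. The Python sketch below mirrors only the keywords the schema above uses: `required`, `additionalProperties`, `enum`, `minLength`/`maxLength`, and `minimum`/`maximum`. The function name `validate_job` is hypothetical. A general-purpose validator such as the `jsonschema` package (`jsonschema.validate(instance, schema)`) is the more thorough option.

```python
# Hand-written check mirroring the JSON Schema above (hypothetical helper).
# A tuple is used so unhashable values (e.g. a list) can be tested safely.
ALLOWED_TYPES = ("inference.segment.v2", "inference.sam3.segment.v1")


def validate_job(job):
    """Return a list of validation errors; an empty list means the job is valid."""
    if not isinstance(job, dict):
        return ["job must be an object"]
    errors = []
    for key in ("type", "config"):
        if key not in job:
            errors.append(f"missing required property: {key}")
    extra = sorted(set(job) - {"type", "config"})
    if extra:
        errors.append(f"unexpected properties: {extra}")
    if "type" in job and job["type"] not in ALLOWED_TYPES:
        errors.append(f"unknown type: {job['type']!r}")
    if "config" not in job:
        return errors
    config = job["config"]
    if not isinstance(config, dict):
        errors.append("config must be an object")
        return errors
    extra = sorted(set(config) - {"prompt", "confidence_threshold"})
    if extra:
        errors.append(f"unexpected config properties: {extra}")
    if "prompt" not in config:
        errors.append("missing required property: config.prompt")
    else:
        prompt = config["prompt"]
        if not isinstance(prompt, str) or not 1 <= len(prompt) <= 500:
            errors.append("config.prompt must be a string of 1-500 characters")
    if "confidence_threshold" in config:
        t = config["confidence_threshold"]
        # JSON Schema "number" excludes booleans; Python's bool subclasses int.
        if isinstance(t, bool) or not isinstance(t, (int, float)):
            errors.append("config.confidence_threshold must be a number")
        elif not 0 <= t <= 1:
            errors.append("config.confidence_threshold must be between 0 and 1")
    return errors
```

Collecting every error, rather than stopping at the first, makes the function useful for reporting all problems in a user-supplied job at once.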