inference.sam3.segment.v1

The inference.sam3.segment.v1 job performs text-prompted image segmentation using Meta’s SAM 3 (Segment Anything Model 3). Unlike SAM 2, SAM 3 accepts natural language text prompts to identify and segment specific objects in images.

Basic Usage

{
    "type": "inference.sam3.segment.v1",
    "config": {
        "prompt": "fish"
    }
}

This returns one mask per detected instance matching the prompt.

Configuration Options

Parameter	Type	Default	Description
`prompt`	string	(required)	Text describing what to segment (e.g., “yellow school bus”, “person”, “cat”)
`confidence_threshold`	number	0.5	Confidence threshold for detections (0.0 to 1.0). Lower values return more masks; higher values return only high-confidence masks.
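The two parameters above can be assembled into a job payload programmatically. The following is a minimal Python sketch; the helper name `build_segment_job` is hypothetical and not part of any client library, and how the payload is submitted is outside its scope.

```python
import json

JOB_TYPE = "inference.sam3.segment.v1"


def build_segment_job(prompt, confidence_threshold=None):
    """Return the job payload as a dict (hypothetical helper, not an official API)."""
    config = {"prompt": prompt}
    # Omitting the key lets the service fall back to the documented default of 0.5.
    if confidence_threshold is not None:
        if not 0.0 <= confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be between 0.0 and 1.0")
        config["confidence_threshold"] = confidence_threshold
    return {"type": JOB_TYPE, "config": config}


# Example: serialize a payload for a stricter threshold.
print(json.dumps(build_segment_job("yellow school bus", 0.7), indent=4))
```

Leaving `confidence_threshold` out of the payload keeps it identical to the Basic Usage example.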

Examples

Segment all fish in an image

{
    "type": "inference.sam3.segment.v1",
    "config": {
        "prompt": "fish"
    }
}

High confidence detection only

{
    "type": "inference.sam3.segment.v1",
    "config": {
        "prompt": "person",
        "confidence_threshold": 0.9
    }
}

Lower threshold for more detections

{
    "type": "inference.sam3.segment.v1",
    "config": {
        "prompt": "bird",
        "confidence_threshold": 0.3
    }
}

Input Requirements

Format: PNG, JPEG, or WebP
Size: 256x256 pixels minimum, 4096x4096 pixels maximum
Max file size: 10 MB
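These limits can be checked on the client before a job is submitted. The sketch below is a Python illustration under two assumptions that this page does not confirm: the size limits apply to each side independently (so non-square images are allowed), and 10 MB means 10 * 1024 * 1024 bytes. The function name `check_input` is hypothetical; the format, width, and height would come from whatever image library you already use.

```python
ALLOWED_FORMATS = {"PNG", "JPEG", "WEBP"}
MIN_SIDE, MAX_SIDE = 256, 4096
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" is read as 10 MiB


def check_input(fmt, width, height, size_bytes):
    """Return a list of problems; an empty list means the image passes the checks."""
    problems = []
    if fmt.upper() not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    # Assumption: each side must fall within the documented range.
    if width < MIN_SIDE or height < MIN_SIDE:
        problems.append(f"image is smaller than {MIN_SIDE}x{MIN_SIDE}")
    if width > MAX_SIDE or height > MAX_SIDE:
        problems.append(f"image is larger than {MAX_SIDE}x{MAX_SIDE}")
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds the maximum file size")
    return problems
```

For example, `check_input("PNG", 1024, 768, 2_000_000)` returns an empty list.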

Output

Returns one or more binary masks as PNG images. Each mask corresponds to a detected instance of the prompted object. Masks are grayscale images where:

White (255) = object pixels
Black (0) = background pixels
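Because each mask uses only the two values above, simple statistics can be read directly from the pixels. The following Python sketch assumes the PNG has already been decoded into rows of integer pixel values by an image library of your choice; `mask_stats` is a hypothetical helper, and the bounding box convention (inclusive pixel coordinates) is a choice made for this example.

```python
def mask_stats(mask):
    """Summarize a decoded mask given as a list of rows of 0/255 values."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value == 255:  # white = object pixel
                xs.append(x)
                ys.append(y)
    area = len(xs)
    # Inclusive (x_min, y_min, x_max, y_max); None when the mask has no object pixels.
    bbox = (min(xs), min(ys), max(xs), max(ys)) if area else None
    coverage = area / (width * height) if width and height else 0.0
    return {"area": area, "coverage": coverage, "bbox": bbox}
```

Since one mask is returned per detected instance, calling this once per mask gives per-instance area and bounding boxes.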

Performance

Tested on NVIDIA H100 80GB:

Model load time: ~8.3s
Average inference time: ~88ms per image
Memory usage: ~12GB VRAM

Confidence Threshold Effects

Threshold	Typical Result
0.3	Many detections, may include false positives
0.5	Balanced detection (default)
0.7	Fewer, higher quality detections
0.9	Only very confident detections
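The trend in the table can be illustrated with a small Python sketch. The scores below are made-up values, not model output, and the comparison `score >= threshold` is an assumption about how the threshold is applied; this page does not state whether the boundary is inclusive.

```python
# Hypothetical per-instance confidence scores, for illustration only.
example_scores = [0.95, 0.72, 0.55, 0.34, 0.12]


def kept_scores(scores, threshold):
    """Keep scores at or above the threshold (inclusive comparison is an assumption)."""
    return [s for s in scores if s >= threshold]


for threshold in (0.3, 0.5, 0.7, 0.9):
    kept = kept_scores(example_scores, threshold)
    print(f"threshold={threshold}: {len(kept)} masks kept")
```

With these example scores, raising the threshold from 0.3 to 0.9 reduces the number of kept masks from 4 to 1.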

Schema

{
  "type": "object",
  "required": [
    "type",
    "config"
  ],
  "additionalProperties": false,
  "properties": {
    "type": {
      "enum": [
        "inference.segment.v2",
        "inference.sam3.segment.v1"
      ]
    },
    "config": {
      "type": "object",
      "required": [
        "prompt"
      ],
      "additionalProperties": false,
      "properties": {
        "prompt": {
          "type": "string",
          "minLength": 1,
          "maxLength": 500,
          "description": "Text prompt describing what to segment (e.g., 'yellow school bus', 'person', 'cat')."
        },
        "confidence_threshold": {
          "type": "number",
          "default": 0.5,
          "minimum": 0,
          "maximum": 1,
          "description": "Confidence threshold for detections. Lower values return more masks, higher values only return high-confidence masks."
        }
      }
    }
  }
}
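A payload can be checked against these constraints before submission. The following is a hand-written Python sketch that mirrors the schema above as written (including both values in the `type` enum); it is not a full JSON Schema implementation, and the function names are hypothetical.

```python
ALLOWED_TYPES = ("inference.segment.v2", "inference.sam3.segment.v1")


def _validate_config(config):
    if not isinstance(config, dict):
        return ["config must be an object"]
    errors = []
    # additionalProperties: false
    for key in config:
        if key not in ("prompt", "confidence_threshold"):
            errors.append(f"unexpected config property: {key}")
    if "prompt" not in config:
        errors.append("missing required property: config.prompt")
    else:
        prompt = config["prompt"]
        if not isinstance(prompt, str) or not 1 <= len(prompt) <= 500:
            errors.append("config.prompt must be a string of 1 to 500 characters")
    if "confidence_threshold" in config:
        value = config["confidence_threshold"]
        # JSON Schema "number" excludes booleans, but bool is an int subclass in Python.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 1:
            errors.append("config.confidence_threshold must be a number from 0 to 1")
    return errors


def validate_job(job):
    """Return a list of error strings; an empty list means the payload passes."""
    if not isinstance(job, dict):
        return ["job must be an object"]
    errors = []
    for key in ("type", "config"):
        if key not in job:
            errors.append(f"missing required property: {key}")
    for key in job:
        if key not in ("type", "config"):
            errors.append(f"unexpected property: {key}")
    if "type" in job and job["type"] not in ALLOWED_TYPES:
        errors.append("type must be one of: " + ", ".join(ALLOWED_TYPES))
    if "config" in job:
        errors.extend(_validate_config(job["config"]))
    return errors
```

Returning a list of messages rather than raising on the first failure makes it easier to report every problem in a payload at once.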