Inference API

The inference API provides an HTTP endpoint for running jobs that convert text to images, images to images, text to video, and more. A single /v2/job endpoint covers every supported inference workload through a unified IO interface. Job requests are synchronous: the request contains the job configuration along with any input data, and the response contains an updated job configuration along with the output data.

Authentication

API requests require authentication via the Authorization header with a Bearer token scheme:

Authorization: Bearer xxxxxxxxxxxx

Tokens are created on the API Dashboard and are required to use the API.

Base URL

The base URL for all requests is: https://inference.prodia.com
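As a minimal sketch of how the token and base URL fit together, the following Python snippet builds the endpoint URL and the Authorization header. The helper names `job_url` and `auth_headers` are hypothetical, and reading the token from a `PRODIA_TOKEN` environment variable is an assumption rather than a requirement of the API.

```python
import os
from urllib.parse import urljoin

BASE_URL = "https://inference.prodia.com"


def job_url(base_url: str = BASE_URL) -> str:
    """Return the absolute URL of the unified job endpoint."""
    return urljoin(base_url, "/v2/job")


def auth_headers(token: str) -> dict[str, str]:
    """Return the Authorization header using the Bearer token scheme."""
    if not token:
        raise ValueError("an API token is required")
    return {"Authorization": f"Bearer {token}"}


# PRODIA_TOKEN is an assumed variable name; the fallback is a placeholder token.
token = os.environ.get("PRODIA_TOKEN", "xxxxxxxxxxxx")
print(job_url())
print(list(auth_headers(token)))
```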

POST /v2/job Execute a Job

Jobs are executed by posting to the /v2/job endpoint with an appropriate job configuration and input data.

Job Configuration

Every request includes, at minimum, a JSON job configuration whose type field identifies the job type:

job.json
{
"type": "inference.ping.v1"
}

There are a variety of job types documented in the explorer and they have the following in common:

  • All jobs have a type field that indicates which job type is being requested.
  • Jobs with configuration have a config field that contains the type specific job configuration.

For example, this is a job configuration for a FLUX.1 [dev] text to image generation:

job.json
{
"type": "inference.flux.dev.txt2img.v1",
"config": {
"prompt": "grainy photograph of a space explorer",
"loras": ["prodia/flux-lora-painting"]
}
}
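The two rules above (a mandatory type field and an optional config field) can be sketched as a small Python helper. The function name `make_job` is hypothetical; only the field names and example values come from this page.

```python
import json
from typing import Any, Optional


def make_job(job_type: str, config: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Return a job configuration; the config field is added only when provided."""
    job: dict[str, Any] = {"type": job_type}
    if config is not None:
        job["config"] = config
    return job


ping = make_job("inference.ping.v1")
txt2img = make_job(
    "inference.flux.dev.txt2img.v1",
    {
        "prompt": "grainy photograph of a space explorer",
        "loras": ["prodia/flux-lora-painting"],
    },
)
print(json.dumps(ping))
print(json.dumps(txt2img, indent=2))
```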

Job Input Data

Some jobs need data that isn’t best transferred in the JSON format (e.g. binary PNG data). This additional non-JSON input is sent as a multipart/form-data part named input. Multiple input parts may be specified as long as they use different filenames. When sending input data, the job configuration is sent in the part named job, and there must be exactly one part with this name.

For example, a FLUX.1 [dev] image to image generation requires an input image. This would result in a multipart/form-data request with 2 parts (one for the job configuration and another for the input image). Given the following job configuration and an input.jpg in the local directory:

job.json
{
"type": "inference.flux.dev.img2img.v1",
"config": {
"prompt": "grainy photograph of a space explorer",
"loras": ["prodia/flux-lora-painting"]
}
}

Curl can be used to make the multipart request:

curl FLUX.1 [dev] image to image
curl -H "Authorization: Bearer $PRODIA_TOKEN" \
-H 'Accept: image/jpeg' \
-F job=@job.json \
-F input=@input.jpg \
--output output.jpg \
'https://inference.prodia.com/v2/job'

An HTTP trace of the request might look something like:

POST /v2/job HTTP/2
Host: inference.prodia.com
User-Agent: curl/8.11.1
Accept: image/jpeg
Authorization: Bearer $PRODIA_TOKEN
Content-Type: multipart/form-data; boundary=------------------------fYf8kg9C7fC50PWcYjMuWt
--------------------------fYf8kg9C7fC50PWcYjMuWt
Content-Disposition: form-data; name="job"; filename="job.json"
Content-Type: application/octet-stream
{
"type": "inference.flux.dev.img2img.v1",
"config": {
"prompt": "grainy photograph of a space explorer",
"loras": ["prodia/flux-lora-painting"]
}
}
--------------------------fYf8kg9C7fC50PWcYjMuWt
Content-Disposition: form-data; name="input"; filename="input.jpg"
Content-Type: image/jpeg
......JFIF......
...more bytes...
--------------------------fYf8kg9C7fC50PWcYjMuWt--
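For clients that cannot shell out to curl, the same request body can be assembled by hand. The sketch below uses only the Python standard library; the function name `encode_multipart` is hypothetical, and the application/octet-stream type on the job part simply mirrors the trace above rather than a documented requirement.

```python
import json
import uuid


def encode_multipart(job: dict, inputs: list[tuple[str, str, bytes]]) -> tuple[str, bytes]:
    """Encode a job configuration and input files as multipart/form-data.

    `inputs` holds (filename, content_type, data) tuples. Returns the value
    for the request Content-Type header and the encoded request body.
    """
    filenames = [name for name, _, _ in inputs]
    if len(set(filenames)) != len(filenames):
        raise ValueError("input parts must use different filenames")

    boundary = uuid.uuid4().hex
    # Exactly one part named "job"; its content type mirrors the curl trace.
    parts = [("job", "job.json", "application/octet-stream", json.dumps(job).encode("utf-8"))]
    parts += [("input", name, ctype, data) for name, ctype, data in inputs]

    body = bytearray()
    for field, filename, ctype, data in parts:
        body += f"--{boundary}\r\n".encode()
        body += f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'.encode()
        body += f"Content-Type: {ctype}\r\n\r\n".encode()
        body += data + b"\r\n"
    body += f"--{boundary}--\r\n".encode()
    return f"multipart/form-data; boundary={boundary}", bytes(body)


job = {
    "type": "inference.flux.dev.img2img.v1",
    "config": {"prompt": "grainy photograph of a space explorer"},
}
# The three bytes stand in for the contents of input.jpg.
content_type, body = encode_multipart(job, [("input.jpg", "image/jpeg", b"\xff\xd8\xff")])
print(content_type.split(";")[0])
print(body.count(b"Content-Disposition"))
```

The returned content type must be sent as the request Content-Type header so the server can find the boundary.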

Job Result

All jobs return a job result which is a mirror of the original job configuration with additional information from the job execution. A job result includes all fields in the job configuration and the following:

  • created_at is the UTC time the server created the job
  • updated_at is the UTC time the job result was last updated
  • expires_at is the UTC time after which the job is considered expired
  • id is a UUID generated by the server to identify this job
  • state has a single field current indicating the final state of the job
  • metrics contains the elapsed inference time for the job and additional metrics when appropriate (e.g. iterations per second)
  • error is an error message present if the final state of the job is “failed”

The job result may also update the config field to include default values (e.g. random seed used) or even results themselves (e.g. NSFW image class).

For example, the job configuration above would produce a job result similar to this:

job.json
{
"type": "inference.flux.dev.img2img.v1",
"created_at": "2025-01-01T00:00:14.885Z",
"updated_at": "2025-01-01T00:00:20.11Z",
"expires_at": "2025-01-01T00:01:14.885Z",
"id": "c83d7027-240a-484a-aeba-96014e568711",
"state": {
"current": "completed"
},
"config": {
"prompt": "grainy photograph of a space explorer",
"loras": ["prodia/flux-lora-painting"]
},
"metrics": {
"elapsed": 5.138920783996582,
"ips": 4.864834670706344
}
}
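A client typically inspects state.current first and reads error only when the job failed. The sketch below shows one way to do that in Python; the function name `summarize_result` is hypothetical, and treating metrics.elapsed as seconds is an assumption inferred from the timestamps in the example.

```python
import json

# Abbreviated copy of the job result shown above.
RESULT_JSON = """
{
  "type": "inference.flux.dev.img2img.v1",
  "id": "c83d7027-240a-484a-aeba-96014e568711",
  "state": {"current": "completed"},
  "metrics": {"elapsed": 5.138920783996582, "ips": 4.864834670706344}
}
"""


def summarize_result(result: dict) -> str:
    """Return a one-line summary of a job result."""
    job_id = result.get("id", "unknown")
    state = result["state"]["current"]
    if state == "failed":
        # The error field is present when the final state is "failed".
        return f"job {job_id} failed: {result.get('error', 'no error message')}"
    elapsed = result.get("metrics", {}).get("elapsed")
    if elapsed is None:
        return f"job {job_id} {state}"
    # Assumption: elapsed is expressed in seconds.
    return f"job {job_id} {state} in {elapsed:.2f}s"


print(summarize_result(json.loads(RESULT_JSON)))
```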

Job Output Data

Similar to job input data, all jobs support returning a multipart/form-data response that includes the job result JSON as the job part and output data as the output parts.

An HTTP trace of such a response might look something like:

HTTP/2 200
content-type: multipart/form-data; boundary=b3d4fb976dce6e5d036fa7fb3da645bcfcc56384abb2cb618f8c8bdfd360
x-request-id: 835b0174-5a7c-45e0-8256-050d0f2c47a1
--b3d4fb976dce6e5d036fa7fb3da645bcfcc56384abb2cb618f8c8bdfd360
Content-Disposition: form-data; name="job"; filename="job.json"
Content-Type: application/json
{
"type": "inference.flux.dev.img2img.v1",
"created_at": "2025-01-01T00:00:14.885Z",
"updated_at": "2025-01-01T00:00:20.11Z",
"expires_at": "2025-01-01T00:01:14.885Z",
"id": "c83d7027-240a-484a-aeba-96014e568711",
"state": {
"current": "completed"
},
"config": {
"prompt": "grainy photograph of a space explorer",
"loras": [
"prodia/flux-lora-painting"
]
},
"metrics": {
"elapsed": 5.138920783996582,
"ips": 4.864834670706344
}
}
--b3d4fb976dce6e5d036fa7fb3da645bcfcc56384abb2cb618f8c8bdfd360
Content-Disposition: form-data; name="output"; filename="f9774fd02a85-8985-4d2a-a093-24bda78322b7"
Content-Type: image/jpeg
......JFIF......
...more bytes...
--b3d4fb976dce6e5d036fa7fb3da645bcfcc56384abb2cb618f8c8bdfd360--
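A multipart response like the one above can be split into its job and output parts with Python's standard `email` package, which understands MIME multipart bodies. This is a sketch rather than a hardened parser: the function name `parse_multipart` is hypothetical, the demo body is synthetic, and a dedicated multipart library may be preferable for large binary outputs.

```python
import email
import json


def parse_multipart(content_type: str, body: bytes) -> tuple[dict, list[tuple[str, str, bytes]]]:
    """Split a multipart/form-data response into the job result and output files.

    Returns the decoded job result and a list of (filename, content_type, data).
    """
    # The MIME parser reads the boundary from a Content-Type header, so the
    # response header value is prepended to the raw body.
    prefix = f"Content-Type: {content_type}\r\n\r\n".encode()
    message = email.message_from_bytes(prefix + body)
    if not message.is_multipart():
        raise ValueError("response is not multipart")

    job, outputs = None, []
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        data = part.get_payload(decode=True)
        if name == "job":
            job = json.loads(data)
        elif name == "output":
            outputs.append((part.get_filename(), part.get_content_type(), data))
    if job is None:
        raise ValueError("response has no job part")
    return job, outputs


# Synthetic response body for demonstration only.
demo_body = (
    b"--demo-boundary\r\n"
    b'Content-Disposition: form-data; name="job"; filename="job.json"\r\n'
    b"Content-Type: application/json\r\n\r\n"
    b'{"type": "inference.flux.dev.img2img.v1", "state": {"current": "completed"}}\r\n'
    b"--demo-boundary\r\n"
    b'Content-Disposition: form-data; name="output"; filename="output.jpg"\r\n'
    b"Content-Type: image/jpeg\r\n\r\n"
    b"\xff\xd8\xff\xe0 placeholder bytes\r\n"
    b"--demo-boundary--\r\n"
)
job, outputs = parse_multipart("multipart/form-data; boundary=demo-boundary", demo_body)
print(job["state"]["current"], [name for name, _, _ in outputs])
```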

Content Negotiation

Request Body

The HTTP request format is specified via the request Content-Type header. All jobs accept a multipart/form-data request. If a job type takes no job input data, or if its input data is optional and omitted, the Content-Type can instead be set to application/json and the job configuration sent directly as the request body.
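As an illustration of the application/json form, the sketch below builds (but does not send) a request whose body is the job configuration itself, using Python's standard `urllib`. The helper name `build_json_request` is hypothetical, and it assumes the ping job type takes no input data.

```python
import json
import urllib.request

JOB_URL = "https://inference.prodia.com/v2/job"


def build_json_request(token: str, job: dict) -> urllib.request.Request:
    """Build a POST request that carries the job configuration as plain JSON."""
    return urllib.request.Request(
        JOB_URL,
        data=json.dumps(job).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Assumption: inference.ping.v1 needs no input parts, so JSON alone suffices.
request = build_json_request("xxxxxxxxxxxx", {"type": "inference.ping.v1"})
print(request.get_method(), request.full_url)
# urllib normalizes stored header names to the form "Content-type".
print(request.get_header("Content-type"))
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here so the example runs without network access or a real token.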

Response Body

The desired HTTP response format is specified via the request Accept header. All jobs can negotiate a multipart/form-data response, which works much like job input data except that it has output parts instead of input parts. If a job produces only a single output file, that file can be returned directly by setting the Accept header to one of the supported output formats for the job type.

When negotiating a multipart/form-data response, the default output content type can be overridden by specifying a secondary type in the Accept header. For example, Accept: multipart/form-data; image/png produces a multipart/form-data response in which the first output part has Content-Type: image/png.
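The Accept header variants described in this section can be captured in a small Python helper. The function name `accept_header` and its parameters are hypothetical; the header values it produces follow the forms given in the text.

```python
from typing import Optional


def accept_header(multipart: bool, output_type: Optional[str] = None) -> str:
    """Return an Accept header value for a job request.

    multipart=True asks for the job result plus output parts, optionally
    overriding the output content type; multipart=False asks for a single
    output file returned directly.
    """
    if multipart:
        if output_type is None:
            return "multipart/form-data"
        return f"multipart/form-data; {output_type}"
    if output_type is None:
        raise ValueError("a direct response needs an output content type")
    return output_type


print(accept_header(True))
print(accept_header(True, "image/png"))
print(accept_header(False, "image/jpeg"))
```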

Status Codes

200 OK

A 200 status code indicates the job was completed successfully.

400 Bad Request

A 400 status code indicates that the request was malformed. If possible the server will respond with a job result with the state set to failed and a message regarding the error in the error field.

401 Unauthorized

A 401 status code indicates the request requires authentication.

403 Forbidden

A 403 status code indicates that the authentication provided does not have sufficient privileges for the request. This can happen if the job type requires additional permissions.

429 Too Many Requests

A 429 status code indicates that there is no idle capacity available at request time. This is a normal part of load management. The client should retry the request after a delay specified in the Retry-After response header. The Retry-After header specifies the delay in seconds.
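The retry behavior can be sketched as a loop that waits for the number of seconds given in Retry-After before trying again. The function name `send_with_retry`, the attempt limit, the one-second fallback when the header is missing, and the exact header key casing are all assumptions; `send` stands in for whatever function performs the HTTP request.

```python
import time
from typing import Callable

Response = tuple[int, dict, bytes]  # (status code, headers, body)


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` until it returns a status other than 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            break
        # Retry-After gives the delay in seconds; fall back to 1 second if absent.
        sleep(float(headers.get("Retry-After", 1)))
    return status, headers, body


# Demonstration with canned responses instead of real HTTP calls.
responses = iter([(429, {"Retry-After": "2"}, b""), (200, {}, b"ok")])
waits: list[float] = []
status, _, body = send_with_retry(lambda: next(responses), sleep=waits.append)
print(status, waits)
```

Injecting `sleep` keeps the loop testable without real delays; in production the default `time.sleep` applies.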

5xx Server Errors

A status code in the 5xx range indicates a server error. These errors are typically transient, so the same request can usually be retried after a short delay.