Inference API
The inference API provides an HTTP endpoint for running jobs that convert text to images, images to images, text to video, and more. The jobs endpoint covers all supported inference workloads through a unified I/O interface and a single /v2/job endpoint. Job requests are synchronous: the request contains the job configuration along with any input data, and the response contains an updated job configuration with the output data.
Authentication
API requests require authentication via the Authorization header with a Bearer token scheme, that is, a header of the form Authorization: Bearer <token>.
Tokens are created on the API Dashboard and are required to use the API.
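As a minimal sketch, the header can be assembled in Python as shown here. The helper name auth_headers and the PRODIA_TOKEN environment variable are assumptions made for illustration; the API does not define them.

```python
import os


def auth_headers(token: str) -> dict:
    """Return the Authorization header for a Bearer token."""
    if not token:
        raise ValueError("an API token is required")
    return {"Authorization": f"Bearer {token}"}


# PRODIA_TOKEN is a hypothetical variable name; keep the token wherever suits you.
headers = auth_headers(os.environ.get("PRODIA_TOKEN", "example-token"))
```

Reading the token from the environment keeps it out of source control; any other secret store would work equally well.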
Base URL
The base URL for all requests should be: https://inference.prodia.com
POST /v2/job
Execute a Job
Jobs are executed by posting to the /v2/job endpoint with an appropriate job configuration and input data.
Job Configuration
All requests include, at a minimum, a JSON job configuration. The type field distinguishes one kind of job from another.
There are a variety of job types documented in the explorer and they have the following in common:
- All jobs have a type field that indicates which job type is being requested.
- Jobs with configuration have a config field that contains the type specific job configuration.
For example, this is a job configuration for a FLUX.1 [dev] text to image generation:
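A configuration of this shape can be sketched in Python as follows. The type identifier and the prompt key are assumptions used for illustration only; the explorer lists the actual job types and their config fields.

```python
import json
from typing import Optional

# Assumed identifier for FLUX.1 [dev] text to image; check the explorer for the real one.
JOB_TYPE = "inference.flux.dev.txt2img.v1"


def make_job(job_type: str, config: Optional[dict] = None) -> dict:
    """Build a job configuration: a required type plus an optional config object."""
    job = {"type": job_type}
    if config is not None:
        job["config"] = config
    return job


# "prompt" is an assumed config key, used here only for illustration.
job = make_job(JOB_TYPE, {"prompt": "puppies in a cloud, 4k"})
print(json.dumps(job, indent=2))
```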
Job Input Data
Some jobs need data that isn’t best transferred in the JSON format (e.g. binary PNG data). This additional non-JSON input to the job execution is sent as a multipart/form-data part named input. Multiple input data parts may be specified as long as they use different filenames. When sending input data the job configuration is sent in the part named job (and there must only be one part with this name).
For example, a FLUX.1 [dev] image to image generation requires an input image. This would result in a multipart/form-data request with 2 parts (one for the job configuration and another for the input image). Given the following job configuration and an input.jpg in the local directory:
Curl can be used to make the multipart request:
An HTTP trace of the request might look something like:
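Because the request is ordinary multipart/form-data, a rough Python sketch can build the body by hand and print something close to the body portion of such a trace. The function name encode_multipart, the job.json filename on the job part, the type identifier, and the prompt key are all assumptions made for illustration, and placeholder bytes stand in for the contents of input.jpg.

```python
import json
import uuid
from typing import Optional


def encode_multipart(job, inputs, boundary: Optional[str] = None):
    """Encode one `job` part and any number of `input` parts.

    `inputs` is a list of (filename, content_type, data) tuples.
    Returns (content_type_header, body_bytes).
    """
    filenames = [name for name, _, _ in inputs]
    if len(set(filenames)) != len(filenames):
        raise ValueError("input parts must use different filenames")
    boundary = boundary or uuid.uuid4().hex
    # Exactly one part named "job"; its filename is an assumption.
    parts = [("job", "job.json", "application/json", json.dumps(job).encode("utf-8"))]
    parts += [("input", name, ctype, data) for name, ctype, data in inputs]

    body = bytearray()
    for field, filename, ctype, data in parts:
        body += f"--{boundary}\r\n".encode("ascii")
        body += (
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        ).encode("utf-8")
        body += f"Content-Type: {ctype}\r\n\r\n".encode("ascii")
        body += data + b"\r\n"
    body += f"--{boundary}--\r\n".encode("ascii")
    return f"multipart/form-data; boundary={boundary}", bytes(body)


# Assumed type identifier and config key; placeholder bytes instead of a real JPEG.
job = {"type": "inference.flux.dev.img2img.v1", "config": {"prompt": "puppies in a cloud, 4k"}}
content_type, body = encode_multipart(
    job, [("input.jpg", "image/jpeg", b"<jpeg bytes>")], boundary="example-boundary"
)
print("Content-Type:", content_type)
print(body.decode("ascii"))
```

The returned header value goes in the request Content-Type header and the bytes form the request body. In practice an HTTP client library with multipart support would normally do this encoding.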
Job Result
All jobs return a job result which is a mirror of the original job configuration with additional information from the job execution. A job result includes all fields in the job configuration and the following:
- created_at is the UTC time the server created the job
- updated_at is the UTC time the job result was last updated
- expires_at is the UTC time after which the job is considered expired
- id is a UUID generated by the server to identify this job
- state has a single field current indicating the final state of the job
- metrics contains the elapsed inference time for the job and additional metrics when appropriate (e.g. iterations per second)
- error is an error message present if the final state of the job is “failed”
The job result may also update the config field to include default values (e.g. random seed used) or even results themselves (e.g. NSFW image class).
For example, using the job configuration above would render a job result similar to this:
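To illustrate how a client might read these fields, here is a minimal Python sketch. The helper name check_result is hypothetical, and every value in the sample (including the type identifier and the seed key) is made up for illustration rather than copied from real server output.

```python
def check_result(result: dict) -> dict:
    """Raise if the job failed, otherwise return the (possibly updated) config."""
    if result["state"]["current"] == "failed":
        raise RuntimeError(f"job {result['id']} failed: {result.get('error')}")
    return result.get("config", {})


# Made-up values for illustration only; a real result carries server-generated data.
sample = {
    "type": "inference.flux.dev.txt2img.v1",
    "config": {"prompt": "puppies in a cloud, 4k", "seed": 42},
    "id": "00000000-0000-0000-0000-000000000000",
    "state": {"current": "failed"},
    "error": "example error message",
}

try:
    check_result(sample)
except RuntimeError as exc:
    print(exc)
```

Returning the config on success gives the caller access to any defaults the server filled in, such as the seed that was used.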
Job Output Data
Similar to job input data, all jobs support returning a multipart/form-data response that includes the job result JSON as the job part and output data as the output parts.
An HTTP trace of such a response might look something like:
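A response of this kind can be split into its parts with Python's standard email parser, as in the sketch below. The function name is hypothetical, and the sample body is hand-written with made-up filenames and placeholder bytes; for large or unusual binary payloads a dedicated multipart parser may be a better fit.

```python
import json
from email.parser import BytesParser
from email.policy import HTTP


def parse_multipart_response(content_type: str, body: bytes):
    """Return (job_result, outputs) from a multipart/form-data response.

    `content_type` is the response Content-Type header, including the boundary.
    `outputs` is a list of (filename, content_type, data) tuples.
    """
    # The email parser needs the Content-Type header in front of the body.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = BytesParser(policy=HTTP).parsebytes(raw)
    job, outputs = None, []
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        payload = part.get_payload(decode=True)
        if name == "job":
            job = json.loads(payload)
        elif name == "output":
            outputs.append((part.get_filename(), part.get_content_type(), payload))
    return job, outputs


# Hand-written sample response body; filenames and contents are placeholders.
sample_body = (
    b'--b\r\nContent-Disposition: form-data; name="job"; filename="job.json"\r\n'
    b'Content-Type: application/json\r\n\r\n{"type": "t"}\r\n'
    b'--b\r\nContent-Disposition: form-data; name="output"; filename="output.png"\r\n'
    b"Content-Type: image/png\r\n\r\nPNGDATA\r\n--b--\r\n"
)
job, outputs = parse_multipart_response("multipart/form-data; boundary=b", sample_body)
```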
Content Negotiation
Request Body
The HTTP request format is specified via the request Content-Type header. All jobs can accept a multipart/form-data request. If a job type does not accept job input data, or accepts it only optionally, then the Content-Type can be set to application/json and the job configuration can be sent directly as the request body.
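For a job without input data, the JSON form of the request could be prepared as in the following sketch, which uses only the Python standard library. The function name, token value, type identifier, and prompt key are assumptions for illustration; the request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://inference.prodia.com"


def build_json_request(token: str, job: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST /v2/job request with a JSON body."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v2/job",
        data=json.dumps(job).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Assumed type identifier and config key, shown only to make the example concrete.
job = {"type": "inference.flux.dev.txt2img.v1", "config": {"prompt": "puppies in a cloud, 4k"}}
request = build_json_request("example-token", job)
print(request.get_method(), request.full_url)
```

Passing the prepared request to urllib.request.urlopen would send it. Choosing an Accept header for the response is a separate decision.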
Response Body
The desired HTTP response format is specified via the request Accept header. All jobs can negotiate a multipart/form-data response which works much like job input data except that instead of input parts it has output parts. If a job only outputs a single job output data file, then it can be returned directly by setting the Accept header to one of the supported output formats for the job type.
When negotiating a multipart/form-data response the default output content type can be overridden by specifying a secondary type in the Accept header. For example Accept: multipart/form-data; image/png would format the response into a multipart/form-data where the first output part has Content-Type: image/png.
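The Accept values described in this section can be assembled with a small helper such as the one sketched here. The function name and its parameters are hypothetical; only the resulting header strings follow the forms given in the text.

```python
from typing import Optional


def accept_header(multipart: bool, output_type: Optional[str] = None) -> str:
    """Build an Accept header value for a job request.

    multipart=True asks for a multipart/form-data response, optionally
    overriding the default output content type with a secondary type.
    multipart=False asks for the single output file directly.
    """
    if multipart:
        if output_type:
            return f"multipart/form-data; {output_type}"
        return "multipart/form-data"
    if not output_type:
        raise ValueError("a direct response needs an output content type")
    return output_type


print(accept_header(True, "image/png"))
```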
Status Codes
200 OK
A 200 status code indicates the job was completed successfully.
400 Bad Request
A 400 status code indicates that the request was malformed. If possible the server will respond with a job result with the state set to failed and a message regarding the error in the error field.
401 Unauthorized
A 401 status code indicates the request requires authentication.
403 Forbidden
A 403 status code indicates that the authentication provided does not have sufficient privileges for the request. This can happen if the job type requires additional permissions.
429 Too Many Requests
A 429 status code indicates that there is no idle capacity available at request time. This is a normal part of load management. The client should retry the request after the delay, in seconds, given in the Retry-After response header.
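The retry behaviour described for 429 responses could be sketched in Python as follows. The send callable, the attempt limit, and the fallback delay are assumptions introduced for illustration; the API itself only specifies the Retry-After header.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` until it returns something other than 429 or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        # Retry-After is given in seconds; fall back if it is missing or malformed.
        # A plain dict lookup is case-sensitive, so normalise header names upstream.
        try:
            delay = float(headers.get("Retry-After", default_delay))
        except ValueError:
            delay = default_delay
        sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

Passing the sleep function as a parameter keeps the helper easy to test without real waiting. The last response is returned as-is once the attempt limit is reached, so the caller can still inspect a final 429.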
5xx Server Errors
A status code in the 5xx range indicates a server error. These errors are typically transient and will be resolved soon.