Async Inference

Overview

Async inference uses a submit-and-poll pattern for gateway requests: submit a normal inference payload to an async endpoint, get a job ID immediately, and poll later for the final result.

What You Get

You POST the same inference payload you would send to the synchronous endpoint, but to its async counterpart, and immediately receive an HTTP 202 response containing the job ID (in the id field) with status pending. The request then runs in the background. To retrieve the result, send a GET to the same async path with the job ID appended. This suits long-running or batch workloads where you don’t want to hold a connection open.

Supported Endpoints

Streaming is not supported on async endpoints.

Request Type	Submit (POST)	Poll (GET)
Text completions	`/v1/async/completions`	`/v1/async/completions/{job_id}`
Chat completions	`/v1/async/chat/completions`	`/v1/async/chat/completions/{job_id}`
Responses API	`/v1/async/responses`	`/v1/async/responses/{job_id}`
Embeddings	`/v1/async/embeddings`	`/v1/async/embeddings/{job_id}`
Speech	`/v1/async/audio/speech`	`/v1/async/audio/speech/{job_id}`
Transcriptions	`/v1/async/audio/transcriptions`	`/v1/async/audio/transcriptions/{job_id}`
Image generations	`/v1/async/images/generations`	`/v1/async/images/generations/{job_id}`
Image edits	`/v1/async/images/edits`	`/v1/async/images/edits/{job_id}`
Image variations	`/v1/async/images/variations`	`/v1/async/images/variations/{job_id}`
Rerank	`/v1/async/rerank`	`/v1/async/rerank/{job_id}`
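Each poll path in the table is its submit path followed by /{job_id}, so a client that handles several request types can derive both URLs from one mapping. This is a minimal sketch: the ASYNC_PATHS keys and the submit_url/poll_url helper names are hypothetical, and only the path strings come from the table.

```python
# Hypothetical mapping; the path strings are copied from the Supported Endpoints table.
ASYNC_PATHS = {
    "text_completions": "/v1/async/completions",
    "chat_completions": "/v1/async/chat/completions",
    "responses": "/v1/async/responses",
    "embeddings": "/v1/async/embeddings",
    "speech": "/v1/async/audio/speech",
    "transcriptions": "/v1/async/audio/transcriptions",
    "image_generations": "/v1/async/images/generations",
    "image_edits": "/v1/async/images/edits",
    "image_variations": "/v1/async/images/variations",
    "rerank": "/v1/async/rerank",
}


def submit_url(base_url: str, request_type: str) -> str:
    """URL to POST the inference payload to."""
    return base_url.rstrip("/") + ASYNC_PATHS[request_type]


def poll_url(base_url: str, request_type: str, job_id: str) -> str:
    """URL to GET the job from: the submit path plus /{job_id}."""
    return f"{submit_url(base_url, request_type)}/{job_id}"
```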

Submitting a Request

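Use the same JSON body as the synchronous endpoint, but send it to the corresponding /v1/async/ path. The optional x-bf-async-job-result-ttl header sets how long the result is kept after the job completes, in seconds.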

curl -X POST https://app.deepintshield.com/v1/async/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: sk-bf-your-virtual-key" \
  -H "x-bf-async-job-result-ttl: 3600" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Summarize the latest release notes in 3 bullets"
      }
    ]
  }'
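The same submission can be made from Python with only the standard library. This is a minimal sketch: build_submit_request and submit_job are hypothetical helper names, while the path, headers, and body mirror the curl example above.

```python
import json
import urllib.request


def build_submit_request(base_url, virtual_key, payload,
                         path="/v1/async/chat/completions", result_ttl=None):
    """Build the POST for an async job; the body is the same JSON the sync endpoint takes."""
    headers = {"Content-Type": "application/json", "x-bf-vk": virtual_key}
    if result_ttl is not None:
        # Optional: seconds to keep the result after the job completes.
        headers["x-bf-async-job-result-ttl"] = str(result_ttl)
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def submit_job(req):
    """Send the request; a 202 Accepted body holds the job id and status "pending"."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Calling submit_job on the built request returns the parsed 202 body, whose id field is the job ID to poll with.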

Response (202 Accepted)

{
  "id": "1e89b165-d4fe-49e8-beb2-3e157f2df02f",
  "status": "pending",
  "created_at": "2026-02-19T08:10:17.831Z"
}

Polling for Results

Send a GET to the matching poll endpoint, using the id from the submit response as {job_id}. Pass the same virtual key you used to submit the job.

curl -X GET https://app.deepintshield.com/v1/async/chat/completions/1e89b165-d4fe-49e8-beb2-3e157f2df02f \
  -H "x-bf-vk: sk-bf-your-virtual-key"

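Response codes:

202 Accepted: the job is still pending or processing
200 OK: the job has finished; check status to see whether it completed or failed (status_code holds the HTTP status of the underlying request)
404 Not Found: the job does not exist, its result has expired, or the virtual key is missing or does not match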

Pending example (202)

{
  "id": "1e89b165-d4fe-49e8-beb2-3e157f2df02f",
  "status": "pending",
  "created_at": "2026-02-19T08:10:17.831Z"
}

Completed example (200)

{
  "id": "1e89b165-d4fe-49e8-beb2-3e157f2df02f",
  "status": "completed",
  "created_at": "2026-02-19T08:10:17.831Z",
  "completed_at": "2026-02-19T08:10:19.412Z",
  "expires_at": "2026-02-19T09:10:19.412Z",
  "status_code": 200,
  "result": {
    "id": "chatcmpl-123",
    "object": "chat.completion"
  }
}

Failed example (200)

{
  "id": "1e89b165-d4fe-49e8-beb2-3e157f2df02f",
  "status": "failed",
  "created_at": "2026-02-19T08:10:17.831Z",
  "completed_at": "2026-02-19T08:10:19.412Z",
  "expires_at": "2026-02-19T09:10:19.412Z",
  "status_code": 429,
  "error": {
    "error": {
      "message": "rate limit exceeded",
      "type": "rate_limit_error"
    }
  }
}
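A polling client follows directly from the response codes: 202 means keep waiting, 200 means the job has finished and status says whether it completed or failed, and 404 means the job is unknown, expired, or scoped to a different virtual key. This is a minimal sketch; the names wait_for_result, fetch_job, JobFailed, and JobNotFound are hypothetical, and the exponential backoff (1 s doubling to a 10 s cap, 300 s timeout) is an illustrative choice, not a gateway requirement.

```python
import json
import time
import urllib.error
import urllib.request


class JobNotFound(Exception):
    """404: unknown job ID, expired result, or missing/mismatched virtual key."""


class JobFailed(Exception):
    """200 with status "failed": carries the underlying status_code and error body."""

    def __init__(self, status_code, error):
        super().__init__(f"async job failed with status_code={status_code}: {error}")
        self.status_code = status_code
        self.error = error


def fetch_job(url, virtual_key):
    """One poll: GET the job URL and return (http_status, parsed JSON body or None)."""
    req = urllib.request.Request(url, headers={"x-bf-vk": virtual_key}, method="GET")
    try:
        with urllib.request.urlopen(req) as resp:  # 200 and 202 both land here
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:  # 404 and other non-2xx codes
        return err.code, None


def wait_for_result(fetch, interval=1.0, max_interval=10.0, timeout=300.0, sleep=time.sleep):
    """Call fetch() until the job finishes, backing off exponentially between polls."""
    waited = 0.0  # counts sleep time only, not request latency
    while True:
        http_status, body = fetch()
        if http_status == 200:  # finished: completed or failed
            if body["status"] == "completed":
                return body["result"]
            raise JobFailed(body.get("status_code"), body.get("error"))
        if http_status == 404:
            raise JobNotFound("Job not found or expired")
        if http_status != 202:
            raise RuntimeError(f"unexpected HTTP status {http_status}")
        # 202: still pending or processing
        if waited >= timeout:
            raise TimeoutError(f"job still running after {waited:.0f}s of polling")
        sleep(interval)
        waited += interval
        interval = min(interval * 2, max_interval)
```

Against the gateway, pass a zero-argument callable such as lambda: fetch_job(url, key), where url is the poll URL for the job. Taking fetch and sleep as parameters keeps the loop independent of the HTTP client and easy to test.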

Job Lifecycle

Status	Meaning
`pending`	Job is created and queued
`processing`	Job is running
`completed`	Operation succeeded and result is available
`failed`	Operation failed and an error is available

Result TTL and Expiration

Default TTL is 3600 seconds (1 hour).
TTL starts from completion time, not submission time.
Override the TTL per request with the x-bf-async-job-result-ttl header, in seconds.
If the header is invalid or <= 0, the default TTL applies.
Polling an expired job returns 404 with "Job not found or expired".
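The TTL rules can be written as a small function for client-side bookkeeping, and it reproduces the timestamps in the completed example: a completed_at of 08:10:19.412Z plus the default 3600 seconds gives the expires_at of 09:10:19.412Z. This is a sketch; effective_ttl and expires_at are hypothetical names, and treating any non-integer header value as invalid is an assumption about what the gateway rejects.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL_SECONDS = 3600


def effective_ttl(header_value):
    """Documented rule: a missing, invalid, or <= 0 header falls back to the default."""
    try:
        ttl = int(header_value)  # assumption: only integer strings count as valid
    except (TypeError, ValueError):
        return DEFAULT_TTL_SECONDS
    return ttl if ttl > 0 else DEFAULT_TTL_SECONDS


def parse_ts(value):
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def expires_at(completed_at, header_value=None):
    """TTL counts from completion time, not submission time."""
    return parse_ts(completed_at) + timedelta(seconds=effective_ttl(header_value))
```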

Virtual Key Authorization

If a job is created with a virtual key, the job stores that virtual key identity.
Polling must use the same virtual key value.
A missing or mismatched virtual key fails the lookup and returns 404 Job not found or expired, the same response as an expired job.
Jobs created without a virtual key are not virtual-key scoped, so any caller that passes your gateway's authentication and middleware checks can poll them.
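The scoping rule can be modelled as a lookup that succeeds only when the stored key matches, or when the job has no key at all. This is a toy in-memory model written for illustration, not the gateway's implementation; JobStore, Job, and their methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Job:
    job_id: str
    virtual_key: Optional[str]  # None: job was created without a virtual key


class JobStore:
    """Toy model of virtual-key scoping for async jobs (not the gateway's code)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}

    def create(self, job_id: str, virtual_key: Optional[str] = None) -> None:
        self._jobs[job_id] = Job(job_id, virtual_key)

    def find(self, job_id: str, virtual_key: Optional[str] = None) -> Optional[Job]:
        """Return the job, or None where the gateway would answer 404."""
        job = self._jobs.get(job_id)
        if job is None:
            return None  # unknown or expired
        if job.virtual_key is not None and job.virtual_key != virtual_key:
            return None  # missing or mismatched key looks exactly like "not found"
        return job  # key matches, or the job is not virtual-key scoped
```

Because a key mismatch and a missing job produce the same answer, a caller holding the wrong key cannot tell whether a given job ID exists.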

Observability

Async executions are logged like synchronous requests.
The logging metadata includes isAsyncRequest: true, which appears as an Async badge in the Logs UI.
The background run is a full DeepIntShield request, so governance, logging, cost tracking, and your other configured features all apply to the actual inference run.

Limitations

Streaming is not supported on async endpoints.
Async inference requires a Logs Store to be configured.