Async Inference
Overview
Async inference uses a fire-and-forget pattern for gateway requests: submit a normal inference payload to an async endpoint, get a job_id immediately, and poll later for the final result.
What You Get
You POST your normal inference payload to an async endpoint and immediately receive HTTP 202 Accepted with a job ID (the id field) and status pending. The request runs in the background. Once the result is ready, you GET the same endpoint with that job ID appended to retrieve it. This is useful for long-running or batch workloads where you don't want to hold a connection open.
Supported Endpoints
Streaming is not supported on async endpoints.
| Request Type | Submit (POST) | Poll (GET) |
|---|---|---|
| Text completions | /v1/async/completions | /v1/async/completions/{job_id} |
| Chat completions | /v1/async/chat/completions | /v1/async/chat/completions/{job_id} |
| Responses API | /v1/async/responses | /v1/async/responses/{job_id} |
| Embeddings | /v1/async/embeddings | /v1/async/embeddings/{job_id} |
| Speech | /v1/async/audio/speech | /v1/async/audio/speech/{job_id} |
| Transcriptions | /v1/async/audio/transcriptions | /v1/async/audio/transcriptions/{job_id} |
| Image generations | /v1/async/images/generations | /v1/async/images/generations/{job_id} |
| Image edits | /v1/async/images/edits | /v1/async/images/edits/{job_id} |
| Image variations | /v1/async/images/variations | /v1/async/images/variations/{job_id} |
| Rerank | /v1/async/rerank | /v1/async/rerank/{job_id} |
Submitting a Request
Use the same JSON body as the synchronous endpoint, but send it to the /v1/async/ path. The example below also sets the optional x-bf-async-job-result-ttl header to keep the result for one hour after completion.
```shell
curl -X POST https://app.deepintshield.com/v1/async/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: sk-bf-your-virtual-key" \
  -H "x-bf-async-job-result-ttl: 3600" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      { "role": "user", "content": "Summarize the latest release notes in 3 bullets" }
    ]
  }'
```

Response (202 Accepted):
```json
{
  "id": "1e89b165-d4fe-49e8-beb2-3e157f2df02f",
  "status": "pending",
  "created_at": "2026-02-19T08:10:17.831Z"
}
```

Polling for Results
Send a GET request to the matching poll endpoint, appending the id returned at submission as the job_id path segment.
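In a script, the poll URL is simply the submit URL with the returned id appended. The sketch below assumes a POSIX shell with grep and sed; the function name poll_url_from_submit_response is hypothetical, and it pulls the id out with a text heuristic rather than a real JSON parser (jq -r .id is more robust if jq is installed).

```shell
# Hypothetical helper: read the 202 submit response on stdin and print the
# matching poll URL (submit URL + "/" + id). It takes the first "id" field,
# which is the job id in the flat 202 response; this is a grep/sed heuristic,
# not a JSON parser.
poll_url_from_submit_response() {
  submit_url=$1
  job_id=$(grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 |
    sed 's/^"id"[[:space:]]*:[[:space:]]*"\(.*\)"$/\1/')
  if [ -z "$job_id" ]; then
    echo "no job id in submit response" >&2
    return 1
  fi
  # Strip a trailing slash from the submit URL before appending the id.
  printf '%s/%s\n' "${submit_url%/}" "$job_id"
}

# Example usage (placeholder key; request.json is a hypothetical file holding
# the same body as the synchronous request):
#   SUBMIT_URL=https://app.deepintshield.com/v1/async/chat/completions
#   POLL_URL=$(curl -s -X POST "$SUBMIT_URL" \
#     -H "Content-Type: application/json" \
#     -H "x-bf-vk: sk-bf-your-virtual-key" \
#     -d @request.json | poll_url_from_submit_response "$SUBMIT_URL")
```

Because every row of the endpoint table follows the same submit/poll shape, the same helper works for any async endpoint.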
```shell
curl -X GET https://app.deepintshield.com/v1/async/chat/completions/1e89b165-d4fe-49e8-beb2-3e157f2df02f \
  -H "x-bf-vk: sk-bf-your-virtual-key"
```

Response codes:
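- 202 Accepted: the job is still pending or processing.
- 200 OK: the job is completed or failed.
- 404 Not Found: the job does not exist, has expired, or was polled with a missing or different virtual key.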
Pending example (202)
```json
{
  "id": "1e89b165-d4fe-49e8-beb2-3e157f2df02f",
  "status": "pending",
  "created_at": "2026-02-19T08:10:17.831Z"
}
```

Completed example (200)
```json
{
  "id": "1e89b165-d4fe-49e8-beb2-3e157f2df02f",
  "status": "completed",
  "created_at": "2026-02-19T08:10:17.831Z",
  "completed_at": "2026-02-19T08:10:19.412Z",
  "expires_at": "2026-02-19T09:10:19.412Z",
  "status_code": 200,
  "result": {
    "id": "chatcmpl-123",
    "object": "chat.completion"
  }
}
```

Failed example (200)
```json
{
  "id": "1e89b165-d4fe-49e8-beb2-3e157f2df02f",
  "status": "failed",
  "created_at": "2026-02-19T08:10:17.831Z",
  "completed_at": "2026-02-19T08:10:19.412Z",
  "expires_at": "2026-02-19T09:10:19.412Z",
  "status_code": 429,
  "error": {
    "error": {
      "message": "rate limit exceeded",
      "type": "rate_limit_error"
    }
  }
}
```

Job Lifecycle
| Status | Meaning |
|---|---|
| pending | Job is created and queued |
| processing | Job is running |
| completed | Operation succeeded and result is available |
| failed | Operation failed and an error is available |
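Since pending and processing jobs poll as 202 while completed and failed jobs poll as 200, a client can loop on the HTTP code and only read the status field once the job is terminal. The sketch below assumes a POSIX shell with curl, grep, and sed; fetch_job and poll_job are hypothetical names, and the default of 60 attempts at 2-second intervals is an arbitrary choice, not a gateway setting.

```shell
# Hypothetical helper: GET the job once and print the body, then a final line
# holding the HTTP status code (curl's -w '%{http_code}' write-out variable).
fetch_job() {
  curl -s -X GET "$1" -H "x-bf-vk: $2" -w '\n%{http_code}'
}

# Poll until the job leaves pending/processing.
# Args: poll URL, virtual key, max attempts (default 60), delay seconds (default 2).
# Prints the final JSON body on stdout. Exit status: 0 completed, 1 failed,
# 2 not found / expired / key mismatch (404), 3 timeout or unexpected status.
poll_job() {
  url=$1 vk=$2 attempts=${3:-60} delay=${4:-2}
  while [ "$attempts" -gt 0 ]; do
    response=$(fetch_job "$url" "$vk")
    code=$(printf '%s\n' "$response" | tail -n 1)  # last line: HTTP code
    body=$(printf '%s\n' "$response" | sed '$d')     # everything else: JSON
    case $code in
      202) ;;  # pending or processing: wait and retry
      200)     # terminal: completed or failed
        printf '%s\n' "$body"
        # Heuristic: the first "status" field is the job status, as in the
        # documented responses (a nested result may carry its own "status").
        job_status=$(printf '%s\n' "$body" |
          grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 |
          sed 's/.*"\([^"]*\)"$/\1/')
        if [ "$job_status" = completed ]; then return 0; else return 1; fi ;;
      404) echo "job not found, expired, or polled with another key" >&2; return 2 ;;
      *) echo "unexpected HTTP status: $code" >&2; return 3 ;;
    esac
    attempts=$((attempts - 1))
    if [ "$attempts" -gt 0 ]; then sleep "$delay"; fi
  done
  echo "timed out waiting for job" >&2
  return 3
}
```

A 404 is not retried because it means the job is gone or the key does not match, and waiting will not change either. For example, `poll_job "$POLL_URL" sk-bf-your-virtual-key > result.json` writes the final job object to a file and exits 0 only if the job completed.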
Result TTL and Expiration
- Default TTL is 3600 seconds (1 hour).
- The TTL starts at completion time, not submission time, so expires_at = completed_at + TTL.
- Override it per request with the x-bf-async-job-result-ttl header (in seconds).
- If the header is invalid or <= 0, the default TTL applies.
- Polling an expired job returns 404 Job not found or expired.
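The TTL rules can be mirrored on the client, for example to estimate when a stored result will disappear. In the completed example above, expires_at (09:10:19.412Z) is exactly 3600 seconds after completed_at (08:10:19.412Z). This is a minimal sketch; it assumes a "valid" header value means a positive base-10 integer of seconds, since the gateway's exact parsing is not documented here, and both function names are hypothetical.

```shell
# Client-side mirror of the documented TTL rule (not the gateway's code).
DEFAULT_ASYNC_RESULT_TTL=3600

# Print the TTL in seconds that should apply for a raw header value.
# Assumption: valid means a positive base-10 integer.
effective_result_ttl() {
  case $1 in
    ''|*[!0-9]*) echo "$DEFAULT_ASYNC_RESULT_TTL" ;;  # missing, negative, or non-numeric
    *)
      if [ "$1" -gt 0 ]; then
        echo "$1" | sed 's/^0*//'  # drop leading zeros so shell arithmetic stays decimal
      else
        echo "$DEFAULT_ASYNC_RESULT_TTL"  # zero falls back to the default
      fi ;;
  esac
}

# Expiry is measured from completion, not submission: completed_at + TTL.
# Args: completed_at as Unix epoch seconds, raw header value (may be empty).
expected_expiry_epoch() {
  echo $(( $1 + $(effective_result_ttl "$2") ))
}
```

Working in epoch seconds keeps the arithmetic portable; converting the ISO 8601 timestamps from the poll response to epoch seconds is left to whatever date tooling your platform provides.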
Virtual Key Authorization
- If a job is created with a virtual key, the job stores that virtual key's identity.
- Polling must use the same virtual key value.
- Missing or mismatched virtual keys fail the lookup and return 404 Job not found or expired.
- Jobs created without a virtual key are not virtual-key scoped, so any caller that passes your gateway auth/middleware checks can poll them.
Observability
- Async executions are logged like synchronous requests.
- The logging metadata includes isAsyncRequest: true, which appears as an Async badge in the Logs UI.
- The background run is a full DeepIntShield request, so governance, logging, cost tracking, and your other configured features all apply to the actual inference run.
Limitations
- Streaming is not supported on async endpoints.
- Requires a Logs Store to be configured.