
Replicate

Replicate is a prediction-based platform: every request creates a “prediction” that runs asynchronously. Each model on Replicate defines its own input schema, which makes the platform flexible but means you need to know each model's parameters. Through the DeepIntShield gateway you send standard OpenAI-style requests and receive standard responses; model-specific fields are passed directly in the request body.

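| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | Yes | Yes | /v1/predictions |
| Responses API | Yes | Yes | /v1/predictions |
| Text Completions | Yes | Yes | /v1/predictions |
| Image Generation | Yes | Yes | /v1/predictions |
| Image Edit | Yes | Yes | /v1/predictions |
| Video Generation | Yes | - | /v1/predictions |
| Image Variation | No | No | - |
| Files | Yes | - | /v1/files |
| List Models | Yes | - | /v1/deployments |
| Embeddings | No | No | - |
| Speech (TTS) | No | No | - |
| Transcriptions (STT) | No | No | - |
| Batch | No | No | - |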

Replicate models can be specified in three ways: by version ID, by model name, or by a deployment alias configured on the key.

Format: version-id

curl -X POST https://app.deepintshield.com/v1/chat/completions \
-H "Authorization: Bearer sk-bf-your-virtual-key" \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
"messages": [{"role": "user", "content": "Hello"}]
}'

Format: owner/model-name

curl -X POST https://app.deepintshield.com/v1/chat/completions \
-H "Authorization: Bearer sk-bf-your-virtual-key" \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/meta/llama-2-7b-chat",
"messages": [{"role": "user", "content": "Hello"}]
}'

Configure deployed models in the Replicate key configuration. Deployments map custom model identifiers to actual deployment paths.

Configuration Example:

{
"provider": "replicate",
"value": "your-api-key",
"replicate_key_config": {
"deployments": {
"my-model": "owner/my-deployment-name"
}
}
}

Usage:

curl -X POST https://app.deepintshield.com/v1/chat/completions \
-H "Authorization: Bearer sk-bf-your-virtual-key" \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/my-model",
"messages": [{"role": "user", "content": "Hello"}]
}'
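
The three identifier forms can be told apart by their shape. The following is a minimal Python sketch, not gateway code: the helper name `classify_replicate_model` is hypothetical, and it assumes that version IDs are 64-character hex strings (as in the example above) and that an identifier without a slash is a deployment alias. An `owner/name` identifier may refer to either a public model or a deployment path, so the sketch does not try to distinguish those.

```python
import re

# Assumption: a version ID is exactly 64 lowercase hex characters.
_VERSION_ID = re.compile(r"^[0-9a-f]{64}$")


def classify_replicate_model(model: str) -> str:
    """Return 'version_id', 'owner_name', or 'alias' for a 'replicate/...' identifier."""
    prefix, _, rest = model.partition("/")
    if prefix != "replicate" or not rest:
        raise ValueError(f"expected 'replicate/<identifier>', got {model!r}")
    if _VERSION_ID.match(rest):
        return "version_id"
    if "/" in rest:
        return "owner_name"
    return "alias"
```

A client could use a check like this to validate configuration before sending requests; the gateway itself resolves the identifier.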

DeepIntShield uses sync mode when the Prefer: wait header is present in the request. The request blocks until the prediction completes or times out (default 60 seconds), then returns the result directly. If the timeout expires, the gateway falls back to polling.

Asynchronous polling is the default mode for Replicate predictions. DeepIntShield polls the prediction until it completes, so you still receive the final result in a single response.

Status Flow: starting → processing → succeeded/failed/canceled
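
To illustrate the two modes, here is a small Python sketch that builds a chat request and adds the `Prefer: wait` header only when sync mode is wanted. It uses the standard library only and does not send anything; the function names `build_chat_request` and `is_terminal` are illustrative and not part of the gateway.

```python
import json
import urllib.request

GATEWAY_URL = "https://app.deepintshield.com/v1/chat/completions"

# Terminal states from the status flow above.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def build_chat_request(api_key, model, messages, sync=False):
    """Build (but do not send) a chat request; sync=True adds 'Prefer: wait'."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if sync:
        headers["Prefer"] = "wait"  # ask the gateway to block until completion
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(GATEWAY_URL, data=body, headers=headers, method="POST")


def is_terminal(status):
    """True once a prediction can no longer change state."""
    return status in TERMINAL_STATUSES
```

The request can then be sent with `urllib.request.urlopen(req)`. In both modes the caller receives a single final response; the header only changes how the gateway waits for it.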


Send a standard chat request. System messages are supported, and image URLs in message content are passed through to the model.

Important: Not all Replicate models support a dedicated system prompt field. For unsupported models, the system prompt is automatically prepended to the conversation prompt.

Models without system prompt support:

  • meta/meta-llama-3-8b
  • meta/llama-2-70b
  • openai/gpt-oss-20b
  • openai/o1-mini
  • xai/grok-4
  • All deepseek-ai/deepseek* models (e.g., deepseek-r1, deepseek-v3)
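
The prepending behavior can be pictured with a short Python sketch. This is only an illustration of the rule described above: the gateway does this for you, the separator it uses is not documented here (a blank line is assumed), and the `system_prompt` field name and both function names are hypothetical.

```python
NO_SYSTEM_PROMPT_MODELS = {
    "meta/meta-llama-3-8b",
    "meta/llama-2-70b",
    "openai/gpt-oss-20b",
    "openai/o1-mini",
    "xai/grok-4",
}


def supports_system_prompt(model: str) -> bool:
    """Check a model against the list of models without a system prompt field."""
    if model.startswith("replicate/"):
        model = model[len("replicate/"):]
    if model.startswith("deepseek-ai/deepseek"):
        return False
    return model not in NO_SYSTEM_PROMPT_MODELS


def build_prompt_input(model: str, system: str, prompt: str) -> dict:
    """Illustrative only: fold the system prompt into the prompt when needed."""
    if system and not supports_system_prompt(model):
        # Assumed separator: a blank line between system text and conversation.
        return {"prompt": f"{system}\n\n{prompt}"}
    payload = {"prompt": prompt}
    if system:
        payload["system_prompt"] = system  # hypothetical field name
    return payload
```

Because the system text becomes part of the prompt for these models, the same messages can produce a differently shaped input depending on the model chosen.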

Pass model-specific parameters directly in the request body. Fields outside the standard schema are forwarded to the model:

curl -X POST https://app.deepintshield.com/v1/chat/completions \
-H "Authorization: Bearer sk-bf-your-virtual-key" \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/meta/llama-2-7b-chat",
"messages": [{"role": "user", "content": "Hello"}],
"temperature": 0.7,
"top_k": 50,
"repetition_penalty": 1.1,
"min_new_tokens": 10
}'

Response:

{
"id": "abc123",
"model": "meta/llama-2-7b-chat",
"object": "chat.completion",
"created": 1234567890,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 8,
"total_tokens": 18
}
}

Set "stream": true to receive incremental content as Server-Sent Events, ending with a final chunk carrying finish_reason. A canceled or failed prediction is surfaced as a stream error.
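
A client consuming the stream might accumulate text as in the sketch below. It assumes OpenAI-style chunks (`choices[0].delta.content`), a `data: [DONE]` end sentinel, and that stream errors arrive as a JSON object with an `error` key; none of these details are spelled out above, so treat them as assumptions. The function name `collect_stream` is illustrative.

```python
import json


def collect_stream(lines):
    """Accumulate streamed text from SSE lines; return (text, finish_reason)."""
    parts, finish_reason = [], None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumption: OpenAI-style end sentinel
            break
        chunk = json.loads(data)
        if "error" in chunk:  # assumption: failed/canceled predictions surface here
            raise RuntimeError(chunk["error"])
        choice = chunk["choices"][0]
        parts.append(choice.get("delta", {}).get("content") or "")
        finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason
```

The `lines` argument can be any iterable of decoded text lines, for example the lines of an HTTP response body read incrementally.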


Replicate supports the OpenAI-style Responses API, with the same parameter handling and system-prompt behavior as Chat Completions. For OpenAI gpt-5-structured models, native Responses features (input_item_list, tools, json_schema) are available.

Responses follow the standard Responses API format. Replicate prediction statuses map to Responses statuses as follows:

| Replicate Status | Responses Status |
|---|---|
| succeeded | completed |
| failed | failed |
| canceled | cancelled |
| processing | in_progress |
| starting | queued |
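
Expressed as a lookup, the mapping is a plain dictionary. The sketch below mirrors the table; the names are illustrative, and raising on an unknown status is a choice made for this example rather than documented gateway behavior.

```python
REPLICATE_TO_RESPONSES_STATUS = {
    "succeeded": "completed",
    "failed": "failed",
    "canceled": "cancelled",  # note the spelling change between the two APIs
    "processing": "in_progress",
    "starting": "queued",
}


def to_responses_status(replicate_status: str) -> str:
    """Translate a Replicate prediction status into a Responses API status."""
    try:
        return REPLICATE_TO_RESPONSES_STATUS[replicate_status]
    except KeyError:
        raise ValueError(f"unknown Replicate status: {replicate_status!r}") from None
```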

Send a prompt to the legacy completions endpoint. Pass model-specific fields such as top_k directly in the request body.

curl -X POST https://app.deepintshield.com/v1/completions \
-H "Authorization: Bearer sk-bf-your-virtual-key" \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/meta/llama-2-7b",
"prompt": "Once upon a time",
"max_tokens": 100,
"temperature": 0.8,
"top_k": 40
}'

Responses use the standard completions format, with text in choices[0].text and usage metrics in usage.


The following standard parameters are supported: prompt, n, aspect_ratio, resolution, output_format, quality, background, seed, negative_prompt, num_inference_steps, and input_images.

Different Replicate models expect input images in different fields. DeepIntShield automatically sends your image(s) to the correct field based on the model, so you can always supply them via input_images:

| Model family | Notes |
|---|---|
| black-forest-labs/flux-1.1-pro, flux-1.1-pro-ultra, flux-pro, flux-1.1-pro-ultra-finetuned | Single image |
| black-forest-labs/flux-kontext-pro, flux-kontext-max, flux-kontext-dev | Single image |
| black-forest-labs/flux-dev, flux-fill-pro, flux-dev-lora, flux-krea-dev | Single image |
| All other models | Multiple images |

Example:
curl -X POST https://app.deepintshield.com/v1/images/generations \
-H "Authorization: Bearer sk-bf-your-virtual-key" \
-H "Content-Type: application/json" \
-d '{
"model": "replicate/black-forest-labs/flux-schnell",
"prompt": "A serene mountain landscape at sunset",
"aspect_ratio": "16:9",
"output_format": "webp",
"num_inference_steps": 4,
"seed": 42
}'

Generated images are returned in data[] as URLs (data[].url) or, for some models, base64 data URIs.

{
"id": "xyz789",
"created": 1234567890,
"model": "black-forest-labs/flux-schnell",
"data": [
{
"url": "https://replicate.delivery/pbxt/...",
"index": 0
}
],
"usage": {
"input_tokens": 15,
"output_tokens": 0,
"total_tokens": 15
}
}
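
Since an entry may carry either a hosted URL or base64 data, client code usually needs to handle both. The Python sketch below is one way to do that; it assumes a data URI appears in the `url` field in the form `data:<mime>;base64,<payload>` and also accepts a `b64_json` field. Which of these a given model returns is not specified here, and the function name is illustrative.

```python
import base64


def decode_image_entries(response):
    """Yield ('url', str) or ('bytes', bytes) for each entry in response['data']."""
    for entry in response.get("data", []):
        if entry.get("b64_json"):
            yield "bytes", base64.b64decode(entry["b64_json"])
            continue
        url = entry.get("url", "")
        if url.startswith("data:"):
            # Split "data:<mime>;base64,<payload>" at the first comma.
            header, _, payload = url.partition(",")
            if not header.endswith(";base64"):
                raise ValueError("only base64 data URIs are handled in this sketch")
            yield "bytes", base64.b64decode(payload)
        else:
            yield "url", url
```

Entries tagged `url` still have to be downloaded by the caller; entries tagged `bytes` can be written straight to disk.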

Image generation streaming provides progressive image updates. Each chunk carries a partial image as a data URI, and a final completion chunk signals the finished image.


Image edit runs as a prediction like image generation. You send one or more input images plus a prompt; the model returns edited image(s). The same input-image behavior as Image Generation applies - supply images via the request and the gateway routes them to the model’s expected field.

Endpoint: /v1/images/edits

| Parameter | Notes |
|---|---|
| image[] | One or more input images |
| prompt | Edit instruction |
| n | Number of images |
| output_format | Output image format |
| quality | Output quality |
| background | Background handling |
| seed | Seed for reproducibility |
| negative_prompt | What to avoid |
| num_inference_steps | Inference steps |

Model-specific fields can be passed directly and are forwarded to the model.

curl -X POST 'https://app.deepintshield.com/v1/images/edits' \
--header 'Authorization: Bearer sk-bf-your-virtual-key' \
--form 'model="replicate/black-forest-labs/flux-fill-pro"' \
--form 'image[]=@"image.png"' \
--form 'prompt="Replace the sky with a starry night"' \
--form 'mask=@"mask.png"'

Same as Image Generation: edited images are returned in data[] as URLs (data[].url) or base64 data URIs (data[].b64_json).

Image edit streaming is supported. Partial chunks (type: "image_edit.partial_image") stream until a final type: "image_edit.completed" chunk with the finished image and usage. Use Prefer: wait for sync behavior or rely on polling (async) like other Replicate predictions.
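
A consumer of the edit stream can dispatch on the chunk `type`. The sketch below assumes the chunks have already been parsed into dictionaries and relies only on the two `type` values named above; it makes no assumption about the other fields of a chunk. The function name is illustrative.

```python
def final_image_event(events):
    """Consume image-edit stream events; return (partial_count, completed_event)."""
    partials = 0
    for event in events:
        kind = event.get("type")
        if kind == "image_edit.partial_image":
            partials += 1  # a UI could render this preview here
        elif kind == "image_edit.completed":
            return partials, event
    raise RuntimeError("stream ended before image_edit.completed")
```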


Replicate’s Files API supports uploading, listing, and managing files for use in predictions.

Request: Multipart form-data

| Field | Type | Required | Notes |
|---|---|---|---|
| file | binary | Yes | File content |
| filename | string | No | Custom filename |
| content_type | string | No | MIME type (auto-detected from extension) |

Example:

curl -X POST https://app.deepintshield.com/v1/files \
-H "Authorization: Bearer sk-bf-your-virtual-key" \
-F "file=@document.pdf" \
-F "filename=my-document.pdf"

Response:

{
"id": "file_abc123",
"object": "file",
"bytes": 12345,
"created_at": 1234567890,
"filename": "my-document.pdf",
"purpose": "batch",
"status": "processed"
}

Query Parameters:

| Parameter | Type | Notes |
|---|---|---|
| limit | int | Results per page |
| after | string | Pagination cursor |

Example:

curl -X GET "https://app.deepintshield.com/v1/files?limit=20" \
-H "Authorization: Bearer sk-bf-your-virtual-key"

Pagination is cursor-based; use the after cursor to fetch the next page.
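
A cursor loop over the list endpoint might look like the following sketch. The shape of the list response is not shown above, so this assumes a `data` array, a `has_more` flag, and that the next cursor is the `id` of the last item returned; adjust it to the actual response. The HTTP call is injected as `fetch_page` so the loop stays independent of any client library.

```python
def iter_files(fetch_page, limit=20):
    """Yield every file by following the 'after' cursor.

    fetch_page(params) must return the parsed JSON body of GET /v1/files.
    """
    after = None
    while True:
        params = {"limit": limit}
        if after is not None:
            params["after"] = after
        page = fetch_page(params)
        items = page.get("data", [])
        yield from items
        # Assumption: 'has_more' signals another page; cursor = last item's id.
        if not items or not page.get("has_more"):
            return
        after = items[-1]["id"]
```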

Operations:

  • GET /v1/files/{file_id} - Retrieve file metadata
  • DELETE /v1/files/{file_id} - Delete file

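Required parameters for retrieving file content (POST /v1/files/{file_id}/content):

| Parameter | Type | Description |
|---|---|---|
| owner | string | File owner username |
| expiry | int64 | Unix timestamp for expiration |
| signature | string | Base64-encoded HMAC-SHA256 signature |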

Signature Format: HMAC-SHA256 of "{owner} {file_id} {expiry}" using the Files API signing secret.

Example:

curl -X POST https://app.deepintshield.com/v1/files/file_abc123/content \
-H "Authorization: Bearer sk-bf-your-virtual-key" \
-H "Content-Type: application/json" \
-d '{
"owner": "my-username",
"expiry": 1735689600,
"signature": "base64-encoded-signature"
}'
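
The signature described above can be computed with Python's standard library. This sketch assumes the signing secret is available as raw bytes, that the three fields are joined by single spaces exactly as in the format string, and that standard (not URL-safe) base64 is expected; the function name is illustrative.

```python
import base64
import hashlib
import hmac


def sign_file_request(owner: str, file_id: str, expiry: int, secret: bytes) -> str:
    """Return the base64-encoded HMAC-SHA256 of '{owner} {file_id} {expiry}'."""
    message = f"{owner} {file_id} {expiry}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")
```

The returned string goes into the `signature` field of the request body, alongside the same `owner` and `expiry` values that were signed.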

Endpoint: /v1/models

Deployments are private or organization models with dedicated infrastructure. The response includes:

{
"data": [
{
"id": "replicate/my-org/my-deployment",
"name": "my-deployment",
"owner": "my-org"
}
],
"has_more": false
}

Usage:

  1. List your deployments via this endpoint
  2. Use the deployment name as the model identifier: replicate/my-org/my-deployment

The most important feature for Replicate integration is passing model-specific parameters. Any parameter that isn’t part of DeepIntShield’s standard schema is forwarded directly to the model:

{
"model": "replicate/stability-ai/sdxl",
"prompt": "A photo of an astronaut",
"temperature": 0.7,
"guidance_scale": 7.5,
"num_inference_steps": 50,
"scheduler": "DPMSolverMultistep"
}

Each Replicate model has unique parameters. To find available parameters:

  1. Model Page: Visit the model on replicate.com
  2. OpenAPI Schema: Available at /v1/models/{owner}/{name}/versions/{version_id} (includes openapi_schema)
  3. Cog Definition: Check the model’s source code (if public)
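
Once a version document has been fetched, the input parameters can be read out of its `openapi_schema`. The sketch below assumes the inputs are described under `components.schemas.Input.properties`, which is where Replicate's published schemas normally place them; verify this against the schema of the model you are using. The function name is illustrative.

```python
def input_parameters(version: dict) -> dict:
    """Map each input name to (type, default) from a version's openapi_schema."""
    props = version["openapi_schema"]["components"]["schemas"]["Input"]["properties"]
    return {
        name: (spec.get("type"), spec.get("default"))
        for name, spec in props.items()
    }
```

Any name returned here that is not a standard gateway parameter can be sent directly in the request body.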

System Prompt Field Support

Severity: Medium
Behavior: Not all models support a dedicated system prompt field. For unsupported models, the system prompt is prepended to the conversation prompt.
Impact: Prompt structure differs between models.
Models Affected: meta/meta-llama-3-8b, meta/llama-2-70b, openai/gpt-oss-20b, openai/o1-mini, xai/grok-4, and all deepseek-ai/deepseek* models

Input Image Field Handling

Severity: Medium
Behavior: Different models expect input images in different fields; the gateway routes your images to the correct field automatically.
Impact: Supply images via input_images regardless of model.
Models Affected: Flux family models (see Input Images table)

Image Content in Chat

Severity: Low
Behavior: Only image URLs from message content are passed through to the model.
Impact: Base64-encoded images in messages are ignored.

Model-Specific Parameters

Severity: Medium
Behavior: Each model has a unique input schema; standard parameters may not work for all models.
Impact: Requires checking model documentation for available parameters.
Mitigation: Pass model-specific fields directly in the request.


Request Parameters

| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | Yes | Replicate model (owner/model or version ID) |
| prompt | string | Yes | Text description of the video |
| input_reference | string | No | Reference image (base64 data URL or URL) |
| seconds | string | No | Duration |
| seed | int | No | Seed for reproducibility |
| negative_prompt | string | No | What to avoid |
Model-specific fields can be passed directly in the JSON body and are forwarded to the model. webhook and webhook_events_filter are handled automatically.

Response: id, status, model, videos[]

Job Statuses: queued (starting) → in_progress (processing) → completed / failed

| Operation | Endpoint | Notes |
|---|---|---|
| Get status | GET /v1/videos/{id} | Returns job status |
| Download | GET /v1/videos/{id}/content | Downloads the generated video |
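
Because video jobs move through queued and in_progress before finishing, a client typically polls the status endpoint. The following sketch shows one way to do that; the status lookup is injected as `get_status` (expected to return the parsed JSON of GET /v1/videos/{id}), and the interval and timeout defaults are arbitrary values chosen for illustration.

```python
import time


def wait_for_video(get_status, job_id, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until the job is completed or failed; return the final job dict."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_status(job_id)  # parsed JSON of GET /v1/videos/{id}
        if job["status"] in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"video job {job_id} still {job['status']}")
        sleep(interval)
```

When the returned job has status `completed`, the video can be fetched from GET /v1/videos/{id}/content.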