Replicate
Overview
Replicate is a prediction-based platform where every request creates a “prediction” that runs asynchronously. Each model on Replicate defines its own input schema, making it highly flexible but requiring model-specific parameter knowledge. Through the DeepIntShield gateway you send standard OpenAI-style requests and receive standard responses; pass model-specific fields with extra_params.
Supported Operations
| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /v1/predictions |
| Responses API | ✅ | ✅ | /v1/predictions |
| Text Completions | ✅ | ✅ | /v1/predictions |
| Image Generation | ✅ | ✅ | /v1/predictions |
| Image Edit | ✅ | ✅ | /v1/predictions |
| Video Generation | ✅ | - | /v1/predictions |
| Image Variation | ❌ | ❌ | - |
| Files | ✅ | - | /v1/files |
| List Models | ✅ | - | /v1/deployments |
| Embeddings | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |
Model Identification
Replicate models can be specified in three ways:
1. Version ID

    curl -X POST https://app.deepintshield.com/v1/chat/completions \
      -H "Authorization: Bearer sk-bf-your-virtual-key" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "replicate/5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
        "messages": [{"role": "user", "content": "Hello"}]
      }'

2. Model Name

Format: owner/model-name

    curl -X POST https://app.deepintshield.com/v1/chat/completions \
      -H "Authorization: Bearer sk-bf-your-virtual-key" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "replicate/meta/llama-2-7b-chat",
        "messages": [{"role": "user", "content": "Hello"}]
      }'

3. Deployment
Configure deployed models in the Replicate key configuration. Deployments map custom model identifiers to actual deployment paths.

Configuration Example:

    {
      "provider": "replicate",
      "value": "your-api-key",
      "replicate_key_config": {
        "deployments": {
          "my-model": "owner/my-deployment-name"
        }
      }
    }

Usage:

    curl -X POST https://app.deepintshield.com/v1/chat/completions \
      -H "Authorization: Bearer sk-bf-your-virtual-key" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "replicate/my-model",
        "messages": [{"role": "user", "content": "Hello"}]
      }'

Prediction Modes
Sync Mode
DeepIntShield uses sync mode when the Prefer: wait header is present in the request. The request blocks until the prediction completes, up to a timeout (60 seconds by default), and returns the result directly. If the timeout expires first, the gateway falls back to polling.
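
As a minimal sketch of opting into sync mode from a client, the snippet below builds (but does not send) a chat request that carries the Prefer: wait header. The helper name `build_sync_request` is hypothetical; the URL, key format, and model come from the examples on this page.

```python
import json
import urllib.request

GATEWAY_URL = "https://app.deepintshield.com/v1/chat/completions"


def build_sync_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a chat request that asks the gateway for sync mode (hypothetical helper)."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Prefer": "wait",  # presence of this header selects sync mode
        },
        method="POST",
    )


req = build_sync_request("sk-bf-your-virtual-key", "replicate/meta/llama-2-7b-chat", "Hello")
# Nothing is sent here; pass `req` to urllib.request.urlopen(...) when ready.
```

Omitting the Prefer header leaves the request in the default polling mode.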
Async Mode (Polling)
This is the default mode for Replicate predictions. DeepIntShield automatically polls the prediction until it completes, so you receive the final result in a single response.
Status Flow: starting → processing → succeeded/failed/canceled
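
The gateway handles polling for you, but the status flow can be illustrated with a short loop. This is a sketch of the general pattern, not the gateway's implementation: `fetch` is a hypothetical callable that returns the current prediction as a dict with a `status` field, and the interval and poll limit are arbitrary.

```python
import time
from typing import Callable, Dict

# Terminal states from the status flow above.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def poll_until_done(
    fetch: Callable[[], Dict],
    interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call `fetch` until the prediction reaches a terminal status."""
    for _ in range(max_polls):
        prediction = fetch()
        if prediction.get("status") in TERMINAL_STATUSES:
            return prediction
        # Still "starting" or "processing": wait before asking again.
        sleep(interval)
    raise TimeoutError(f"prediction not finished after {max_polls} polls")
```

Injecting `sleep` keeps the loop easy to exercise in tests without real delays.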
1. Chat Completions
Send a standard chat request. System messages are supported, and image URLs in message content are passed through to the model.
System Prompt Filtering
Important: Not all Replicate models support a dedicated system prompt field. For unsupported models, the system prompt is automatically prepended to the conversation prompt.
Models without system prompt support:
- meta/meta-llama-3-8b
- meta/llama-2-70b
- openai/gpt-oss-20b
- openai/o1-mini
- xai/grok-4
- All deepseek-ai/deepseek* models (e.g., deepseek-r1, deepseek-v3)
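
The prepending behavior can be sketched as follows. The model list is taken from this page; the field names `prompt` and `system_prompt` and the blank-line separator are assumptions made for illustration, since the exact format the gateway produces is not documented here.

```python
from typing import Dict, List

# Models listed above as lacking a dedicated system prompt field.
NO_SYSTEM_PROMPT_MODELS = {
    "meta/meta-llama-3-8b",
    "meta/llama-2-70b",
    "openai/gpt-oss-20b",
    "openai/o1-mini",
    "xai/grok-4",
}
NO_SYSTEM_PROMPT_PREFIXES = ("deepseek-ai/deepseek",)


def supports_system_prompt(model: str) -> bool:
    """Return False for models in the list above; accepts an optional replicate/ prefix."""
    if model.startswith("replicate/"):
        model = model[len("replicate/"):]
    if model in NO_SYSTEM_PROMPT_MODELS:
        return False
    return not model.startswith(NO_SYSTEM_PROMPT_PREFIXES)


def build_prompt_fields(model: str, messages: List[Dict[str, str]]) -> Dict[str, str]:
    """Flatten chat messages into model input fields (field names are assumed)."""
    system = "\n".join(m["content"] for m in messages if m["role"] == "system")
    conversation = "\n".join(m["content"] for m in messages if m["role"] != "system")
    if system and not supports_system_prompt(model):
        # No dedicated field: prepend the system prompt to the conversation prompt.
        return {"prompt": f"{system}\n\n{conversation}"}
    fields = {"prompt": conversation}
    if system:
        fields["system_prompt"] = system
    return fields
```

Because the prompt structure differs between the two branches, the same system message may influence these models differently.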
Model-Specific Parameters
Pass model-specific parameters directly in the request body. Fields outside the standard schema are forwarded to the model:

    curl -X POST https://app.deepintshield.com/v1/chat/completions \
      -H "Authorization: Bearer sk-bf-your-virtual-key" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "replicate/meta/llama-2-7b-chat",
        "messages": [{"role": "user", "content": "Hello"}],
        "temperature": 0.7,
        "top_k": 50,
        "repetition_penalty": 1.1,
        "min_new_tokens": 10
      }'

Example Response

    {
      "id": "abc123",
      "model": "meta/llama-2-7b-chat",
      "object": "chat.completion",
      "created": 1234567890,
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hello! How can I help you?"
          },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 8,
        "total_tokens": 18
      }
    }

Streaming

Set "stream": true to receive incremental content as Server-Sent Events, ending with a final chunk carrying finish_reason. A canceled or failed prediction is surfaced as a stream error.
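
A client-side sketch of consuming such a stream is shown below. It assumes the chunks follow the OpenAI chat streaming shape (`choices[].delta.content`), that the stream may end with a `[DONE]` sentinel, and that a stream error arrives as a chunk with an `error` key; none of these details are confirmed by this page.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Concatenate streamed content and return it with the last finish_reason seen."""
    parts = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        if "error" in chunk:  # assumed shape of a surfaced stream error
            raise RuntimeError(f"stream error: {chunk['error']}")
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason
```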
2. Responses API
Replicate supports the OpenAI-style Responses API, with the same parameter handling and system-prompt behavior as Chat Completions. For OpenAI gpt-5-structured models, native Responses features (input_item_list, tools, json_schema) are available.
Response Format
Responses follow the standard Responses API format, with the following status mapping:

| Replicate Status | Responses Status |
|---|---|
| succeeded | completed |
| failed | failed |
| canceled | cancelled |
| processing | in_progress |
| starting | queued |
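
The table translates directly into a lookup. The sketch below mirrors it; the function name is hypothetical, and raising on unknown statuses is a design choice for the example rather than documented gateway behavior.

```python
# Mirrors the status mapping table above.
REPLICATE_TO_RESPONSES_STATUS = {
    "succeeded": "completed",
    "failed": "failed",
    "canceled": "cancelled",  # note the spelling change between the two APIs
    "processing": "in_progress",
    "starting": "queued",
}


def to_responses_status(replicate_status: str) -> str:
    """Map a Replicate prediction status to its Responses API status."""
    try:
        return REPLICATE_TO_RESPONSES_STATUS[replicate_status]
    except KeyError:
        raise ValueError(f"unknown Replicate status: {replicate_status!r}") from None
```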
3. Text Completions (Legacy)
Send a prompt to the legacy completions endpoint. Pass model-specific fields such as top_k directly in the request body.
Example

    curl -X POST https://app.deepintshield.com/v1/completions \
      -H "Authorization: Bearer sk-bf-your-virtual-key" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "replicate/meta/llama-2-7b",
        "prompt": "Once upon a time",
        "max_tokens": 100,
        "temperature": 0.8,
        "top_k": 40
      }'

Responses use the standard completions format, with text in choices[0].text and usage metrics in usage.
4. Image Generation
Supported Parameters
The following standard parameters are supported: prompt, n, aspect_ratio, resolution, output_format, quality, background, seed, negative_prompt, num_inference_steps, and input_images.
Input Images
Different Replicate models expect input images in different fields. DeepIntShield automatically sends your image(s) to the correct field based on the model, so you can always supply them via input_images:
| Model family | Notes |
|---|---|
| black-forest-labs/flux-1.1-pro, flux-1.1-pro-ultra, flux-pro, flux-1.1-pro-ultra-finetuned | Single image |
| black-forest-labs/flux-kontext-pro, flux-kontext-max, flux-kontext-dev | Single image |
| black-forest-labs/flux-dev, flux-fill-pro, flux-dev-lora, flux-krea-dev | Single image |
| All other models | Multiple images |
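
If you want to catch mismatches before sending a request, a client-side guard along these lines is one option. It is a sketch under assumptions: it matches on the final path segment of the model name, and it treats "Single image" as "at most one image". How the gateway itself handles extra images for these models is not stated here.

```python
from typing import List

# Final path segments of the models marked "Single image" in the table above.
SINGLE_IMAGE_MODELS = {
    "flux-1.1-pro", "flux-1.1-pro-ultra", "flux-pro", "flux-1.1-pro-ultra-finetuned",
    "flux-kontext-pro", "flux-kontext-max", "flux-kontext-dev",
    "flux-dev", "flux-fill-pro", "flux-dev-lora", "flux-krea-dev",
}


def check_input_images(model: str, input_images: List[str]) -> List[str]:
    """Raise if more than one image is supplied to a single-image model."""
    name = model.rsplit("/", 1)[-1]
    if name in SINGLE_IMAGE_MODELS and len(input_images) > 1:
        raise ValueError(f"{name} takes a single input image, got {len(input_images)}")
    return input_images
```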
Example

    curl -X POST https://app.deepintshield.com/v1/images/generations \
      -H "Authorization: Bearer sk-bf-your-virtual-key" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "replicate/black-forest-labs/flux-schnell",
        "prompt": "A serene mountain landscape at sunset",
        "aspect_ratio": "16:9",
        "output_format": "webp",
        "num_inference_steps": 4,
        "seed": 42
      }'

Response

Generated images are returned in data[] as URLs (data[].url) or, for some models, base64 data URIs.

    {
      "id": "xyz789",
      "created": 1234567890,
      "model": "black-forest-labs/flux-schnell",
      "data": [
        {
          "url": "https://replicate.delivery/pbxt/...",
          "index": 0
        }
      ],
      "usage": {
        "input_tokens": 15,
        "output_tokens": 0,
        "total_tokens": 15
      }
    }

Streaming

Image generation streaming provides progressive image updates. Each chunk carries a partial image as a data URI, and a final completion chunk signals the finished image.
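
Since partial images arrive as data URIs, a client typically needs to decode them into bytes before display or storage. The helper below is a generic data URI decoder, not gateway-specific code; which chunk field holds the URI is left to the caller because it is not specified on this page.

```python
import base64
import urllib.parse
from typing import Tuple


def decode_data_uri(uri: str) -> Tuple[str, bytes]:
    """Split a data URI into (media type, raw bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URI has no comma separating header and payload")
    parts = header.split(";")
    media_type = parts[0] or "text/plain"
    if "base64" in parts[1:]:
        return media_type, base64.b64decode(payload)
    # Non-base64 payloads are percent-encoded.
    return media_type, urllib.parse.unquote_to_bytes(payload)
```

For example, `decode_data_uri("data:image/png;base64,aGVsbG8=")` yields the media type `image/png` and the bytes `b"hello"`.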
5. Image Edit
Image edit runs as a prediction, like image generation. You send one or more input images plus a prompt, and the model returns the edited image(s). The same input-image behavior as Image Generation applies: supply images via the request, and the gateway routes them to the model’s expected field.
Endpoint: /v1/images/edits
Supported Parameters
| Parameter | Notes |
|---|---|
| image[] | One or more input images |
| prompt | Edit instruction |
| n | Number of images |
| output_format | Output image format |
| quality | Output quality |
| background | Background handling |
| seed | Seed for reproducibility |
| negative_prompt | What to avoid |
| num_inference_steps | Inference steps |
Model-specific fields can be passed directly and are forwarded to the model.
Example

    curl -X POST 'https://app.deepintshield.com/v1/images/edits' \
      --header 'Authorization: Bearer sk-bf-your-virtual-key' \
      --form 'model="replicate/black-forest-labs/flux-fill-pro"' \
      --form 'image[]=@"image.png"' \
      --form 'prompt="Replace the sky with a starry night"' \
      --form 'mask=@"mask.png"'

Response

Same as Image Generation: edited images are returned in data[] as URLs (data[].url) or base64 data URIs (data[].b64_json).
Streaming
Image edit streaming is supported. Partial chunks (type: "image_edit.partial_image") stream until a final type: "image_edit.completed" chunk with the finished image and usage. Use Prefer: wait for sync behavior or rely on polling (async) like other Replicate predictions.
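
One way to consume the two chunk types is to dispatch on the `type` field, as sketched below. Only the two type strings come from this page; the function name is hypothetical, and the chunks are returned whole because their remaining fields are not documented here.

```python
from typing import Dict, Iterable, List, Tuple


def split_edit_stream(chunks: Iterable[Dict]) -> Tuple[List[Dict], Dict]:
    """Return (partial chunks, completed chunk) from an image edit stream."""
    partials = []
    for chunk in chunks:
        kind = chunk.get("type")
        if kind == "image_edit.partial_image":
            partials.append(chunk)
        elif kind == "image_edit.completed":
            return partials, chunk
        # Chunks of any other type are ignored in this sketch.
    raise RuntimeError("stream ended without an image_edit.completed chunk")
```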
6. Files API
Replicate’s Files API supports uploading, listing, and managing files for use in predictions.
Upload
Request: Multipart form-data

| Field | Type | Required | Notes |
|---|---|---|---|
| file | binary | ✅ | File content |
| filename | string | ❌ | Custom filename |
| content_type | string | ❌ | MIME type (auto-detected from extension) |
Example:

    curl -X POST https://app.deepintshield.com/v1/files \
      -H "Authorization: Bearer sk-bf-your-virtual-key" \
      -F "file=@document.pdf" \
      -F "filename=my-document.pdf"

Response:

    {
      "id": "file_abc123",
      "object": "file",
      "bytes": 12345,
      "created_at": 1234567890,
      "filename": "my-document.pdf",
      "purpose": "batch",
      "status": "processed"
    }

List Files

Query Parameters:

| Parameter | Type | Notes |
|---|---|---|
| limit | int | Results per page |
| after | string | Pagination cursor |

Example:

    curl -X GET "https://app.deepintshield.com/v1/files?limit=20" \
      -H "Authorization: Bearer sk-bf-your-virtual-key"

Pagination is cursor-based; use the after cursor to fetch the next page.
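
Cursor-based pagination is usually consumed with a loop like the one below. This is a sketch under assumptions: `fetch_page` is a hypothetical callable that performs the GET and returns the parsed JSON, the response is assumed to carry `data` and `has_more`, and the ID of the last item is assumed to serve as the next `after` cursor. Check an actual response before relying on any of these.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_files(
    fetch_page: Callable[[int, Optional[str]], Dict],
    limit: int = 20,
) -> Iterator[Dict]:
    """Yield every file by following the `after` cursor page by page."""
    after = None
    while True:
        page = fetch_page(limit, after)
        items = page.get("data", [])
        yield from items
        # Stop when the server reports no more pages or returns an empty page.
        if not page.get("has_more") or not items:
            return
        after = items[-1]["id"]  # assumed cursor: ID of the last item seen
```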
Retrieve / Delete
Operations:
- GET /v1/files/{file_id}: Retrieve file metadata
- DELETE /v1/files/{file_id}: Delete file
File Content Download
Required Parameters:

| Parameter | Type | Description |
|---|---|---|
| owner | string | File owner username |
| expiry | int64 | Unix timestamp for expiration |
| signature | string | Base64-encoded HMAC-SHA256 signature |
Signature Format: HMAC-SHA256 of "{owner} {file_id} {expiry}" using the Files API signing secret.
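
The signature can be computed with the standard library, as in the sketch below. It assumes the signing secret is used as UTF-8 bytes and that the standard (not URL-safe) Base64 alphabet is expected; the function name is hypothetical.

```python
import base64
import hashlib
import hmac


def sign_file_download(secret: str, owner: str, file_id: str, expiry: int) -> str:
    """Return the Base64-encoded HMAC-SHA256 of "{owner} {file_id} {expiry}"."""
    message = f"{owner} {file_id} {expiry}".encode("utf-8")
    digest = hmac.new(secret.encode("utf-8"), message, hashlib.sha256).digest()
    # Standard Base64 alphabet is an assumption; adjust if the API expects URL-safe output.
    return base64.b64encode(digest).decode("ascii")
```

The result goes in the signature field, alongside the same owner and expiry values that were signed.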
Example:

    curl -X POST https://app.deepintshield.com/v1/files/file_abc123/content \
      -H "Authorization: Bearer sk-bf-your-virtual-key" \
      -H "Content-Type: application/json" \
      -d '{
        "owner": "my-username",
        "expiry": 1735689600,
        "signature": "base64-encoded-signature"
      }'

7. List Models

Endpoint: /v1/models
Deployments are private or organization models with dedicated infrastructure. The response includes:

    {
      "data": [
        {
          "id": "replicate/my-org/my-deployment",
          "name": "my-deployment",
          "owner": "my-org"
        }
      ],
      "has_more": false
    }

Usage:
- List your deployments via this endpoint
- Use the deployment name as the model identifier: replicate/my-org/my-deployment
Extra Parameters
Model-Specific Parameters
The most important feature for Replicate integration is passing model-specific parameters. Any parameter that isn’t part of DeepIntShield’s standard schema is forwarded directly to the model:

    {
      "model": "replicate/stability-ai/sdxl",
      "prompt": "A photo of an astronaut",
      "temperature": 0.7,
      "guidance_scale": 7.5,
      "num_inference_steps": 50,
      "scheduler": "DPMSolverMultistep"
    }

Discovering Model Parameters

Each Replicate model has unique parameters. To find available parameters:
- Model Page: Visit the model on replicate.com
- OpenAPI Schema: Available at /v1/models/{owner}/{name}/versions/{version_id} (includes openapi_schema)
- Cog Definition: Check the model’s source code (if public)
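
Once you have a version object, the input parameters can be read out of its openapi_schema. The sketch below assumes the inputs live under `components.schemas.Input.properties`, which is the layout Cog-built models commonly use; treat that path as an assumption and inspect the schema if a model differs.

```python
from typing import Dict


def list_input_parameters(version: Dict) -> Dict[str, str]:
    """Map each input parameter name to its declared JSON type."""
    properties = (
        version.get("openapi_schema", {})
        .get("components", {})
        .get("schemas", {})
        .get("Input", {})  # assumed location of the model's input schema
        .get("properties", {})
    )
    return {name: spec.get("type", "unknown") for name, spec in sorted(properties.items())}
```

Any name returned here that is not part of the standard schema is a candidate to pass as a model-specific field.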
Caveats
System Prompt Field Support
Severity: Medium
Behavior: Not all models support a dedicated system prompt field. For unsupported models, the system prompt is prepended to the conversation prompt.
Impact: Prompt structure differs between models
Models Affected: meta/meta-llama-3-8b, meta/llama-2-70b, openai/gpt-oss-20b, openai/o1-mini, xai/grok-4, and all deepseek-ai/deepseek* models
Input Image Field Handling
Severity: Medium
Behavior: Different models expect input images in different fields; the gateway routes your images to the correct field automatically.
Impact: Supply images via input_images regardless of model
Models Affected: Flux family models (see Input Images table)
Image Content in Chat
Severity: Low
Behavior: Only image URLs from message content are passed through to the model
Impact: Base64-encoded images in messages are ignored
Model-Specific Parameters
Severity: Medium
Behavior: Each model has a unique input schema; standard parameters may not work for all models
Impact: Requires checking model documentation for available parameters
Mitigation: Pass model-specific fields directly in the request
Video Generation
Generate (POST /v1/videos)
Request Parameters

| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | ✅ | Replicate model (owner/model or version ID) |
| prompt | string | ✅ | Text description of the video |
| input_reference | string | ❌ | Reference image (base64 data URL or URL) |
| seconds | string | ❌ | Duration |
| seed | int | ❌ | Seed for reproducibility |
| negative_prompt | string | ❌ | What to avoid |

Model-specific fields can be passed directly in the JSON body and are forwarded to the model. webhook and webhook_events_filter are handled automatically.
Response: id, status, model, videos[]
Job Statuses: queued (starting) → in_progress (processing) → completed / failed
Retrieve / Download
| Operation | Endpoint | Notes |
|---|---|---|
| Get status | GET /v1/videos/{id} | Returns job status |
| Download | GET /v1/videos/{id}/content | Downloads the generated video |
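
Putting the three video endpoints together, a client flow might look like the sketch below. The `http` argument is a hypothetical callable `(method, path, body)` that sends the request and returns parsed JSON; the `id` and `status` fields and the job statuses are the ones listed above, while the poll interval and limit are arbitrary.

```python
import time
from typing import Callable, Dict, Optional


def run_video_job(
    http: Callable[[str, str, Optional[Dict]], Dict],
    body: Dict,
    interval: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Create a video job, wait for it, and return the download path."""
    job = http("POST", "/v1/videos", body)
    polls = 0
    while True:
        if job["status"] == "completed":
            return f"/v1/videos/{job['id']}/content"
        if job["status"] == "failed":
            raise RuntimeError(f"video job {job['id']} failed")
        if polls >= max_polls:
            raise TimeoutError(f"video job {job['id']} still {job['status']}")
        # Still queued or in_progress: wait, then re-check the job status.
        sleep(interval)
        job = http("GET", f"/v1/videos/{job['id']}", None)
        polls += 1
```

The returned path can then be fetched with a GET to download the generated video.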