Replicate

Overview

Replicate is a prediction-based platform: every request creates a “prediction” that runs asynchronously. Each model on Replicate defines its own input schema, which makes the platform flexible but means you need to know each model’s parameters. Through the DeepIntShield gateway you send standard OpenAI-style requests and receive standard responses; model-specific fields are passed directly in the request body (see Extra Parameters).

Supported Operations

Operation	Non-Streaming	Streaming	Replicate Endpoint
Chat Completions	✅	✅	`/v1/predictions`
Responses API	✅	✅	`/v1/predictions`
Text Completions	✅	✅	`/v1/predictions`
Image Generation	✅	✅	`/v1/predictions`
Image Edit	✅	✅	`/v1/predictions`
Video Generation	✅	-	`/v1/predictions`
Image Variation	❌	❌	-
Files	✅	-	`/v1/files`
List Models	✅	-	`/v1/deployments`
Embeddings	❌	❌	-
Speech (TTS)	❌	❌	-
Transcriptions (STT)	❌	❌	-
Batch	❌	❌	-

Model Identification

Replicate models can be specified in three ways:

1. Version ID

curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

2. Model Name

Format: owner/model-name

curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/meta/llama-2-7b-chat",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

3. Deployment

Configure deployed models in the Replicate key configuration. Deployments map custom model identifiers to actual deployment paths.

Configuration Example:

{
  "provider": "replicate",
  "value": "your-api-key",
  "replicate_key_config": {
    "deployments": {
      "my-model": "owner/my-deployment-name"
    }
  }
}

Usage:

curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/my-model",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Prediction Modes

Sync Mode

DeepIntShield uses sync mode when the Prefer: wait header is present in the request. The request blocks until the prediction completes or times out (default 60 seconds), then returns the result directly. If the timeout expires, the gateway falls back to polling.

Async Mode (Polling)

This is the default mode for Replicate predictions. DeepIntShield automatically polls the prediction until it completes, so you receive the final result in a single response.

Status Flow: starting → processing → succeeded/failed/canceled
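
From the client side, the two modes differ only in whether the Prefer: wait header is sent. The following Python sketch builds (but does not send) both kinds of request using the standard library. The helper name build_chat_request is hypothetical, and the URL and key are the placeholders used in the examples above.

```python
import json
import urllib.request

GATEWAY_URL = "https://app.deepintshield.com/v1/chat/completions"

def build_chat_request(model: str, prompt: str, api_key: str,
                       wait: bool = False) -> urllib.request.Request:
    """Build a chat request; wait=True adds 'Prefer: wait' to ask for sync mode."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = urllib.request.Request(GATEWAY_URL, data=body, method="POST")
    req.add_header("Authorization", f"Bearer {api_key}")
    req.add_header("Content-Type", "application/json")
    if wait:
        # Sync mode: the request blocks until the prediction completes or times out.
        req.add_header("Prefer", "wait")
    # Without the header, the gateway polls the prediction on your behalf.
    return req

sync_req = build_chat_request("replicate/meta/llama-2-7b-chat", "Hello",
                              "sk-bf-your-virtual-key", wait=True)
async_req = build_chat_request("replicate/meta/llama-2-7b-chat", "Hello",
                               "sk-bf-your-virtual-key")
```

Either request can then be sent with urllib.request.urlopen(req); in both modes the caller receives the final result in a single response.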

1. Chat Completions

Send a standard chat request. System messages are supported, and image URLs in message content are passed through to the model.

System Prompt Filtering

Important: Not all Replicate models support a dedicated system prompt field. For unsupported models, the system prompt is automatically prepended to the conversation prompt.

Models without system prompt support:

meta/meta-llama-3-8b
meta/llama-2-70b
openai/gpt-oss-20b
openai/o1-mini
xai/grok-4
All deepseek-ai/deepseek* models (e.g., deepseek-r1, deepseek-v3)
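
To make the effect of this rule concrete, the sketch below mirrors it on the client side. It is an illustration only: the function names, the system_prompt field name, and the blank-line separator are assumptions, and the gateway’s actual prompt format may differ.

```python
# Models listed above as lacking a dedicated system prompt field.
NO_SYSTEM_PROMPT_MODELS = {
    "meta/meta-llama-3-8b",
    "meta/llama-2-70b",
    "openai/gpt-oss-20b",
    "openai/o1-mini",
    "xai/grok-4",
}
NO_SYSTEM_PROMPT_PREFIX = "deepseek-ai/deepseek"

def has_system_prompt_field(model: str) -> bool:
    """Return False for models the list above marks as unsupported."""
    prefix = "replicate/"
    name = model[len(prefix):] if model.startswith(prefix) else model
    if name in NO_SYSTEM_PROMPT_MODELS:
        return False
    return not name.startswith(NO_SYSTEM_PROMPT_PREFIX)

def effective_prompt(model: str, system: str, conversation: str) -> dict:
    """Show where the system prompt ends up for a given model."""
    if has_system_prompt_field(model):
        # Assumption: the dedicated field is called "system_prompt".
        return {"system_prompt": system, "prompt": conversation}
    # Assumption: a blank line separates the prepended system prompt.
    return {"prompt": f"{system}\n\n{conversation}"}
```

For example, effective_prompt("replicate/xai/grok-4", "Be brief.", "Hello") produces a single prompt string that begins with the system text.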

Model-Specific Parameters

Pass model-specific parameters directly in the request body. Fields outside the standard schema are forwarded to the model:

curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/meta/llama-2-7b-chat",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,
    "top_k": 50,
    "repetition_penalty": 1.1,
    "min_new_tokens": 10
  }'

Example Response

{
  "id": "abc123",
  "model": "meta/llama-2-7b-chat",
  "object": "chat.completion",
  "created": 1234567890,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  }
}

Streaming

Set "stream": true to receive incremental content as Server-Sent Events, ending with a final chunk carrying finish_reason. A canceled or failed prediction is surfaced as a stream error.
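
A client can reassemble the streamed text by reading the data: lines one at a time. The Python sketch below assumes OpenAI-style chunks (choices[0].delta.content), a [DONE] end-of-stream sentinel, and an error object for stream errors; none of these details are confirmed by this page, so adjust them to what the gateway actually emits.

```python
import json
from typing import Iterable, Optional, Tuple

def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Join streamed content deltas; return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumption: OpenAI-style sentinel
            break
        chunk = json.loads(payload)
        if "error" in chunk:  # assumption: errors arrive as an "error" object
            raise RuntimeError(f"stream error: {chunk['error']}")
        choice = chunk["choices"][0]
        parts.append(choice.get("delta", {}).get("content") or "")
        finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason

# Hand-written sample lines, not captured gateway output.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}, "finish_reason": null}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}, "finish_reason": null}]}',
    'data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]
text, reason = collect_stream(sample)
```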

2. Responses API

Replicate supports the OpenAI-style Responses API, with the same parameter handling and system-prompt behavior as Chat Completions. For OpenAI gpt-5-structured models, native Responses features (input_item_list, tools, json_schema) are available.

Response Format

Responses follow standard Responses API format with status mapping:

Replicate Status	Responses Status
`succeeded`	`completed`
`failed`	`failed`
`canceled`	`cancelled`
`processing`	`in_progress`
`starting`	`queued`
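
The same mapping expressed as a lookup, which can be handy when correlating gateway responses with the Replicate dashboard. The names below are illustrative, not part of any SDK.

```python
# Status mapping copied from the table above.
REPLICATE_TO_RESPONSES_STATUS = {
    "succeeded": "completed",
    "failed": "failed",
    "canceled": "cancelled",
    "processing": "in_progress",
    "starting": "queued",
}

# Responses statuses after which the prediction will not change again.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def to_responses_status(replicate_status: str) -> str:
    """Translate a Replicate prediction status to its Responses API status."""
    try:
        return REPLICATE_TO_RESPONSES_STATUS[replicate_status]
    except KeyError:
        raise ValueError(f"unknown Replicate status: {replicate_status!r}") from None

def is_terminal(responses_status: str) -> bool:
    """Return True once a response has finished, failed, or been cancelled."""
    return responses_status in TERMINAL_STATUSES
```

Note the spelling difference: Replicate reports canceled, while the Responses format uses cancelled.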

3. Text Completions (Legacy)

Send a prompt to the legacy completions endpoint. Pass model-specific fields such as top_k directly in the request body.

Example

curl -X POST https://app.deepintshield.com/v1/completions \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/meta/llama-2-7b",
    "prompt": "Once upon a time",
    "max_tokens": 100,
    "temperature": 0.8,
    "top_k": 40
  }'

Responses use the standard completions format, with text in choices[0].text and usage metrics in usage.

4. Image Generation

Supported Parameters

The following standard parameters are supported: prompt, n, aspect_ratio, resolution, output_format, quality, background, seed, negative_prompt, num_inference_steps, and input_images.

Input Images

Different Replicate models expect input images in different fields. DeepIntShield automatically sends your image(s) to the correct field based on the model, so you can always supply them via input_images:

Model family	Notes
`black-forest-labs/flux-1.1-pro`, `flux-1.1-pro-ultra`, `flux-pro`, `flux-1.1-pro-ultra-finetuned`	Single image
`black-forest-labs/flux-kontext-pro`, `flux-kontext-max`, `flux-kontext-dev`	Single image
`black-forest-labs/flux-dev`, `flux-fill-pro`, `flux-dev-lora`, `flux-krea-dev`	Single image
All other models	Multiple images

Example

curl -X POST https://app.deepintshield.com/v1/images/generations \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replicate/black-forest-labs/flux-schnell",
    "prompt": "A serene mountain landscape at sunset",
    "aspect_ratio": "16:9",
    "output_format": "webp",
    "num_inference_steps": 4,
    "seed": 42
  }'

Response

Generated images are returned in data[] as URLs (data[].url) or, for some models, as base64 data URIs (data[].b64_json).

{
  "id": "xyz789",
  "created": 1234567890,
  "model": "black-forest-labs/flux-schnell",
  "data": [
    {
      "url": "https://replicate.delivery/pbxt/...",
      "index": 0
    }
  ],
  "usage": {
    "input_tokens": 15,
    "output_tokens": 0,
    "total_tokens": 15
  }
}
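
Because an entry may carry either a link or inline image data, client code usually has to handle both. The Python sketch below is one way to do so; it assumes inline data can appear either in b64_json or as a data: URI in url, which this page does not spell out.

```python
import base64
from typing import List, Tuple, Union

def extract_images(response: dict) -> List[Tuple[str, Union[str, bytes]]]:
    """Return ('url', link) or ('bytes', raw_image) for each entry in data[]."""
    results = []
    for item in response.get("data", []):
        url = item.get("url")
        if item.get("b64_json"):
            results.append(("bytes", base64.b64decode(item["b64_json"])))
        elif url and url.startswith("data:"):
            # Data URI layout: data:<mime>;base64,<payload>
            header, _, payload = url.partition(",")
            if not header.endswith(";base64"):
                raise ValueError("only base64 data URIs are handled in this sketch")
            results.append(("bytes", base64.b64decode(payload)))
        elif url:
            results.append(("url", url))
    return results
```

Entries tagged url still need to be downloaded; entries tagged bytes can be written straight to disk.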

Streaming

Image generation streaming provides progressive image updates. Each chunk carries a partial image as a data URI, and a final completion chunk signals the finished image.

5. Image Edit

Image edit runs as a prediction, like image generation. You send one or more input images plus a prompt, and the model returns the edited image(s). The same input-image behavior as Image Generation applies: supply images in the request and the gateway routes them to the model’s expected field.

Endpoint: /v1/images/edits

Supported Parameters

Parameter	Notes
`image[]`	One or more input images
`prompt`	Edit instruction
`n`	Number of images
`output_format`	Output image format
`quality`	Output quality
`background`	Background handling
`seed`	Seed for reproducibility
`negative_prompt`	What to avoid
`num_inference_steps`	Inference steps

Model-specific fields can be passed directly and are forwarded to the model.

Example

curl -X POST 'https://app.deepintshield.com/v1/images/edits' \
--header 'Authorization: Bearer sk-bf-your-virtual-key' \
--form 'model="replicate/black-forest-labs/flux-fill-pro"' \
--form 'image[]=@"image.png"' \
--form 'prompt="Replace the sky with a starry night"' \
--form 'mask=@"mask.png"'

Response

Same as Image Generation: edited images are returned in data[] as URLs (data[].url) or base64 data URIs (data[].b64_json).

Streaming

Image edit streaming is supported. Partial chunks (type: "image_edit.partial_image") stream until a final type: "image_edit.completed" chunk with the finished image and usage. Use Prefer: wait for sync behavior or rely on polling (async) like other Replicate predictions.
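
A consumer of this stream only needs to branch on the chunk type. The sketch below assumes each chunk has already been parsed into a dict; apart from the two type values named above, no chunk fields are assumed, and the function name is hypothetical.

```python
from typing import Iterable, Optional, Tuple

def consume_image_edit_stream(chunks: Iterable[dict]) -> Tuple[int, Optional[dict]]:
    """Count partial chunks and return the completed chunk, if one arrives."""
    partials = 0
    for chunk in chunks:
        kind = chunk.get("type")
        if kind == "image_edit.partial_image":
            partials += 1  # a real client would render or store the preview here
        elif kind == "image_edit.completed":
            return partials, chunk
    # The stream ended without a completion chunk (for example, after an error).
    return partials, None
```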

6. Files API

Replicate’s Files API supports uploading, listing, and managing files for use in predictions.

Upload

Request: Multipart form-data

Field	Type	Required	Notes
`file`	binary	✅	File content
`filename`	string	❌	Custom filename
`content_type`	string	❌	MIME type (auto-detected from extension)

Example:

curl -X POST https://app.deepintshield.com/v1/files \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -F "file=@document.pdf" \
  -F "filename=my-document.pdf"

Response:

{
  "id": "file_abc123",
  "object": "file",
  "bytes": 12345,
  "created_at": 1234567890,
  "filename": "my-document.pdf",
  "purpose": "batch",
  "status": "processed"
}

List Files

Query Parameters:

Parameter	Type	Notes
`limit`	int	Results per page
`after`	string	Pagination cursor

Example:

curl -X GET "https://app.deepintshield.com/v1/files?limit=20" \
  -H "Authorization: Bearer sk-bf-your-virtual-key"

Pagination is cursor-based; use the after cursor to fetch the next page.
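
A pagination loop might look like the following Python sketch. The HTTP call is abstracted behind a fetch_page callable so the loop stays testable. The data and has_more response fields, and the use of the last file’s id as the after cursor, are assumptions rather than documented behavior.

```python
from typing import Callable, Iterator, Optional

def iter_files(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every file, following the `after` cursor page by page."""
    after = None
    while True:
        page = fetch_page(after)
        items = page.get("data", [])
        yield from items
        if not page.get("has_more") or not items:
            return
        after = items[-1]["id"]  # assumption: the cursor is the last file's id

# Stand-in for a real call to GET /v1/files?limit=2&after=<cursor>.
_FAKE_FILES = [{"id": f"file_{i}"} for i in range(5)]

def fake_fetch(after: Optional[str]) -> dict:
    start = 0
    if after is not None:
        start = next(i for i, f in enumerate(_FAKE_FILES) if f["id"] == after) + 1
    return {"data": _FAKE_FILES[start:start + 2], "has_more": start + 2 < len(_FAKE_FILES)}

all_ids = [f["id"] for f in iter_files(fake_fetch)]
```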

Retrieve / Delete

Operations:

GET /v1/files/{file_id} - Retrieve file metadata
DELETE /v1/files/{file_id} - Delete file

File Content Download

Required Parameters:

Parameter	Type	Description
`owner`	string	File owner username
`expiry`	int64	Unix timestamp for expiration
`signature`	string	Base64-encoded HMAC-SHA256 signature

Signature Format: HMAC-SHA256 of "{owner} {file_id} {expiry}" using the Files API signing secret.

Example:

curl -X POST https://app.deepintshield.com/v1/files/file_abc123/content \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "owner": "my-username",
    "expiry": 1735689600,
    "signature": "base64-encoded-signature"
  }'
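
The signature can be computed with Python’s standard library, as sketched below. This follows the format stated above; it assumes the signing secret is a UTF-8 string and that standard (not URL-safe) Base64 is expected. The secret shown is a placeholder.

```python
import base64
import hashlib
import hmac

def sign_file_download(owner: str, file_id: str, expiry: int, signing_secret: str) -> str:
    """Return the Base64-encoded HMAC-SHA256 of '{owner} {file_id} {expiry}'."""
    message = f"{owner} {file_id} {expiry}".encode("utf-8")
    digest = hmac.new(signing_secret.encode("utf-8"), message, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")

# Request body for the download call, using the placeholder values from the example.
body = {
    "owner": "my-username",
    "expiry": 1735689600,
    "signature": sign_file_download(
        "my-username", "file_abc123", 1735689600, "example-signing-secret"
    ),
}
```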

7. List Models

Endpoint: /v1/models

This endpoint lists your Replicate deployments, which are private or organization models running on dedicated infrastructure. Example response:

{
  "data": [
    {
      "id": "replicate/my-org/my-deployment",
      "name": "my-deployment",
      "owner": "my-org"
    }
  ],
  "has_more": false
}

Usage:

List your deployments via this endpoint
Use the deployment name as the model identifier: replicate/my-org/my-deployment
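
Turning the listing into usable model identifiers is a short transformation. The sketch below relies only on the id, owner, and name fields shown in the sample response; the function name is hypothetical.

```python
from typing import List

def deployment_model_ids(response: dict) -> List[str]:
    """Return identifiers that can be used as the `model` field."""
    ids = []
    for entry in response.get("data", []):
        # Prefer the id from the response; otherwise rebuild it from owner and name.
        ids.append(entry.get("id") or f"replicate/{entry['owner']}/{entry['name']}")
    return ids

sample_response = {
    "data": [
        {"id": "replicate/my-org/my-deployment", "name": "my-deployment", "owner": "my-org"}
    ],
    "has_more": False,
}
model_ids = deployment_model_ids(sample_response)
```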

Extra Parameters

Model-Specific Parameters

The most important feature for Replicate integration is passing model-specific parameters. Any parameter that isn’t part of DeepIntShield’s standard schema is forwarded directly to the model:

{
  "model": "replicate/stability-ai/sdxl",
  "prompt": "A photo of an astronaut",
  "temperature": 0.7,
  "guidance_scale": 7.5,
  "num_inference_steps": 50,
  "scheduler": "DPMSolverMultistep"
}

Discovering Model Parameters

Each Replicate model has unique parameters. To find available parameters:

Model Page: Visit the model on replicate.com
OpenAPI Schema: Available at /v1/models/{owner}/{name}/versions/{version_id} (includes openapi_schema)
Cog Definition: Check the model’s source code (if public)
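
Once a version object has been fetched, its input parameters can be read from openapi_schema. The Python sketch below assumes the inputs live under components.schemas.Input.properties, the layout Replicate version schemas commonly use; verify this against the schema you actually receive. The sample version is hand-written for illustration.

```python
def list_input_parameters(version: dict) -> dict:
    """Map each input name to its declared type and default (if any)."""
    schema = version.get("openapi_schema") or {}
    # Assumption: inputs are described under components.schemas.Input.properties.
    props = (
        schema.get("components", {})
        .get("schemas", {})
        .get("Input", {})
        .get("properties", {})
    )
    return {
        name: {"type": spec.get("type"), "default": spec.get("default")}
        for name, spec in props.items()
    }

sample_version = {
    "openapi_schema": {
        "components": {
            "schemas": {
                "Input": {
                    "properties": {
                        "prompt": {"type": "string"},
                        "guidance_scale": {"type": "number", "default": 7.5},
                    }
                }
            }
        }
    }
}
params = list_input_parameters(sample_version)
```

Any name that appears in the result can be passed in the request body as a model-specific field.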

Caveats

System Prompt Field Support

Severity: Medium
Behavior: Not all models support a dedicated system prompt field. For unsupported models, the system prompt is prepended to the conversation prompt.
Impact: Prompt structure differs between models.
Models Affected: meta/meta-llama-3-8b, meta/llama-2-70b, openai/gpt-oss-20b, openai/o1-mini, xai/grok-4, and all deepseek-ai/deepseek* models

Input Image Field Handling

Severity: Medium
Behavior: Different models expect input images in different fields; the gateway routes your images to the correct field automatically.
Impact: Supply images via input_images regardless of model.
Models Affected: Flux family models (see Input Images table)

Image Content in Chat

Severity: Low
Behavior: Only image URLs from message content are passed through to the model.
Impact: Base64-encoded images in messages are ignored.

Model-Specific Parameters

Severity: Medium
Behavior: Each model has a unique input schema; standard parameters may not work for all models.
Impact: Requires checking model documentation for available parameters.
Mitigation: Pass model-specific fields directly in the request.

Video Generation

Generate (`POST /v1/videos`)

Request Parameters

Parameter	Type	Required	Notes
`model`	string	✅	Replicate model (owner/model or version ID)
`prompt`	string	✅	Text description of the video
`input_reference`	string	❌	Reference image (base64 data URL or URL)
`seconds`	string	❌	Duration
`seed`	int	❌	Seed for reproducibility
`negative_prompt`	string	❌	What to avoid

Model-specific fields can be passed directly in the JSON body and are forwarded to the model. webhook and webhook_events_filter are handled automatically.

Response: id, status, model, videos[]

Job Statuses: queued (starting) → in_progress (processing) → completed / failed

Retrieve / Download

Operation	Endpoint	Notes
Get status	`GET /v1/videos/{id}`	Returns job status
Download	`GET /v1/videos/{id}/content`	Downloads the generated video
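
Since video jobs are retrieved by status, a client typically polls GET /v1/videos/{id} until the job reaches a final state. The following Python sketch shows such a loop with the HTTP call injected as a callable. The function name, polling interval, and timeout are illustrative choices, not gateway defaults.

```python
import time
from typing import Callable

def wait_for_video(get_status: Callable[[], dict],
                   interval: float = 5.0,
                   timeout: float = 600.0,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job is completed or failed, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_status()  # expected to wrap GET /v1/videos/{id}
        if job["status"] in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"video job still {job['status']!r} after {timeout} seconds")
        sleep(interval)
```

Once the returned job reports completed, the video can be fetched from GET /v1/videos/{id}/content.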
