Anthropic

Overview

Call Anthropic (Claude) models through DeepIntShield using the same OpenAI-compatible Chat Completions and Responses APIs you use for every other provider. You send standard OpenAI-style requests and DeepIntShield handles Anthropic’s native format for you. A few Anthropic-specific behaviors are worth knowing:

Reasoning - the reasoning object controls Claude's extended thinking and requires a minimum thinking budget of 1024 tokens (see Reasoning / Thinking)
Cache control - add cache_control directives to enable Anthropic prompt caching (see Cache Control)
Anthropic-specific parameters - pass fields like top_k via extra_params (SDK) or directly in the request body (Gateway)

Supported Operations

Operation	Non-Streaming	Streaming	Anthropic Endpoint
Chat Completions	✅	✅	`/v1/messages`
Responses API	✅	✅	`/v1/messages`
Text Completions	✅	❌	`/v1/complete`
Embeddings	❌	❌	-
Speech (TTS)	❌	❌	-
Transcriptions (STT)	❌	❌	-
Image Generation	❌	❌	-
Files	✅	-	`/v1/files`
Batch	✅	-	`/v1/messages/batches`
List Models	✅	-	`/v1/models`

1. Chat Completions

Request Parameters

Send standard OpenAI-compatible Chat Completions requests. The temperature and top_p parameters pass through directly. Anthropic does not support the following parameters, so they are ignored: frequency_penalty, presence_penalty, logit_bias, logprobs, top_logprobs, seed, parallel_tool_calls, service_tier.

Extra Parameters

Pass Anthropic-specific fields such as top_k through extra_params (SDK) or directly in the request body (Gateway):

curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40
  }'

Anthropic also accepts a top-level "cache_control": {"type": "ephemeral"} object on requests to enable automatic prompt caching.
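The top-level form can be sketched as a dry-run script, assuming the same endpoint and x-bf-vk header as the curl example above. It prints the request body unless DEEPINTSHIELD_VIRTUAL_KEY is set, in which case it sends the request. The helper name build_auto_cache_body and the prompt text are illustrative, not part of DeepIntShield.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative helper (not a DeepIntShield API): prints a Chat Completions body
# that uses a top-level cache_control object for automatic prompt caching.
build_auto_cache_body() {
  cat <<'JSON'
{
  "model": "anthropic/claude-3-5-sonnet",
  "messages": [{"role": "user", "content": "Hello"}],
  "cache_control": {"type": "ephemeral"}
}
JSON
}

# Send only when a virtual key is available; otherwise print the body (dry run).
if [[ -n "${DEEPINTSHIELD_VIRTUAL_KEY:-}" ]]; then
  curl -sS -X POST https://app.deepintshield.com/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
    -d "$(build_auto_cache_body)"
else
  build_auto_cache_body
fi
```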

Cache Control

Cache directives can be added to system messages, user messages, and tool definitions to enable prompt caching:

curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "This is cached context",
            "cache_control": {"type": "ephemeral"}
          }
        ]
      }
    ],
    "system": [
      {
        "type": "text",
        "text": "You are a helpful assistant",
        "cache_control": {"type": "ephemeral"}
      }
    ]
  }'

Reasoning / Thinking

Documentation: See DeepIntShield Reasoning Reference

Use the reasoning object to enable Claude’s thinking:

reasoning.effort enables thinking
reasoning.max_tokens sets the token budget for thinking

Critical Constraints

Minimum budget: reasoning.max_tokens must be at least 1024; requests with a lower budget fail with an error
Dynamic budget: a value of -1 is automatically converted to 1024

{"reasoning": {"effort": "high", "max_tokens": 2048}}
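A fuller reasoning request can be sketched as follows. The model name anthropic/claude-sonnet-4-20250514 is an assumption: extended thinking needs a thinking-capable Claude model (Claude 3.5 Sonnet is not one), so substitute a model your key can access. The top-level max_tokens of 4096 is set above the 2048 thinking budget because Anthropic requires the overall output limit to exceed the thinking budget; this assumes DeepIntShield forwards it as Anthropic's max_tokens. The script prints the body as a dry run unless DEEPINTSHIELD_VIRTUAL_KEY is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed model name; extended thinking requires a thinking-capable Claude model.
# The thinking budget (2048) is >= 1024 and below the overall max_tokens (4096).
build_reasoning_body() {
  cat <<'JSON'
{
  "model": "anthropic/claude-sonnet-4-20250514",
  "messages": [{"role": "user", "content": "How many primes are there below 50? Show your reasoning."}],
  "max_tokens": 4096,
  "reasoning": {"effort": "high", "max_tokens": 2048}
}
JSON
}

# Send only when a virtual key is available; otherwise print the body (dry run).
if [[ -n "${DEEPINTSHIELD_VIRTUAL_KEY:-}" ]]; then
  curl -sS -X POST https://app.deepintshield.com/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
    -d "$(build_reasoning_body)"
else
  build_reasoning_body
fi
```

When thinking is enabled, the thinking output is expected in reasoning_details on the response, as described under Response.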

Tool & Image Support

Tools: Standard OpenAI-style tool definitions are supported, including tool_choice values auto, none, required, and specific tool selection.
Images: Both URL images ({"type": "image_url", "image_url": {...}}) and base64 data-URL images are supported in message content.
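Both features can be sketched with OpenAI-style request bodies, assuming the same endpoint and header as the earlier curl examples. The get_weather tool, its schema, and the image URL are hypothetical placeholders; the tool_choice object shows the specific-tool form, which forces that one tool. The script prints both bodies unless DEEPINTSHIELD_VIRTUAL_KEY is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical tool definition; tool_choice forces this specific tool.
build_tool_body() {
  cat <<'JSON'
{
  "model": "anthropic/claude-3-5-sonnet",
  "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }],
  "tool_choice": {"type": "function", "function": {"name": "get_weather"}}
}
JSON
}

# URL image in message content; the URL is a placeholder.
build_image_body() {
  cat <<'JSON'
{
  "model": "anthropic/claude-3-5-sonnet",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Describe this image."},
      {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
    ]
  }]
}
JSON
}

# Send a body when a virtual key is set; otherwise print it (dry run).
send_or_print() {
  if [[ -n "${DEEPINTSHIELD_VIRTUAL_KEY:-}" ]]; then
    curl -sS -X POST https://app.deepintshield.com/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
      -d "$1"
  else
    printf '%s\n' "$1"
  fi
}

send_or_print "$(build_tool_body)"
send_or_print "$(build_image_body)"
```

If the model calls the tool, its arguments come back as a JSON string in tool_calls, as noted under Response.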

Response

Responses come back in the standard OpenAI-compatible shape, so you read the same fields you use for other providers:

finish_reason (stop, length, tool_calls)
usage.prompt_tokens / usage.completion_tokens - token counts, with cache usage rolled into prompt_tokens
usage.prompt_tokens_details.cached_read_tokens / cached_write_tokens - cache read/write breakdown when prompt caching is used
reasoning_details - Claude thinking output (when reasoning is enabled)
Tool call arguments are returned as a JSON string in tool_calls

Streaming

Set "stream": true to receive incremental chunks. Output is delivered as standard OpenAI-compatible streaming deltas (content text, tool-call arguments, and reasoning text arrive progressively).
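A minimal streaming sketch, assuming the same endpoint and header as the non-streaming examples. curl's -N flag turns off output buffering so chunks are shown as they arrive; they are expected to follow the OpenAI-compatible streaming format described above. The prompt is illustrative, and the script prints the body unless DEEPINTSHIELD_VIRTUAL_KEY is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Chat Completions body with streaming enabled.
build_stream_body() {
  cat <<'JSON'
{
  "model": "anthropic/claude-3-5-sonnet",
  "messages": [{"role": "user", "content": "Write a haiku about caching."}],
  "stream": true
}
JSON
}

if [[ -n "${DEEPINTSHIELD_VIRTUAL_KEY:-}" ]]; then
  # -N disables curl's output buffering so streamed chunks print immediately.
  curl -sS -N -X POST https://app.deepintshield.com/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
    -d "$(build_stream_body)"
else
  build_stream_body
fi
```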

Caveats

Minimum Reasoning Budget

Behavior: reasoning.max_tokens must be >= 1024
Impact: Requests with lower values fail with an error

Dynamic Budget Conversion

Behavior: reasoning.max_tokens = -1 is treated as 1024
Impact: Dynamic budgeting is not supported
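Because both rules are deterministic, a client can apply them before sending a request. The helper below is a hypothetical client-side sketch (normalize_thinking_budget is not a DeepIntShield function) that mirrors the two documented behaviors: -1 becomes 1024, and any other value below 1024 is rejected locally instead of failing at request time.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper mirroring the documented rules:
#   -1      -> 1024 (dynamic budgeting is not supported)
#   < 1024  -> rejected (the request would fail anyway)
#   other   -> passed through unchanged
normalize_thinking_budget() {
  local budget="$1"
  local re='^-?(0|[1-9][0-9]*)$'
  if [[ ! "$budget" =~ $re ]]; then
    echo "error: budget must be an integer, got '$budget'" >&2
    return 2
  fi
  if (( budget == -1 )); then
    echo 1024
  elif (( budget < 1024 )); then
    echo "error: reasoning.max_tokens must be >= 1024, got $budget" >&2
    return 1
  else
    echo "$budget"
  fi
}

# Example: prints 1024, then 4096.
normalize_thinking_budget -1
normalize_thinking_budget 4096
```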

2. Responses API

Responses API requests are served by the same underlying Anthropic /v1/messages endpoint. Send standard OpenAI Responses requests, and DeepIntShield converts them to and from Anthropic's format.

Request Parameters

The temperature and top_p parameters pass through directly. Pass fields such as top_k, include, and stop through extra_params (SDK) or directly in the request body (Gateway):

curl -X POST https://app.deepintshield.com/v1/responses \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "input": "Hello, how are you?",
    "top_k": 40
  }'

Cache Control

Cache directives can be added to instructions (system) and input messages to enable prompt caching:

curl -X POST https://app.deepintshield.com/v1/responses \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "instructions": "You are a helpful assistant. This instruction is cached.",
    "instructions_cache_control": {"type": "ephemeral"},
    "input": [
      {
        "type": "text",
        "text": "Answer this question",
        "cache_control": {"type": "ephemeral"}
      }
    ]
  }'

Tool Support

Supported tool types: function, computer_use_preview, web_search, and mcp. MCP tools accept server_label and server_url. Cache control is supported on instructions and input blocks (see Cache Control above).
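A Responses API request that combines a function tool and an MCP tool might look like the sketch below. It assumes OpenAI's Responses tool shapes (function tools carry name and parameters at the top level rather than under a nested function object) and uses a hypothetical get_weather function plus a placeholder MCP server label and URL; only server_label and server_url come from the text above. The script prints the body unless DEEPINTSHIELD_VIRTUAL_KEY is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical function tool plus a placeholder MCP server.
build_responses_tools_body() {
  cat <<'JSON'
{
  "model": "anthropic/claude-3-5-sonnet",
  "input": "What is the weather in Paris?",
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    },
    {
      "type": "mcp",
      "server_label": "weather_docs",
      "server_url": "https://mcp.example.com/mcp"
    }
  ]
}
JSON
}

# Send only when a virtual key is available; otherwise print the body (dry run).
if [[ -n "${DEEPINTSHIELD_VIRTUAL_KEY:-}" ]]; then
  curl -sS -X POST https://app.deepintshield.com/v1/responses \
    -H "Content-Type: application/json" \
    -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
    -d "$(build_responses_tools_body)"
else
  build_responses_tools_body
fi
```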

Response

Responses come back in the standard OpenAI Responses shape:

status (completed, incomplete) reflects whether the model finished or hit the token limit
usage.input_tokens / usage.output_tokens, with cache usage broken out under input_tokens_details.cached_read_tokens and cached_write_tokens
output items: assistant text as message, tool calls as function_call, and Claude thinking as reasoning

Streaming

Set "stream": true to receive output as standard OpenAI Responses streaming events (text, tool-call arguments, and reasoning arrive incrementally).
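A minimal streaming sketch for the Responses endpoint, assuming the same URL and header as the non-streaming examples; -N keeps curl from buffering the event stream. The prompt is illustrative, and the script prints the body unless DEEPINTSHIELD_VIRTUAL_KEY is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Responses API body with streaming enabled.
build_responses_stream_body() {
  cat <<'JSON'
{
  "model": "anthropic/claude-3-5-sonnet",
  "input": "Explain prompt caching in two sentences.",
  "stream": true
}
JSON
}

if [[ -n "${DEEPINTSHIELD_VIRTUAL_KEY:-}" ]]; then
  # -N prints streaming events as they arrive instead of buffering them.
  curl -sS -N -X POST https://app.deepintshield.com/v1/responses \
    -H "Content-Type: application/json" \
    -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
    -d "$(build_responses_stream_body)"
else
  build_responses_stream_body
fi
```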

3. Text Completions (Legacy)

Send a prompt with the standard Text Completions parameters. The temperature and top_p parameters pass through directly; set top_k and stop via extra_params. The response is returned in the OpenAI-compatible completion shape. Streaming is not supported for this operation.
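A sketch of a legacy completion request through the gateway. Several details are assumptions: the gateway path /v1/completions (the OpenAI-compatible counterpart of Anthropic's /v1/complete), passing top_k and stop directly in the request body as with the Chat Completions extra parameters, and the model, which is a placeholder read from a hypothetical ANTHROPIC_COMPLETIONS_MODEL variable because Anthropic's legacy Text Completions API may only serve older models. The script prints the body unless DEEPINTSHIELD_VIRTUAL_KEY is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder model; set ANTHROPIC_COMPLETIONS_MODEL (hypothetical) to a model
# that supports Anthropic's legacy Text Completions API.
MODEL="${ANTHROPIC_COMPLETIONS_MODEL:-anthropic/your-legacy-completions-model}"

# Unquoted heredoc so $MODEL is expanded into the body.
build_completion_body() {
  cat <<JSON
{
  "model": "$MODEL",
  "prompt": "Once upon a time",
  "max_tokens": 64,
  "temperature": 0.7,
  "top_k": 40,
  "stop": ["THE END"]
}
JSON
}

# Assumed gateway path; send only when a virtual key is available.
if [[ -n "${DEEPINTSHIELD_VIRTUAL_KEY:-}" ]]; then
  curl -sS -X POST https://app.deepintshield.com/v1/completions \
    -H "Content-Type: application/json" \
    -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
    -d "$(build_completion_body)"
else
  build_completion_body
fi
```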

4. Batch API

Request formats: a requests array (one custom_id and one params object per request) or an input_file_id

Pagination: cursor-based, using after_id, before_id, and limit

Endpoints:

POST /v1/messages/batches - Create
GET /v1/messages/batches - List
GET /v1/messages/batches/{batch_id} - Retrieve
POST /v1/messages/batches/{batch_id}/cancel - Cancel

Response: results are returned as JSONL, one line per request in the form {custom_id, result: {type, message}}
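Creating a batch with an inline requests array, and building paginated list URLs, can be sketched as follows. This assumes the endpoints above are exposed under the gateway base URL (overridable through a hypothetical DEEPINTSHIELD_BASE_URL variable), that each params object takes a Messages-style body, and that model names use the same format as the other examples; the custom_id values and the msgbatch_abc123 cursor are placeholders. The script prints the body and two list URLs unless DEEPINTSHIELD_VIRTUAL_KEY is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed base URL; override with the hypothetical DEEPINTSHIELD_BASE_URL.
BASE_URL="${DEEPINTSHIELD_BASE_URL:-https://app.deepintshield.com}"

# Two inline requests; custom_id is caller-chosen and is echoed back in results.
build_batch_body() {
  cat <<'JSON'
{
  "requests": [
    {
      "custom_id": "greeting-1",
      "params": {
        "model": "anthropic/claude-3-5-sonnet",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello"}]
      }
    },
    {
      "custom_id": "greeting-2",
      "params": {
        "model": "anthropic/claude-3-5-sonnet",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Bonjour"}]
      }
    }
  ]
}
JSON
}

# Builds a List URL with cursor pagination: batch_list_url LIMIT [AFTER_ID]
batch_list_url() {
  local limit="$1" after_id="${2:-}"
  local url="$BASE_URL/v1/messages/batches?limit=$limit"
  if [[ -n "$after_id" ]]; then
    url+="&after_id=$after_id"
  fi
  printf '%s\n' "$url"
}

if [[ -n "${DEEPINTSHIELD_VIRTUAL_KEY:-}" ]]; then
  curl -sS -X POST "$BASE_URL/v1/messages/batches" \
    -H "Content-Type: application/json" \
    -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
    -d "$(build_batch_body)"
else
  build_batch_body
  batch_list_url 20
  batch_list_url 20 msgbatch_abc123
fi
```

Each line of the results file can then be matched back to its request through custom_id.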

5. Files API

Upload: multipart/form-data with a file field (required) and a filename field (optional)

Endpoints:

POST /v1/files - Upload
GET /v1/files - List (cursor pagination)
GET /v1/files/{file_id} - Retrieve metadata
DELETE /v1/files/{file_id} - Delete
GET /v1/files/{file_id}/content - Download content
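The upload and the per-file endpoints can be sketched as below, assuming they are served under the gateway base URL. DEEPINTSHIELD_BASE_URL and UPLOAD_PATH are hypothetical variables, and file_abc123 is a placeholder ID. The upload sends the required file field plus the optional filename field as multipart form data; without a virtual key and an existing UPLOAD_PATH, the script only prints example URLs.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed base URL; override with the hypothetical DEEPINTSHIELD_BASE_URL.
BASE_URL="${DEEPINTSHIELD_BASE_URL:-https://app.deepintshield.com}"

# Builds Files API URLs: file_url [FILE_ID] [SUFFIX], e.g. SUFFIX=content.
file_url() {
  local id="${1:-}" suffix="${2:-}"
  local url="$BASE_URL/v1/files"
  if [[ -n "$id" ]]; then
    url+="/$id"
  fi
  if [[ -n "$suffix" ]]; then
    url+="/$suffix"
  fi
  printf '%s\n' "$url"
}

# Multipart upload: curl -F sets the multipart/form-data content type itself.
upload_file() {
  local path="$1"
  curl -sS -X POST "$(file_url)" \
    -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
    -F "file=@${path}" \
    -F "filename=${path##*/}"
}

if [[ -n "${DEEPINTSHIELD_VIRTUAL_KEY:-}" && -f "${UPLOAD_PATH:-}" ]]; then
  upload_file "$UPLOAD_PATH"
else
  file_url
  file_url file_abc123
  file_url file_abc123 content
fi
```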

6. List Models

Request: GET /v1/models (no body)

Multi-key support: results are aggregated across all keys and, when allowed_models is configured, filtered to the allowed models.
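A minimal List Models call, assuming the gateway serves GET /v1/models at the same base URL with the x-bf-vk header (DEEPINTSHIELD_BASE_URL is a hypothetical override). It sends no body and prints only the URL unless DEEPINTSHIELD_VIRTUAL_KEY is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed base URL; override with the hypothetical DEEPINTSHIELD_BASE_URL.
BASE_URL="${DEEPINTSHIELD_BASE_URL:-https://app.deepintshield.com}"

models_url() {
  printf '%s/v1/models\n' "$BASE_URL"
}

if [[ -n "${DEEPINTSHIELD_VIRTUAL_KEY:-}" ]]; then
  # Plain GET with no request body.
  curl -sS "$(models_url)" -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY"
else
  models_url
fi
```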