Anthropic

Call Anthropic (Claude) models through DeepIntShield using the same OpenAI-compatible Chat Completions and Responses APIs you use for every other provider. You send standard OpenAI-style requests and DeepIntShield handles Anthropic’s native format for you. A few Anthropic-specific behaviors are worth knowing:

  • Reasoning - the reasoning object drives Claude’s thinking, with a minimum token budget (see Reasoning / Thinking)
  • Cache control - add cache_control directives to enable Anthropic prompt caching (see Cache Control)
  • Anthropic-specific parameters - pass fields like top_k via extra_params

Operation            | Endpoint
Chat Completions     | /v1/messages
Responses API        | /v1/messages
Text Completions     | /v1/complete
Embeddings           | -
Speech (TTS)         | -
Transcriptions (STT) | -
Image Generation     | -
Files                | /v1/files
Batch                | /v1/messages/batches
List Models          | /v1/models

Send standard OpenAI-compatible Chat Completions requests. temperature and top_p pass through directly. The following parameters are not supported by Anthropic and are ignored: frequency_penalty, presence_penalty, logit_bias, logprobs, top_logprobs, seed, parallel_tool_calls, service_tier.
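
Because the ignored parameters are dropped silently, it can help to surface them on the client before sending. The following is a minimal sketch, not part of any DeepIntShield SDK; the helper name `split_ignored_params` is hypothetical, and the set simply mirrors the list above.

```python
# Parameters the text above lists as ignored for Anthropic models.
IGNORED_FOR_ANTHROPIC = frozenset({
    "frequency_penalty", "presence_penalty", "logit_bias", "logprobs",
    "top_logprobs", "seed", "parallel_tool_calls", "service_tier",
})


def split_ignored_params(payload: dict) -> tuple[dict, list[str]]:
    """Return (payload without ignored keys, sorted names of ignored keys found)."""
    kept = {k: v for k, v in payload.items() if k not in IGNORED_FOR_ANTHROPIC}
    ignored = sorted(k for k in payload if k in IGNORED_FOR_ANTHROPIC)
    return kept, ignored


request = {
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.2,
    "seed": 7,
    "logprobs": True,
}
kept, ignored = split_ignored_params(request)
print(ignored)  # ['logprobs', 'seed']
```

Logging the second return value is one way to notice that a setting such as seed has no effect on Claude models.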

Use extra_params (SDK) or pass them directly in the request body (Gateway) to set Anthropic-specific fields such as top_k:

curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40
  }'
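
The same Gateway request can be assembled from Python with only the standard library. This is a sketch under assumptions: the URL, header name, and body mirror the curl example, while `build_chat_request` is a hypothetical helper and the request is built but not sent.

```python
import json
import os
import urllib.request

BASE_URL = "https://app.deepintshield.com"  # host taken from the curl example


def build_chat_request(virtual_key: str, prompt: str, **provider_fields) -> urllib.request.Request:
    """Build a Chat Completions request; extra keyword fields (e.g. top_k) go in the body."""
    body = {
        "model": "anthropic/claude-3-5-sonnet",
        "messages": [{"role": "user", "content": prompt}],
        **provider_fields,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-bf-vk": virtual_key},
        method="POST",
    )


req = build_chat_request(os.environ.get("DEEPINTSHIELD_VIRTUAL_KEY", ""), "Hello", top_k=40)
# Sending is left to the caller, for example: urllib.request.urlopen(req)
```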

Anthropic also accepts a top-level "cache_control": {"type": "ephemeral"} object on requests to enable automatic prompt caching.

Cache directives can be added to system messages, user messages, and tool definitions to enable prompt caching:

curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "This is cached context",
            "cache_control": {"type": "ephemeral"}
          }
        ]
      }
    ],
    "system": [
      {
        "type": "text",
        "text": "You are a helpful assistant",
        "cache_control": {"type": "ephemeral"}
      }
    ]
  }'
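
When several blocks need the same directive, a small helper keeps the payload readable. The sketch below reproduces the body of the request above; `cached_text` is a hypothetical name, not a DeepIntShield function.

```python
def cached_text(text: str) -> dict:
    """Text content block carrying an ephemeral cache_control directive."""
    return {"type": "text", "text": text, "cache_control": {"type": "ephemeral"}}


payload = {
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [
        {"role": "user", "content": [cached_text("This is cached context")]},
    ],
    "system": [cached_text("You are a helpful assistant")],
}
```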

Documentation: See DeepIntShield Reasoning Reference

Use the reasoning object to enable Claude’s thinking:

  • reasoning.effort enables thinking
  • reasoning.max_tokens sets the token budget for thinking
  • Minimum budget: at least 1024 tokens; requests with a lower value fail with an error
  • Dynamic budget: -1 is converted to 1024 automatically

{"reasoning": {"effort": "high", "max_tokens": 2048}}

  • Tools: Standard OpenAI-style tool definitions are supported, including tool_choice values auto, none, required, and specific tool selection.
  • Images: Both URL images ({"type": "image_url", "image_url": {...}}) and base64 data-URL images are supported in message content.
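
A request that combines a tool definition, a specific tool_choice, and a base64 data-URL image might be assembled as follows. This is an illustrative sketch: `get_weather` is a hypothetical tool, the image bytes are a placeholder, and the helper names are not part of any SDK.

```python
import base64


def image_url_part(url: str) -> dict:
    """Image content part in the OpenAI-style image_url form."""
    return {"type": "image_url", "image_url": {"url": url}}


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# get_weather is a hypothetical tool used only for illustration.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = {
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is the weather where this photo was taken?"},
            image_url_part(to_data_url(b"placeholder bytes, not a real PNG")),
        ],
    }],
    "tools": [weather_tool],
    # Other accepted values per the list above: "auto", "none", "required".
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
```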

Responses come back in the standard OpenAI-compatible shape, so you read the same fields you use for other providers:

  • finish_reason (stop, length, tool_calls)
  • usage.prompt_tokens / usage.completion_tokens - token counts, with cache usage rolled into prompt_tokens
  • usage.prompt_tokens_details.cached_read_tokens / cached_write_tokens - cache read/write breakdown when prompt caching is used
  • reasoning_details - Claude thinking output (when reasoning is enabled)
  • Tool call arguments are returned as a JSON string in tool_calls
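
Reading those fields could look like the sketch below. The response dictionary is a made-up sample shaped like the fields listed above, not captured output, and `read_chat_response` is a hypothetical helper. The point it illustrates is that tool-call arguments arrive as a JSON string and need decoding.

```python
import json

# Hypothetical response, shaped like the fields listed above.
response = {
    "choices": [{
        "finish_reason": "tool_calls",
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
            }],
        },
    }],
    "usage": {
        "prompt_tokens": 120,
        "completion_tokens": 15,
        "prompt_tokens_details": {"cached_read_tokens": 100, "cached_write_tokens": 0},
    },
}


def read_chat_response(resp: dict) -> dict:
    """Extract the finish reason, decoded tool calls, and cache token counts."""
    choice = resp["choices"][0]
    calls = [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in choice["message"].get("tool_calls") or []
    ]
    details = resp.get("usage", {}).get("prompt_tokens_details", {})
    return {
        "finish_reason": choice["finish_reason"],
        "tool_calls": calls,
        "cached_read_tokens": details.get("cached_read_tokens", 0),
        "cached_write_tokens": details.get("cached_write_tokens", 0),
    }


summary = read_chat_response(response)
```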

Set "stream": true to receive incremental chunks. Output is delivered as standard OpenAI-compatible streaming deltas (content text, tool-call arguments, and reasoning text arrive progressively).
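
Content deltas can be stitched back together on the client. The sketch below assumes the stream uses the same server-sent-event framing as OpenAI's API (lines beginning with "data:" and a final "[DONE]" sentinel); that framing is an assumption here, as is the hypothetical helper name `collect_stream_text`. It handles content text only.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content values from OpenAI-style SSE lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


# Made-up sample lines for illustration.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # Hello
```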


Minimum Reasoning Budget

Behavior: reasoning.max_tokens must be >= 1024
Impact: Requests with lower values fail with an error

Dynamic Budget Conversion

Behavior: reasoning.max_tokens = -1 is treated as 1024
Impact: Dynamic budgeting is not supported
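
The two budget rules can be mirrored in a client-side check so that an invalid budget is caught before the request is sent. This is a minimal sketch of the documented behavior, not DeepIntShield's own code; `normalize_reasoning_budget` is a hypothetical name.

```python
MIN_REASONING_BUDGET = 1024  # minimum stated above


def normalize_reasoning_budget(max_tokens: int) -> int:
    """Apply the documented rules: -1 becomes 1024, values below 1024 are rejected."""
    if max_tokens == -1:
        return MIN_REASONING_BUDGET
    if max_tokens < MIN_REASONING_BUDGET:
        raise ValueError(
            f"reasoning.max_tokens must be >= {MIN_REASONING_BUDGET}, got {max_tokens}"
        )
    return max_tokens


reasoning = {"effort": "high", "max_tokens": normalize_reasoning_budget(-1)}
print(reasoning)  # {'effort': 'high', 'max_tokens': 1024}
```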


The Responses API uses the same underlying /v1/messages endpoint. Send standard OpenAI Responses requests; DeepIntShield handles the Anthropic format.

temperature and top_p pass through directly. Use extra_params (SDK) or pass them directly in the request body (Gateway) to set Anthropic-specific fields such as top_k, include, and stop:

curl -X POST https://app.deepintshield.com/v1/responses \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "input": "Hello, how are you?",
    "top_k": 40
  }'

Cache directives can be added to instructions (system) and input messages to enable prompt caching:

curl -X POST https://app.deepintshield.com/v1/responses \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: $DEEPINTSHIELD_VIRTUAL_KEY" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "instructions": "You are a helpful assistant. This instruction is cached.",
    "instructions_cache_control": {"type": "ephemeral"},
    "input": [
      {
        "type": "text",
        "text": "Answer this question",
        "cache_control": {"type": "ephemeral"}
      }
    ]
  }'

Supported tool types: function, computer_use_preview, web_search, mcp. MCP tools accept server_label and server_url. Cache control is supported on instructions and input blocks (see Cache Control above).

Responses come back in the standard OpenAI Responses shape:

  • status (completed, incomplete) reflects whether the model finished or hit the token limit
  • usage.input_tokens / usage.output_tokens, with cache usage broken out under input_tokens_details.cached_read_tokens and cached_write_tokens
  • output items: assistant text as message, tool calls as function_call, and Claude thinking as reasoning
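
Walking the output items might look like the sketch below. The sample response is invented for illustration, and the inner field names (content parts of type output_text, call_id, summary) follow the OpenAI Responses shape; whether every field is populated for Claude models is an assumption. `summarize_output` is a hypothetical helper.

```python
import json

# Hypothetical response, shaped like the fields listed above.
response = {
    "status": "completed",
    "usage": {"input_tokens": 40, "output_tokens": 12},
    "output": [
        {"type": "reasoning",
         "summary": [{"type": "summary_text", "text": "Need the weather."}]},
        {"type": "function_call", "call_id": "call_1", "name": "get_weather",
         "arguments": "{\"city\": \"Paris\"}"},
        {"type": "message", "role": "assistant",
         "content": [{"type": "output_text", "text": "Checking the weather."}]},
    ],
}


def summarize_output(resp: dict) -> dict:
    """Split output items into assistant text, decoded tool calls, and a reasoning count."""
    text_parts, calls, reasoning_items = [], [], 0
    for item in resp.get("output", []):
        kind = item.get("type")
        if kind == "message":
            text_parts += [
                part["text"] for part in item.get("content", [])
                if part.get("type") == "output_text"
            ]
        elif kind == "function_call":
            calls.append((item["name"], json.loads(item["arguments"])))
        elif kind == "reasoning":
            reasoning_items += 1
    return {
        "complete": resp.get("status") == "completed",
        "text": "".join(text_parts),
        "tool_calls": calls,
        "reasoning_items": reasoning_items,
    }


summary = summarize_output(response)
```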

Set "stream": true to receive output as standard OpenAI Responses streaming events (text, tool-call arguments, and reasoning arrive incrementally).


Send a prompt with standard parameters. temperature and top_p pass through directly; top_k and stop can be set via extra_params. The response is returned in OpenAI-compatible completion shape.


Request formats: requests array (CustomID + Params) or input_file_id

Pagination: Cursor-based with after_id, before_id, limit
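
Cursor pagination with after_id can be driven by a small loop. The sketch below assumes each page is a dictionary with data, has_more, and last_id keys, which matches Anthropic's native list responses but is not confirmed here for the gateway. The page fetcher is injected so the loop stays transport-agnostic, and `iter_batches` is a hypothetical name.

```python
from typing import Callable, Iterator


def iter_batches(fetch_page: Callable[[dict], dict], limit: int = 20) -> Iterator[dict]:
    """Yield every batch, following after_id cursors until has_more is false."""
    params = {"limit": limit}
    while True:
        page = fetch_page(params)
        yield from page.get("data", [])
        if not page.get("has_more") or not page.get("last_id"):
            return
        # Assumed response keys: has_more and last_id.
        params = {"limit": limit, "after_id": page["last_id"]}


# In-memory stand-in for GET /v1/messages/batches, for illustration only.
_pages = {
    None: {"data": [{"id": "b1"}, {"id": "b2"}], "has_more": True, "last_id": "b2"},
    "b2": {"data": [{"id": "b3"}], "has_more": False, "last_id": "b3"},
}


def fake_fetch(params: dict) -> dict:
    return _pages[params.get("after_id")]


print([b["id"] for b in iter_batches(fake_fetch, limit=2)])  # ['b1', 'b2', 'b3']
```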

Endpoints:

  • POST /v1/messages/batches - Create
  • GET /v1/messages/batches - List
  • GET /v1/messages/batches/{batch_id} - Retrieve
  • POST /v1/messages/batches/{batch_id}/cancel - Cancel

Response: JSONL format with {custom_id, result: {type, message}}
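
Since each line of the results is a standalone JSON object, the output can be indexed by custom_id with a few lines of code. This is a sketch with made-up sample data; `parse_batch_results` is a hypothetical helper.

```python
import json


def parse_batch_results(jsonl_text: str) -> dict:
    """Map each custom_id to its result object, skipping blank lines."""
    results = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        results[record["custom_id"]] = record["result"]
    return results


# Made-up sample in the {custom_id, result: {type, message}} shape described above.
sample = "\n".join([
    json.dumps({"custom_id": "req-1", "result": {"type": "succeeded", "message": {"role": "assistant"}}}),
    "",
    json.dumps({"custom_id": "req-2", "result": {"type": "succeeded", "message": {"role": "assistant"}}}),
])
results = parse_batch_results(sample)
print(sorted(results))  # ['req-1', 'req-2']
```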


Upload: multipart/form-data with a file field (required) and a filename field (optional)

Endpoints:

  • POST /v1/files - Upload
  • GET /v1/files - List (cursor pagination)
  • GET /v1/files/{file_id} - Retrieve
  • DELETE /v1/files/{file_id} - Delete
  • GET /v1/files/{file_id}/content - Download content
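
An upload body for POST /v1/files can be encoded by hand with the standard library. The sketch below is an assumption-laden illustration: it sends the required file part and, when given, a separate filename form field as described above. The octet-stream content type and the helper name `encode_upload` are choices made here, not documented behavior.

```python
import uuid
from typing import Optional


def encode_upload(content: bytes, filename: Optional[str] = None) -> tuple[bytes, str]:
    """Return (body, Content-Type header value) for a multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    disposition = 'form-data; name="file"'
    if filename:
        disposition += f'; filename="{filename}"'
    body = (
        f"--{boundary}\r\n"
        f"Content-Disposition: {disposition}\r\n"
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8") + content + b"\r\n"
    if filename:
        # Optional filename sent as its own form field.
        body += (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="filename"\r\n\r\n'
            f"{filename}\r\n"
        ).encode("utf-8")
    body += f"--{boundary}--\r\n".encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


body, content_type = encode_upload(b"hello", "notes.txt")
```

The returned content type must be sent as the Content-Type header so the server can find the boundary.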


Request: GET /v1/models (no body)

Multi-key support: Results are aggregated from all keys, filtered by allowed_models if configured.
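
The aggregation rule can be pictured with a short sketch. This illustrates the described semantics only and is not DeepIntShield's implementation: the key names are invented, and removing duplicates across keys is an assumption.

```python
from typing import Iterable, Optional


def aggregate_models(models_by_key: dict, allowed_models: Optional[Iterable[str]] = None) -> list:
    """Merge model lists from all keys, then keep only allowed models if a filter is set."""
    allowed = set(allowed_models) if allowed_models is not None else None
    merged = []
    for models in models_by_key.values():
        for model in models:
            if model in merged:
                continue  # assumed: duplicates across keys are reported once
            if allowed is None or model in allowed:
                merged.append(model)
    return merged


# Hypothetical key names and model lists.
by_key = {
    "key-a": ["claude-3-5-sonnet", "claude-3-5-haiku"],
    "key-b": ["claude-3-5-sonnet"],
}
print(aggregate_models(by_key))  # ['claude-3-5-sonnet', 'claude-3-5-haiku']
print(aggregate_models(by_key, allowed_models=["claude-3-5-haiku"]))  # ['claude-3-5-haiku']
```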