Reasoning

Overview

Reasoning (called “thinking” by some providers) lets a model expose its step-by-step thought process before it returns a final answer. Several providers support the feature, each with its own request field, budget rules, and response format.

Provider Support Matrix

Provider	Request Field	Response Field	Min Budget	Effort Levels	Streaming
OpenAI	`reasoning`	`reasoning_details`	None	`minimal`, `low`, `medium`, `high`	✅
Anthropic	`thinking`	Content blocks	1024 tokens	`enabled` only	✅
Bedrock (Anthropic)	`thinking`	Content blocks	1024 tokens	`enabled` only	✅
Gemini 2.5+	`thinking_config`	`thought` parts	1024 tokens	Budget-only	✅
Gemini 3.0+	`thinking_config`	`thought` parts	1024 tokens	`minimal`, `low`, `medium`, `high` + Budget	✅

Request Configuration

Chat Completions API

Add a reasoning object to your chat completions request body:

{
  "model": "provider/model-name",
  "messages": [...],
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096
  }
}

from deepintshield import DeepintShield

shield = DeepintShield.from_env()  # defaults to https://app.deepintshield.com

response = shield.chat(
    model="openai/o4-mini",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    reasoning={"effort": "high", "max_tokens": 4096},
)

curl --location 'https://app.deepintshield.com/v1/chat/completions' \
--header 'Authorization: Bearer sk-bf-...' \
--header 'Content-Type: application/json' \
--data '{
    "model": "openai/o4-mini",
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
    "reasoning": {"effort": "high", "max_tokens": 4096}
}'

Responses API

The Responses API accepts the same reasoning object and adds an optional summary parameter:

{
  "model": "provider/model-name",
  "input": [...],
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096,
    "summary": "detailed"
  }
}
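
The body above can also be assembled programmatically. The following is a minimal sketch, not part of any SDK: the helper name `build_responses_payload` is hypothetical, the shape of the `input` items is an assumption, and the accepted summary values are taken from the Responses API parameter table.

```python
from typing import Any, Optional

# Summary levels listed in the Responses API parameter table.
ALLOWED_SUMMARIES = {"brief", "detailed", "json"}


def build_responses_payload(
    model: str,
    input_items: list[dict[str, Any]],
    effort: Optional[str] = None,
    max_tokens: Optional[int] = None,
    summary: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build a Responses API body with only the fields that were set."""
    if summary is not None and summary not in ALLOWED_SUMMARIES:
        raise ValueError(f"unsupported summary level: {summary!r}")

    reasoning: dict[str, Any] = {}
    if effort is not None:
        reasoning["effort"] = effort
    if max_tokens is not None:
        reasoning["max_tokens"] = max_tokens
    if summary is not None:
        reasoning["summary"] = summary

    payload: dict[str, Any] = {"model": model, "input": input_items}
    if reasoning:  # omit the object entirely when no reasoning field was given
        payload["reasoning"] = reasoning
    return payload
```

Leaving out the `reasoning` key when nothing is set keeps the request identical to a non-reasoning call.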

Parameter Reference

Chat Completions API Parameters

Parameter	Type	Description
`effort`	`string`	Reasoning intensity level
`max_tokens`	`int`	Maximum tokens for reasoning (budget)

Responses API Parameters

Parameter	Type	Description
`effort`	`string`	Reasoning intensity level
`max_tokens`	`int`	Maximum tokens for reasoning (budget)
`summary`	`string`	Summary level: `brief`, `detailed`, or `json`

Provider-Specific Behavior

OpenAI

OpenAI uses effort-based reasoning. Supply reasoning.effort directly. If you supply only reasoning.max_tokens, the gateway derives an effort level for you.

Supported Effort Levels: minimal, low, medium, high

Anthropic

Anthropic reasoning is budget-based. Set reasoning.max_tokens to the thinking budget; reasoning content is returned in the normalized reasoning_details array, with a signature field for verification.

Dynamic Budget Handling:

Input Value	Behavior
`-1` (dynamic)	Uses the minimum budget of `1024`
`< 1024`	Error
`>= 1024`	Used as-is
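
The table above can be read as a small decision function. This sketch only mirrors the three documented rows; the function name `resolve_anthropic_budget` is hypothetical and the real gateway logic may differ in details such as the error type.

```python
ANTHROPIC_MIN_BUDGET = 1024  # minimum thinking budget from the table above
DYNAMIC_BUDGET = -1          # sentinel meaning "dynamic"


def resolve_anthropic_budget(max_tokens: int) -> int:
    """Hypothetical helper: map reasoning.max_tokens to the budget sent to Anthropic."""
    if max_tokens == DYNAMIC_BUDGET:
        # Dynamic budgeting falls back to the minimum budget.
        return ANTHROPIC_MIN_BUDGET
    if max_tokens < ANTHROPIC_MIN_BUDGET:
        raise ValueError(
            f"reasoning.max_tokens must be >= {ANTHROPIC_MIN_BUDGET}, got {max_tokens}"
        )
    return max_tokens  # values at or above the minimum are used as-is
```

Checking the sentinel first matters: `-1` is also below 1024, so the order of the two tests decides whether it is accepted or rejected.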

Bedrock (Anthropic Models)

Bedrock Claude models behave like Anthropic: set reasoning.max_tokens for the thinking budget.

Bedrock (Nova Models)

Bedrock Nova models use effort-based reasoning. Supply reasoning.effort.

Effort	Notes
`minimal`, `low`	Normal parameters allowed
`medium`	Normal parameters allowed
`high`	`max_tokens`, `temperature`, and `top_p` are not applied

Notable differences from Anthropic on Bedrock:

No minimum token budget constraint
Uses effort levels instead of token budgets
At high effort, conflicting sampling parameters are not sent
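
As an illustration of the high-effort rule, the sketch below filters a parameter dictionary before it would be forwarded. The function name `nova_sampling_params` and the flat dictionary shape are assumptions made for the example, not the gateway's actual internals.

```python
from typing import Any

# Parameters the table above lists as not applied at high effort.
SUPPRESSED_AT_HIGH = ("max_tokens", "temperature", "top_p")


def nova_sampling_params(effort: str, params: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: return the parameters that would still be sent to Nova."""
    if effort != "high":
        return dict(params)  # minimal, low, medium: normal parameters allowed
    return {k: v for k, v in params.items() if k not in SUPPRESSED_AT_HIGH}
```

A copy is returned in both branches so the caller's dictionary is never mutated.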

Gemini

Gemini supports both token budgets (reasoning.max_tokens) and effort levels (reasoning.effort), depending on the model version.

Model Version Support

Gemini Version	Token Budget	Effort Level	Notes
2.5+	✅	⚠️ (treated as a budget)	Budget-based models
3.0+	✅	✅	Support both budget and effort levels

Effort levels on Pro models

Gemini Pro models support a narrower set of effort levels. When routed to a Pro model, the following adjustments are applied automatically:

Effort	Non-Pro Models	Pro Models
`"none"`	Disables thinking	Disables thinking
`"minimal"`	`minimal`	`low`
`"low"`	`low`	`low`
`"medium"`	`medium`	`high`
`"high"`	`high`	`high`
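
The adjustment table above amounts to a lookup. The sketch below reproduces it; the function name `gemini_effort` and the `is_pro` flag are hypothetical, since how a model is classified as Pro is not described here.

```python
from typing import Optional

# Pro models collapse the four levels to low/high, per the table above.
PRO_EFFORT_MAP = {"minimal": "low", "low": "low", "medium": "high", "high": "high"}


def gemini_effort(effort: str, is_pro: bool) -> Optional[str]:
    """Hypothetical helper: return the effective level, or None when thinking is disabled."""
    if effort == "none":
        return None  # disables thinking on both Pro and non-Pro models
    if effort not in PRO_EFFORT_MAP:
        raise ValueError(f"unknown effort level: {effort!r}")
    return PRO_EFFORT_MAP[effort] if is_pro else effort
```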

Special Values

Value	Field	Behavior
`0`	`max_tokens`	Disables reasoning
`-1`	`max_tokens`	Dynamic budget (Gemini decides)
`"none"`	`effort`	Disables reasoning

// Dynamic budget - let Gemini decide
{ "reasoning": { "max_tokens": -1 } }

// Disable reasoning (either form works)
{ "reasoning": { "max_tokens": 0 } }
{ "reasoning": { "effort": "none" } }
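
To make the special values concrete, here is a sketch that classifies a Gemini reasoning object. The function name and the returned labels are hypothetical. It assumes `max_tokens` takes precedence when both fields are present, as the Gemini caveat in this document states, and that an empty object means reasoning is disabled.

```python
from typing import Any


def classify_gemini_reasoning(reasoning: dict[str, Any]) -> str:
    """Hypothetical helper: label how a reasoning object would be interpreted for Gemini."""
    if "max_tokens" in reasoning:
        budget = reasoning["max_tokens"]
        if budget == 0:
            return "disabled"  # 0 disables reasoning
        if budget == -1:
            return "dynamic"   # -1 lets Gemini decide the budget
        return "budget"        # any other value is an explicit token budget
    if "effort" in reasoning:
        return "disabled" if reasoning["effort"] == "none" else "effort"
    return "disabled"          # neither field present
```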

Reasoning output is returned in the normalized reasoning_details array, the same as every other provider.

Two Reasoning Methods: Effort vs. Max Tokens

Providers use one of two reasoning styles. You can use a single, consistent reasoning object regardless of which one the target provider expects.

Style	Providers	Request Field
Effort-Based	OpenAI, AWS Bedrock Nova	`reasoning.effort`
Budget-Based	Anthropic, Cohere, Gemini	`reasoning.max_tokens`

You can send effort and max_tokens together. The gateway uses whichever field is native to the target provider and translates the other for you, so you do not have to know each provider’s native format:

Budget-based providers (Anthropic, Cohere, Gemini): if max_tokens is present it is used; otherwise a budget is derived from effort.
Effort-based providers (OpenAI, Bedrock Nova): if effort is present it is used; otherwise an effort level is derived from max_tokens.

If neither field is present, reasoning is disabled.
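
The selection rules above can be sketched as follows. The provider keys and the function name `select_reasoning_field` are hypothetical. The sketch deliberately stops at deciding which field wins: the actual effort-to-budget and budget-to-effort conversion values are not documented here, so a derived value is reported as `None` rather than guessed.

```python
from typing import Any, Optional

# Hypothetical provider keys grouped by the style table above.
BUDGET_BASED = {"anthropic", "cohere", "gemini"}
EFFORT_BASED = {"openai", "bedrock-nova"}


def select_reasoning_field(
    provider: str, reasoning: Optional[dict[str, Any]]
) -> Optional[dict[str, Any]]:
    """Return which field drives reasoning, or None when reasoning is disabled."""
    reasoning = reasoning or {}
    if reasoning.get("effort") is None and reasoning.get("max_tokens") is None:
        return None  # neither field present: reasoning is disabled

    if provider in BUDGET_BASED:
        native, other = "max_tokens", "effort"
    elif provider in EFFORT_BASED:
        native, other = "effort", "max_tokens"
    else:
        raise ValueError(f"unknown provider: {provider!r}")

    if reasoning.get(native) is not None:
        # The native field wins; the other field is ignored.
        return {"field": native, "value": reasoning[native], "derived_from": None}
    # Only the non-native field was sent; the gateway derives the native one.
    return {"field": native, "value": None, "derived_from": other}
```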

Provider-Specific Constraints

Different providers enforce different minimum reasoning budgets:

Provider	Minimum Budget
Anthropic	1024
Bedrock Anthropic	1024
Bedrock Nova	1
Cohere	1
Gemini	1024

Requests below a provider’s minimum budget are clamped up to that minimum, except where a hard error applies (see the Anthropic constraint above).
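
A sketch of the clamping rule for explicit, positive budgets. The provider keys and the function name are hypothetical, and treating both Anthropic and Bedrock Anthropic as hard-error providers is this example's reading of the caveats in this document. Special values such as `0` and `-1` are out of scope here and are rejected.

```python
# Minimum budgets from the table above, keyed by hypothetical provider names.
MIN_BUDGET = {
    "anthropic": 1024,
    "bedrock-anthropic": 1024,
    "bedrock-nova": 1,
    "cohere": 1,
    "gemini": 1024,
}
# Providers assumed to reject, rather than clamp, budgets below the minimum.
HARD_ERROR = {"anthropic", "bedrock-anthropic"}


def enforce_min_budget(provider: str, budget: int) -> int:
    """Hypothetical helper: clamp or reject an explicit positive reasoning budget."""
    if budget <= 0:
        raise ValueError("special values (0, -1) are handled per provider, not here")
    minimum = MIN_BUDGET[provider]
    if budget >= minimum:
        return budget
    if provider in HARD_ERROR:
        raise ValueError(f"reasoning.max_tokens must be >= {minimum} for {provider}")
    return minimum  # clamp up to the provider minimum
```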

Request Examples

You can always send the same reasoning object; the gateway applies it to the target provider for you.

Effort on a budget-based provider (Anthropic) - works even though Anthropic is budget-based:

{
  "model": "anthropic/claude-3-5-sonnet",
  "messages": [{"role": "user", "content": "..."}],
  "reasoning": {"effort": "high"}
}

Budget on an effort-based provider (Bedrock Nova) - works even though Nova is effort-based:

{
  "model": "bedrock/us.amazon.nova-pro-v1:0",
  "messages": [{"role": "user", "content": "..."}],
  "reasoning": {"max_tokens": 2000}
}

Both fields provided - the field native to the target provider wins. For Anthropic (budget-based), max_tokens is used and effort is ignored:

{
  "model": "anthropic/claude-3-5-sonnet",
  "messages": [{"role": "user", "content": "..."}],
  "reasoning": {"effort": "medium", "max_tokens": 2500}
}

Response Format

DeepIntShield Standard Response

Reasoning from every provider is returned in a normalized reasoning_details array:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Final response text",
      "reasoning_details": [
        {
          "index": 0,
          "type": "text",
          "text": "Step-by-step reasoning content...",
          "signature": "optional_signature_for_verification"
        }
      ]
    }
  }]
}
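
Reading the reasoning text back out of a response shaped like the one above could look like this. It is a sketch over plain dictionaries; the function name `extract_reasoning_text` is hypothetical, and joining entries with newlines is a presentation choice made for the example.

```python
from typing import Any


def extract_reasoning_text(response: dict[str, Any]) -> str:
    """Hypothetical helper: collect reasoning_details text in index order."""
    parts: list[str] = []
    for choice in response.get("choices", []):
        details = choice.get("message", {}).get("reasoning_details") or []
        for detail in sorted(details, key=lambda d: d.get("index", 0)):
            text = detail.get("text")
            if text:  # entries without text (for example signature-only) are skipped
                parts.append(text)
    return "\n".join(parts)
```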

Reasoning Details Fields

Field	Type	Description	Present In
`index`	`int`	Position in reasoning sequence	All
`type`	`string`	Content type (`text`, `encrypted`, `summary`)	All
`text`	`string`	Reasoning content	Chat Completions
`summary`	`string`	Reasoning summary	Responses API
`signature`	`string`	Cryptographic signature for verification	Anthropic, Bedrock

Type Mappings

Reasoning Type	When Used	Source
`reasoning.text`	Direct thinking/reasoning content	Anthropic, Gemini, Bedrock
`reasoning.encrypted`	Signature-verified reasoning	Anthropic, Bedrock Nova
`reasoning.summary`	Summarized reasoning (Responses API)	All providers

Streaming

Stream Event Types

Provider	Reasoning Event	Signature Event
OpenAI	`reasoning` (top-level)	N/A
Anthropic	`thinking_delta`	`signature_delta`
Bedrock	`thinking_delta`	`signature_delta`
Gemini	`thought` (in content)	`thought_signature`

Anthropic Streaming Example

// Stream events
event: content_block_start
data: {"type": "content_block_start", "content_block": {"type": "thinking"}}

event: content_block_delta
data: {"type": "content_block_delta", "delta": {"type": "thinking_delta", "thinking": "Let me"}}

event: content_block_delta
data: {"type": "content_block_delta", "delta": {"type": "thinking_delta", "thinking": " analyze..."}}

event: content_block_delta
data: {"type": "content_block_delta", "delta": {"type": "signature_delta", "signature": "EqoB..."}}

event: content_block_stop
data: {"type": "content_block_stop"}
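
For illustration, the event stream above can be folded into a thinking string and a signature with a few lines of Python. This sketch assumes the events arrive as text lines exactly as shown, with each JSON payload on a single `data:` line; the function name `collect_thinking` is hypothetical.

```python
import json
from typing import Iterable


def collect_thinking(sse_lines: Iterable[str]) -> tuple[str, str]:
    """Hypothetical helper: accumulate thinking text and signature from SSE lines."""
    thinking: list[str] = []
    signature: list[str] = []
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip "event:" lines, comments, and blank separators
        event = json.loads(line[len("data:"):].strip())
        if event.get("type") != "content_block_delta":
            continue  # start and stop events carry no reasoning content
        delta = event.get("delta", {})
        if delta.get("type") == "thinking_delta":
            thinking.append(delta.get("thinking", ""))
        elif delta.get("type") == "signature_delta":
            signature.append(delta.get("signature", ""))
    return "".join(thinking), "".join(signature)
```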

DeepIntShield Stream Response

// Thinking delta
{
  "choices": [{
    "delta": {
      "reasoning_details": [{
        "index": 0,
        "type": "text",
        "text": "Let me analyze..."
      }]
    }
  }]
}

// Signature delta
{
  "choices": [{
    "delta": {
      "reasoning_details": [{
        "index": 0,
        "signature": "EqoB..."
      }]
    }
  }]
}
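
On the client side, the two kinds of delta above can be merged per `index`. The sketch below works on already-parsed chunk dictionaries. It assumes text fragments are meant to be concatenated in arrival order and treats signature fragments the same way, which is an assumption rather than documented behavior; the function name `merge_reasoning_deltas` is hypothetical.

```python
from typing import Any, Iterable


def merge_reasoning_deltas(chunks: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper: merge streamed reasoning_details deltas by index."""
    merged: dict[int, dict[str, Any]] = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            details = choice.get("delta", {}).get("reasoning_details") or []
            for detail in details:
                entry = merged.setdefault(
                    detail.get("index", 0), {"text": "", "signature": ""}
                )
                entry["text"] += detail.get("text", "")
                entry["signature"] += detail.get("signature", "")
                if "type" in detail:
                    entry["type"] = detail["type"]  # keep the last type seen
    return [{"index": i, **merged[i]} for i in sorted(merged)]
```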

Caveats Summary

Minimum Budget (Anthropic/Bedrock)

Severity: High
Behavior: reasoning.max_tokens must be >= 1024
Impact: Requests with lower values fail with an error
Workaround: Always set max_tokens >= 1024 for Anthropic/Bedrock

Dynamic Budget Not Supported

Severity: Medium
Behavior: reasoning.max_tokens = -1 is converted to 1024
Impact: Dynamic budgeting is not available on Anthropic/Bedrock
Workaround: Set an explicit token budget

Effort Level Normalization

Severity: Low
Behavior: OpenAI’s minimal is converted to low when routing to other providers
Impact: Slightly different reasoning behavior

Signature Field Provider-Specific

Severity: Low
Behavior: The signature field is only present in Anthropic/Bedrock responses
Impact: Signature-based verification is only available for these providers

Thinking Type Always Enabled

Severity: Low
Behavior: Anthropic’s thinking.type is always set to "enabled", regardless of effort
Impact: Thinking cannot be disabled once the reasoning parameter is present

Gemini: Only One Parameter Used

Severity: Medium
Behavior: When both effort and max_tokens are provided, max_tokens is used and effort is ignored
Impact: Effort value has no effect when max_tokens is present
Workaround: Provide only the parameter you want to use

Gemini: Model Version Differences

Severity: Medium
Behavior: Gemini 2.5 is budget-based; 3.0+ supports both budgets and effort levels
Impact: On Gemini 2.5, effort-only requests behave as a budget; on 3.0+ they use native effort levels

Gemini Pro: Limited Effort Levels

Severity: Low
Behavior: Pro models support only low and high effort levels
Impact: minimal behaves as low, and medium behaves as high on Pro models
Note: Non-Pro models support all four levels: minimal, low, medium, high

Complete Provider Comparison

Reasoning Model

Provider	Model Type	Budget Type	Min Budget	Signature Support
OpenAI	Effort-based	Effort-based	None	❌
Anthropic	Thinking blocks	Token budget	1024	✅
Bedrock (Anthropic)	Reasoning config	Token budget	1024	✅
Bedrock (Nova)	Reasoning config	Effort-based	None	❌
Gemini 2.5+	Thinking config	Token budget	1024	✅
Gemini 3.0+	Thinking config	Dual (budget + level)	1024	✅

Parameter Support

Provider	`effort`	`max_tokens`	`summary`	Streaming
OpenAI	✅ (4 levels)	✅	❌	✅
Anthropic	❌ (binary)	✅	✅	✅
Bedrock (Anthropic)	❌ (binary)	✅	✅	✅
Bedrock (Nova)	✅ (3 levels)	⚠️ (ignored)	❌	✅
Gemini 2.5+	⚠️ (converts to budget)	✅	❌	✅
Gemini 3.0+	✅ (4 levels)	✅	❌	✅

Troubleshooting

Anthropic: “reasoning.max_tokens must be >= 1024”

Cause: Attempting to use reasoning with max_tokens < 1024

Solution: Ensure reasoning.max_tokens >= 1024 for Anthropic/Bedrock Anthropic models

// ❌ Invalid
{"reasoning": {"effort": "high", "max_tokens": 500}}

// ✅ Valid
{"reasoning": {"effort": "high", "max_tokens": 1024}}

OpenAI: Model doesn’t support reasoning

Cause: Using an older model that doesn’t support reasoning (e.g., gpt-4-turbo)

Solution: Use OpenAI reasoning models: o4-mini, o3, o1, or the gpt-5 series. gpt-4o and gpt-4o-mini are not reasoning models and will reject the reasoning parameter.

Bedrock Nova: `max_tokens` parameter being ignored

Expected Behavior: Bedrock Nova uses effort-based reasoning only

Solution: Provide effort parameter instead of max_tokens for Nova models

// ✅ Correct for Nova
{"reasoning": {"effort": "high"}}