Reasoning
Overview
Reasoning (also called “thinking” in some providers) allows AI models to show their step-by-step thought process before providing a final answer. This feature is available across multiple providers with different implementations.
Provider Support Matrix
| Provider | Request Field | Response Field | Min Budget | Effort Levels | Streaming |
|---|---|---|---|---|---|
| OpenAI | reasoning | reasoning_details | None | minimal, low, medium, high | ✅ |
| Anthropic | thinking | Content blocks | 1024 tokens | enabled only | ✅ |
| Bedrock (Anthropic) | thinking | Content blocks | 1024 tokens | enabled only | ✅ |
| Gemini 2.5+ | thinking_config | thought parts | 1024 | Budget-only | ✅ |
| Gemini 3.0+ | thinking_config | thought parts | 1024 | minimal, low, medium, high + Budget | ✅ |
Request Configuration
Chat Completions API
Add a `reasoning` object to your chat completions request body:

```json
{
  "model": "provider/model-name",
  "messages": [...],
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096
  }
}
```

```python
from deepintshield import DeepintShield

shield = DeepintShield.from_env()  # defaults to https://app.deepintshield.com

response = shield.chat(
    model="openai/o4-mini",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    reasoning={"effort": "high", "max_tokens": 4096},
)
```

```bash
curl --location 'https://app.deepintshield.com/v1/chat/completions' \
--header 'Authorization: Bearer sk-bf-...' \
--header 'Content-Type: application/json' \
--data '{
  "model": "openai/o4-mini",
  "messages": [{"role": "user", "content": "Explain quantum computing"}],
  "reasoning": {"effort": "high", "max_tokens": 4096}
}'
```

Responses API
The Responses API accepts the same `reasoning` object and adds an optional `summary` parameter:

```json
{
  "model": "provider/model-name",
  "input": [...],
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096,
    "summary": "detailed"
  }
}
```

Parameter Reference
Chat Completions API Parameters
| Parameter | Type | Description |
|---|---|---|
| `effort` | string | Reasoning intensity level |
| `max_tokens` | int | Maximum tokens for reasoning (budget) |
Responses API Parameters
| Parameter | Type | Description |
|---|---|---|
| `effort` | string | Reasoning intensity level |
| `max_tokens` | int | Maximum tokens for reasoning (budget) |
| `summary` | string | Summary level: `brief`, `detailed`, or `json` |
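
The parameters above can be assembled with a small helper. This is a minimal sketch, not part of the DeepIntShield SDK: the function name `build_reasoning` and its validation rules are assumptions, and the accepted values are taken only from the tables and examples on this page.

```python
from typing import Optional

# Values listed on this page; "none" appears as a disable value for effort.
EFFORT_LEVELS = {"none", "minimal", "low", "medium", "high"}
SUMMARY_LEVELS = {"brief", "detailed", "json"}


def build_reasoning(
    effort: Optional[str] = None,
    max_tokens: Optional[int] = None,
    summary: Optional[str] = None,
) -> dict:
    """Return a reasoning object containing only the fields that were set.

    Hypothetical helper: it does not check whether `summary` is being sent
    to the Responses API, which is the only API documented to accept it.
    """
    if effort is not None and effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    if summary is not None and summary not in SUMMARY_LEVELS:
        raise ValueError(f"unknown summary level: {summary!r}")

    reasoning = {}
    if effort is not None:
        reasoning["effort"] = effort
    if max_tokens is not None:
        reasoning["max_tokens"] = max_tokens
    if summary is not None:
        reasoning["summary"] = summary
    return reasoning
```

The returned dictionary can be passed as the `reasoning` field of a request body.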
Provider-Specific Behavior
OpenAI
OpenAI uses effort-based reasoning. Supply `reasoning.effort` directly. If you supply only `reasoning.max_tokens`, the gateway derives an effort level for you.
Supported Effort Levels: minimal, low, medium, high
Anthropic
Anthropic reasoning is budget-based. Set `reasoning.max_tokens` to the thinking budget; reasoning content is returned in the normalized `reasoning_details` array, with a `signature` field for verification.
Dynamic Budget Handling:
| Input Value | Behavior |
|---|---|
| `-1` (dynamic) | Uses the minimum budget of 1024 |
| `< 1024` | Error |
| `>= 1024` | Used as-is |
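
The three rows above amount to a small normalization rule. The following sketch illustrates it client-side; the function name is hypothetical and the error message is borrowed from the Troubleshooting section, so treat it as an illustration of the documented behavior rather than the gateway's actual code.

```python
ANTHROPIC_MIN_BUDGET = 1024  # minimum thinking budget stated in the table


def anthropic_thinking_budget(max_tokens: int) -> int:
    """Map reasoning.max_tokens to the budget Anthropic would receive."""
    if max_tokens == -1:
        # Dynamic budgets are not supported; the minimum is used instead.
        return ANTHROPIC_MIN_BUDGET
    if max_tokens < ANTHROPIC_MIN_BUDGET:
        raise ValueError("reasoning.max_tokens must be >= 1024")
    return max_tokens
```

Running a check like this before sending a request surfaces the error locally instead of after a round trip.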
Bedrock (Anthropic Models)
Bedrock Claude models behave like Anthropic: set `reasoning.max_tokens` for the thinking budget.
Bedrock (Nova Models)
Bedrock Nova models use effort-based reasoning. Supply `reasoning.effort`.
| Effort | Notes |
|---|---|
| `minimal`, `low` | Normal parameters allowed |
| `medium` | Normal parameters allowed |
| `high` | `max_tokens`, `temperature`, and `top_p` are not applied |
Notable differences from Anthropic on Bedrock:
- No minimum token budget constraint
- Uses effort levels instead of token budgets
- At `high` effort, conflicting sampling parameters are not sent
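
The `high` effort rule for Nova can be pictured as a filter over the request parameters. This is a sketch under assumptions: the function name and the flat `params` dictionary are hypothetical, and only the three parameter names come from the table above.

```python
# Parameters the table says are not applied at high effort.
NOVA_HIGH_EFFORT_DROPPED = ("max_tokens", "temperature", "top_p")


def nova_request_params(effort: str, params: dict) -> dict:
    """Return the parameters that would still be sent for a Nova model."""
    if effort != "high":
        # minimal, low, and medium leave the parameters untouched.
        return dict(params)
    return {k: v for k, v in params.items() if k not in NOVA_HIGH_EFFORT_DROPPED}
```

A request that depends on a specific `temperature` would therefore need an effort level below `high` on Nova.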
Gemini
Gemini supports both token budgets (`reasoning.max_tokens`) and effort levels (`reasoning.effort`), depending on the model version.
Model Version Support
| Gemini Version | Token Budget | Effort Level | Notes |
|---|---|---|---|
| 2.5+ | ✅ | ⚠️ (treated as a budget) | Budget-based models |
| 3.0+ | ✅ | ✅ | Support both budget and effort levels |
Effort levels on Pro models
Gemini Pro models support a narrower set of effort levels. When routed to a Pro model, the following adjustments are applied automatically:
| Effort | Non-Pro Models | Pro Models |
|---|---|---|
| `"none"` | Disables thinking | Disables thinking |
| `"minimal"` | minimal | low |
| `"low"` | low | low |
| `"medium"` | medium | high |
| `"high"` | high | high |
Special Values
| Value | Field | Behavior |
|---|---|---|
| `0` | `max_tokens` | Disables reasoning |
| `-1` | `max_tokens` | Dynamic budget (Gemini decides) |
| `"none"` | `effort` | Disables reasoning |

```json
// Dynamic budget - let Gemini decide
{ "reasoning": { "max_tokens": -1 } }

// Disable reasoning (either form works)
{ "reasoning": { "max_tokens": 0 } }
{ "reasoning": { "effort": "none" } }
```

Reasoning output is returned in the normalized `reasoning_details` array, the same as every other provider.
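
To make the Gemini rules concrete, the sketch below combines the special values with the Pro-model effort adjustments for a 3.0+ model. The function name, the `is_pro` flag, and the returned dictionary shape are all hypothetical. It assumes `max_tokens` takes precedence when both fields are present, as stated in the Caveats Summary, and it does not model minimum-budget handling.

```python
# Effort adjustments applied on Pro models, per the table on this page.
PRO_EFFORT_MAP = {"minimal": "low", "low": "low", "medium": "high", "high": "high"}


def gemini_thinking(reasoning: dict, is_pro: bool) -> dict:
    """Describe how a reasoning object would be interpreted for Gemini 3.0+."""
    max_tokens = reasoning.get("max_tokens")
    effort = reasoning.get("effort")

    if max_tokens is not None:  # max_tokens wins over effort on Gemini
        if max_tokens == 0:
            return {"mode": "disabled"}
        if max_tokens == -1:
            return {"mode": "dynamic"}  # Gemini decides the budget
        return {"mode": "budget", "budget": max_tokens}

    if effort is None or effort == "none":
        return {"mode": "disabled"}
    if effort not in PRO_EFFORT_MAP:
        raise ValueError(f"unknown effort level: {effort!r}")
    level = PRO_EFFORT_MAP[effort] if is_pro else effort
    return {"mode": "effort", "level": level}
```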
Two Reasoning Methods: Effort vs. Max Tokens
Providers use one of two reasoning styles. You can use a single, consistent `reasoning` object regardless of which one the target provider expects.
| Style | Providers | Request Field |
|---|---|---|
| Effort-Based | OpenAI, AWS Bedrock Nova | reasoning.effort |
| Budget-Based | Anthropic, Cohere, Gemini | reasoning.max_tokens |
You can send effort and max_tokens together. The gateway uses whichever field is native to the target provider and translates the other for you, so you do not have to know each provider’s native format:
- Budget-based providers (Anthropic, Cohere, Gemini): if `max_tokens` is present it is used; otherwise a budget is derived from `effort`.
- Effort-based providers (OpenAI, Bedrock Nova): if `effort` is present it is used; otherwise an effort level is derived from `max_tokens`.
If neither field is present, reasoning is disabled.
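
The selection rule can be sketched as follows. The provider labels and the function name are hypothetical, and the sketch only reports which field is used and whether it needs translating; the actual effort-to-budget and budget-to-effort conversions are internal to the gateway and are not documented here.

```python
from typing import Optional, Tuple

# Hypothetical provider labels for the two styles described above.
BUDGET_BASED = {"anthropic", "cohere", "gemini"}
EFFORT_BASED = {"openai", "bedrock-nova"}


def select_reasoning_field(provider: str, reasoning: dict) -> Optional[Tuple[str, bool]]:
    """Return (field_used, needs_translation), or None if reasoning is disabled."""
    present = {
        "effort": reasoning.get("effort") is not None,
        "max_tokens": reasoning.get("max_tokens") is not None,
    }
    if not present["effort"] and not present["max_tokens"]:
        return None  # neither field present: reasoning is disabled

    if provider in BUDGET_BASED:
        native, other = "max_tokens", "effort"
    elif provider in EFFORT_BASED:
        native, other = "effort", "max_tokens"
    else:
        raise ValueError(f"unknown provider: {provider!r}")

    if present[native]:
        return (native, False)  # native field is used as-is
    return (other, True)        # the other field is translated
```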
Provider-Specific Constraints
Different providers enforce different minimum reasoning budgets:
| Provider | Minimum Budget |
|---|---|
| Anthropic | 1024 |
| Bedrock Anthropic | 1024 |
| Bedrock Nova | 1 |
| Cohere | 1 |
| Gemini | 1024 |
Requests below a provider’s minimum budget are clamped up to that minimum, except where a hard error applies (see the Anthropic constraint above).
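
As an illustration of the clamping rule, the sketch below applies the minimums from the table to a positive budget. The provider labels and function name are hypothetical, and treating both Anthropic and Bedrock Anthropic as hard-error providers is an assumption based on the Caveats Summary. Special values such as `0` and `-1` are expected to be handled before this step.

```python
# Minimum budgets from the table above; keys are hypothetical labels.
MIN_BUDGET = {
    "anthropic": 1024,
    "bedrock-anthropic": 1024,
    "bedrock-nova": 1,
    "cohere": 1,
    "gemini": 1024,
}
# Assumed: these providers reject low budgets instead of clamping.
HARD_ERROR_PROVIDERS = {"anthropic", "bedrock-anthropic"}


def enforce_min_budget(provider: str, budget: int) -> int:
    """Clamp a positive budget up to the provider minimum, or raise."""
    if budget < 1:
        raise ValueError("special values (0, -1) must be handled before clamping")
    minimum = MIN_BUDGET[provider]
    if budget >= minimum:
        return budget
    if provider in HARD_ERROR_PROVIDERS:
        raise ValueError(f"reasoning.max_tokens must be >= {minimum}")
    return minimum
```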
Request Examples
You can always send the same `reasoning` object; the gateway applies it to the target provider for you.

Effort on a budget-based provider (Anthropic) - works even though Anthropic is budget-based:

```json
{
  "model": "anthropic/claude-3-5-sonnet",
  "messages": [{"role": "user", "content": "..."}],
  "reasoning": {"effort": "high"}
}
```

Budget on an effort-based provider (Bedrock Nova) - works even though Nova is effort-based:

```json
{
  "model": "bedrock/us.amazon.nova-pro-v1:0",
  "messages": [{"role": "user", "content": "..."}],
  "reasoning": {"max_tokens": 2000}
}
```

Both fields provided - the field native to the target provider wins. For Anthropic (budget-based), `max_tokens` is used and `effort` is ignored:

```json
{
  "model": "anthropic/claude-3-5-sonnet",
  "messages": [{"role": "user", "content": "..."}],
  "reasoning": {"effort": "medium", "max_tokens": 2500}
}
```

Response Format
DeepIntShield Standard Response
All providers return reasoning in a normalized `reasoning_details` array:

```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Final response text",
      "reasoning_details": [
        {
          "index": 0,
          "type": "text",
          "text": "Step-by-step reasoning content...",
          "signature": "optional_signature_for_verification"
        }
      ]
    }
  }]
}
```

Reasoning Details Fields
| Field | Type | Description | Present In |
|---|---|---|---|
| `index` | int | Position in reasoning sequence | All |
| `type` | string | Content type (`text`, `encrypted`, `summary`) | All |
| `text` | string | Reasoning content | Chat Completions |
| `summary` | string | Reasoning summary | Responses API |
| `signature` | string | Cryptographic signature for verification | Anthropic, Bedrock |
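
Given a response shaped like the example above, the reasoning text can be collected with a few lines of client code. This is a sketch that assumes the response has already been parsed into a plain dictionary; the function name is hypothetical, and joining entries with a newline is a presentation choice rather than anything the gateway prescribes.

```python
def reasoning_text(response: dict) -> str:
    """Concatenate reasoning entries from a normalized response, in index order."""
    parts = []
    for choice in response.get("choices", []):
        details = choice.get("message", {}).get("reasoning_details", [])
        for detail in sorted(details, key=lambda d: d.get("index", 0)):
            # Chat Completions entries carry `text`; Responses API entries
            # carry `summary`, per the field table above.
            content = detail.get("text") or detail.get("summary")
            if content:
                parts.append(content)
    return "\n".join(parts)
```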
Type Mappings
| Reasoning Type | When Used | Source |
|---|---|---|
| `reasoning.text` | Direct thinking/reasoning content | Anthropic, Gemini, Bedrock |
| `reasoning.encrypted` | Signature-verified reasoning | Anthropic, Bedrock Nova |
| `reasoning.summary` | Summarized reasoning (Responses API) | All providers |
Streaming
Stream Event Types
| Provider | Reasoning Event | Signature Event |
|---|---|---|
| OpenAI | reasoning (top-level) | N/A |
| Anthropic | thinking_delta | signature_delta |
| Bedrock | thinking_delta | signature_delta |
| Gemini | thought (in content) | thought_signature |
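
Whatever the provider-level event names, the gateway's own stream chunks carry reasoning in `delta.reasoning_details`, as shown in the DeepIntShield Stream Response later in this section. A client might reassemble them roughly as follows. This is a sketch that assumes each chunk is already parsed into a dictionary and that text and signature fragments for the same `index` are meant to be concatenated in arrival order; the function name is hypothetical.

```python
def accumulate_reasoning(chunks) -> dict:
    """Merge streamed reasoning deltas into one entry per index."""
    blocks = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            details = choice.get("delta", {}).get("reasoning_details", [])
            for detail in details:
                block = blocks.setdefault(
                    detail.get("index", 0), {"text": "", "signature": ""}
                )
                # A delta carries either a text fragment or a signature fragment.
                block["text"] += detail.get("text", "")
                block["signature"] += detail.get("signature", "")
    return blocks
```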
Anthropic Streaming Example

```
// Stream events
event: content_block_start
data: {"type": "content_block_start", "content_block": {"type": "thinking"}}

event: content_block_delta
data: {"type": "content_block_delta", "delta": {"type": "thinking_delta", "thinking": "Let me"}}

event: content_block_delta
data: {"type": "content_block_delta", "delta": {"type": "thinking_delta", "thinking": " analyze..."}}

event: content_block_delta
data: {"type": "content_block_delta", "delta": {"type": "signature_delta", "signature": "EqoB..."}}

event: content_block_stop
data: {"type": "content_block_stop"}
```

DeepIntShield Stream Response

```json
// Thinking delta
{
  "choices": [{
    "delta": {
      "reasoning_details": [{
        "index": 0,
        "type": "text",
        "text": "Let me analyze..."
      }]
    }
  }]
}

// Signature delta
{
  "choices": [{
    "delta": {
      "reasoning_details": [{
        "index": 0,
        "signature": "EqoB..."
      }]
    }
  }]
}
```

Caveats Summary
Minimum Budget (Anthropic/Bedrock)
Severity: High
Behavior: reasoning.max_tokens must be >= 1024
Impact: Requests with lower values fail with an error
Workaround: Always set max_tokens >= 1024 for Anthropic/Bedrock
Dynamic Budget Not Supported
Severity: Medium
Behavior: `reasoning.max_tokens = -1` is converted to 1024
Impact: Dynamic budgeting not available on Anthropic/Bedrock
Workaround: Set explicit token budget
Effort Level Normalization
Severity: Low
Behavior: OpenAI’s `minimal` is converted to `low` when routing to other providers
Impact: Slightly different reasoning behavior
Signature Field Provider-Specific
Severity: Low
Behavior: signature field only present in Anthropic/Bedrock responses
Impact: Signature-based verification only available for these providers
Thinking Type Always Enabled
Severity: Low
Behavior: Anthropic’s thinking.type always set to "enabled" regardless of effort
Impact: Thinking cannot be disabled once the `reasoning` parameter is present
Gemini: Only One Parameter Used
Severity: Medium
Behavior: When both effort and max_tokens are provided, max_tokens is used and effort is ignored
Impact: Effort value has no effect when max_tokens is present
Workaround: Provide only the parameter you want to use
Gemini: Model Version Differences
Severity: Medium
Behavior: Gemini 2.5 is budget-based; 3.0+ supports both budgets and effort levels
Impact: On Gemini 2.5, effort-only requests behave as a budget; on 3.0+ they use native effort levels
Gemini Pro: Limited Effort Levels
Severity: Low
Behavior: Pro models support only low and high effort levels
Impact: minimal behaves as low, and medium behaves as high on Pro models
Note: Non-Pro models support all four levels: minimal, low, medium, high
Complete Provider Comparison
Reasoning Model
| Provider | Model Type | Budget Type | Min Budget | Signature Support |
|---|---|---|---|---|
| OpenAI | Effort-based | Effort-based | None | ❌ |
| Anthropic | Thinking blocks | Token budget | 1024 | ✅ |
| Bedrock (Anthropic) | Reasoning config | Token budget | 1024 | ✅ |
| Bedrock (Nova) | Reasoning config | Effort-based | None | ❌ |
| Gemini 2.5+ | Thinking config | Token budget | 1024 | ✅ |
| Gemini 3.0+ | Thinking config | Dual (budget + level) | 1024 | ✅ |
Parameter Support
| Provider | effort | max_tokens | summary | Streaming |
|---|---|---|---|---|
| OpenAI | ✅ (4 levels) | ✅ | ❌ | ✅ |
| Anthropic | ❌ (binary) | ✅ | ✅ | ✅ |
| Bedrock (Anthropic) | ❌ (binary) | ✅ | ✅ | ✅ |
| Bedrock (Nova) | ✅ (3 levels) | ⚠️ (ignored) | ❌ | ✅ |
| Gemini 2.5+ | ⚠️ (converts to budget) | ✅ | ❌ | ✅ |
| Gemini 3.0+ | ✅ (4 levels) | ✅ | ❌ | ✅ |
Troubleshooting
Anthropic: “reasoning.max_tokens must be >= 1024”
Cause: Attempting to use reasoning with `max_tokens` < 1024
Solution: Ensure reasoning.max_tokens >= 1024 for Anthropic/Bedrock Anthropic models

```json
// ❌ Invalid
{"reasoning": {"effort": "high", "max_tokens": 500}}

// ✅ Valid
{"reasoning": {"effort": "high", "max_tokens": 1024}}
```

OpenAI: Model doesn’t support reasoning
Cause: Using an older model that doesn’t support reasoning (e.g., gpt-4-turbo)
Solution: Use OpenAI reasoning models: o4-mini, o3, o1, or the gpt-5 series. gpt-4o and gpt-4o-mini are not reasoning models and will reject the reasoning parameter.
Bedrock Nova: max_tokens parameter being ignored
Expected Behavior: Bedrock Nova uses effort-based reasoning only
Solution: Provide the `effort` parameter instead of `max_tokens` for Nova models

```json
// ✅ Correct for Nova
{"reasoning": {"effort": "high"}}
```