Reasoning

Reasoning (also called “thinking” in some providers) allows AI models to show their step-by-step thought process before providing a final answer. This feature is available across multiple providers with different implementations.


| Provider | Request Field | Response Field | Min Budget | Effort Levels | Streaming |
| --- | --- | --- | --- | --- | --- |
| OpenAI | reasoning | reasoning_details | None | minimal, low, medium, high | |
| Anthropic | thinking | Content blocks | 1024 tokens | enabled only | |
| Bedrock (Anthropic) | thinking | Content blocks | 1024 tokens | enabled only | |
| Gemini 2.5+ | thinking_config | thought parts | 1024 | Budget-only | |
| Gemini 3.0+ | thinking_config | thought parts | 1024 | minimal, low, medium, high + Budget | |

Add a reasoning object to your chat completions request body:

{
  "model": "provider/model-name",
  "messages": [...],
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096
  }
}

The Responses API accepts the same reasoning object and adds an optional summary parameter:

{
  "model": "provider/model-name",
  "input": [...],
  "reasoning": {
    "effort": "high",
    "max_tokens": 4096,
    "summary": "detailed"
  }
}

Chat Completions parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| effort | string | Reasoning intensity level |
| max_tokens | int | Maximum tokens for reasoning (budget) |

Responses API parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| effort | string | Reasoning intensity level |
| max_tokens | int | Maximum tokens for reasoning (budget) |
| summary | string | Summary level: brief, detailed, or json |
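
As a rough illustration of the two request shapes above, the following Python sketch assembles the request bodies as plain dictionaries. The helper names (build_reasoning, chat_request, responses_request) are hypothetical and not part of any SDK, and sending the body to the gateway is left out on purpose. Accepting "none" as an effort value is an assumption based on the Gemini section of this page.

```python
from typing import Any, Optional

# Values taken from the tables on this page; "none" appears only in the
# Gemini section, so accepting it everywhere is an assumption.
EFFORT_LEVELS = {"none", "minimal", "low", "medium", "high"}
SUMMARY_LEVELS = {"brief", "detailed", "json"}


def build_reasoning(effort: Optional[str] = None,
                    max_tokens: Optional[int] = None,
                    summary: Optional[str] = None) -> dict[str, Any]:
    """Assemble a reasoning object, omitting fields that were not supplied."""
    reasoning: dict[str, Any] = {}
    if effort is not None:
        if effort not in EFFORT_LEVELS:
            raise ValueError(f"unknown effort level: {effort!r}")
        reasoning["effort"] = effort
    if max_tokens is not None:
        reasoning["max_tokens"] = max_tokens
    if summary is not None:
        if summary not in SUMMARY_LEVELS:
            raise ValueError(f"unknown summary level: {summary!r}")
        reasoning["summary"] = summary
    return reasoning


def chat_request(model: str, messages: list,
                 effort: Optional[str] = None,
                 max_tokens: Optional[int] = None) -> dict[str, Any]:
    """Chat Completions body: reasoning carries effort and/or max_tokens."""
    return {"model": model, "messages": messages,
            "reasoning": build_reasoning(effort, max_tokens)}


def responses_request(model: str, input_items: list,
                      effort: Optional[str] = None,
                      max_tokens: Optional[int] = None,
                      summary: Optional[str] = None) -> dict[str, Any]:
    """Responses API body: same reasoning object plus optional summary."""
    return {"model": model, "input": input_items,
            "reasoning": build_reasoning(effort, max_tokens, summary)}
```

Keeping summary out of chat_request mirrors the parameter tables, where summary is listed only for the Responses API.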

OpenAI uses effort-based reasoning. Supply reasoning.effort directly. If you supply only reasoning.max_tokens, the gateway derives an effort level for you.

Supported Effort Levels: minimal, low, medium, high


Anthropic reasoning is budget-based. Set reasoning.max_tokens to the thinking budget; reasoning content is returned in the normalized reasoning_details array, with a signature field for verification.

Dynamic Budget Handling:

| Input Value | Behavior |
| --- | --- |
| -1 (dynamic) | Uses the minimum budget of 1024 |
| < 1024 | Error |
| >= 1024 | Used as-is |
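
The table above can be read as a small decision function. The sketch below is one possible client-side pre-check in Python, not the gateway's actual code; the function name and the error text are illustrative.

```python
ANTHROPIC_MIN_BUDGET = 1024  # minimum thinking budget stated on this page


def anthropic_thinking_budget(max_tokens: int) -> int:
    """Apply the three rows of the budget table to reasoning.max_tokens."""
    if max_tokens == -1:
        # Dynamic budgets are not supported, so -1 falls back to the minimum.
        return ANTHROPIC_MIN_BUDGET
    if max_tokens < ANTHROPIC_MIN_BUDGET:
        raise ValueError("reasoning.max_tokens must be >= 1024")
    return max_tokens
```

The -1 check has to come first, because -1 would otherwise match the "< 1024" row and raise.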

Bedrock Claude models behave like Anthropic: set reasoning.max_tokens for the thinking budget.


Bedrock Nova models use effort-based reasoning. Supply reasoning.effort.

| Effort | Notes |
| --- | --- |
| minimal, low | Normal parameters allowed |
| medium | Normal parameters allowed |
| high | max_tokens, temperature, and top_p are not applied |

Notable differences from Anthropic on Bedrock:

  • No minimum token budget constraint
  • Uses effort levels instead of token budgets
  • At high effort, conflicting sampling parameters are not sent
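
To make the high-effort rule concrete, here is a minimal Python sketch of the parameter filtering described above. The function name and the flat params dictionary are assumptions made for illustration; the gateway's internal request structure is not documented here.

```python
# Parameters that this page says are not applied at high effort.
NOVA_HIGH_EFFORT_DROPPED = ("max_tokens", "temperature", "top_p")


def nova_request_params(effort: str, params: dict) -> dict:
    """Return the sampling parameters that would still be sent to Nova."""
    if effort == "high":
        return {key: value for key, value in params.items()
                if key not in NOVA_HIGH_EFFORT_DROPPED}
    # minimal, low, and medium leave the parameters untouched.
    return dict(params)
```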

Gemini supports both token budgets (reasoning.max_tokens) and effort levels (reasoning.effort), depending on the model version.

| Gemini Version | Token Budget | Effort Level | Notes |
| --- | --- | --- | --- |
| 2.5+ | ✅ | ⚠️ (treated as a budget) | Budget-based models |
| 3.0+ | ✅ | ✅ | Support both budget and effort levels |

Gemini Pro models support a narrower set of effort levels. When routed to a Pro model, the following adjustments are applied automatically:

| Effort | Non-Pro Models | Pro Models |
| --- | --- | --- |
| "none" | Disables thinking | Disables thinking |
| "minimal" | minimal | low |
| "low" | low | low |
| "medium" | medium | high |
| "high" | high | high |

| Value | Field | Behavior |
| --- | --- | --- |
| 0 | max_tokens | Disables reasoning |
| -1 | max_tokens | Dynamic budget (Gemini decides) |
| "none" | effort | Disables reasoning |

// Dynamic budget - let Gemini decide
{ "reasoning": { "max_tokens": -1 } }
// Disable reasoning (either form works)
{ "reasoning": { "max_tokens": 0 } }
{ "reasoning": { "effort": "none" } }
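
The Pro adjustments and the special values can be expressed as two lookups. The Python sketch below mirrors the tables above; the function names and the is_pro flag are hypothetical, and how the gateway decides that a model is a Pro model is not covered on this page.

```python
from typing import Optional

# Effort actually applied on Pro models, copied from the mapping table.
PRO_EFFORT = {"minimal": "low", "low": "low", "medium": "high", "high": "high"}


def gemini_effort(effort: str, is_pro: bool) -> Optional[str]:
    """Return the effort level applied, or None when thinking is disabled."""
    if effort == "none":
        return None
    if effort not in PRO_EFFORT:
        raise ValueError(f"unknown effort level: {effort!r}")
    return PRO_EFFORT[effort] if is_pro else effort


def gemini_budget_mode(max_tokens: int) -> str:
    """Classify reasoning.max_tokens using the special-values table."""
    if max_tokens == 0:
        return "disabled"
    if max_tokens == -1:
        return "dynamic"
    return "fixed"
```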

Reasoning output is returned in the normalized reasoning_details array, the same as every other provider.


Two Reasoning Methods: Effort vs. Max Tokens

Providers use one of two reasoning styles. You can use a single, consistent reasoning object regardless of which one the target provider expects.

| Style | Providers | Request Field |
| --- | --- | --- |
| Effort-Based | OpenAI, AWS Bedrock Nova | reasoning.effort |
| Budget-Based | Anthropic, Cohere, Gemini | reasoning.max_tokens |

You can send effort and max_tokens together. The gateway uses whichever field is native to the target provider and translates the other for you, so you do not have to know each provider’s native format:

  • Budget-based providers (Anthropic, Cohere, Gemini): if max_tokens is present it is used; otherwise a budget is derived from effort.
  • Effort-based providers (OpenAI, Bedrock Nova): if effort is present it is used; otherwise an effort level is derived from max_tokens.

If neither field is present, reasoning is disabled.
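
The selection rule above can be sketched in a few lines of Python. This only decides which field wins and whether it needs translating; the actual effort-to-budget and budget-to-effort conversion tables are not given on this page, so they are deliberately left out. The provider keys (such as "bedrock-nova") and the function name are hypothetical labels chosen for this example.

```python
from typing import Any, Optional, Tuple

# Hypothetical provider keys. Bedrock Anthropic is grouped with the
# budget-based providers because it behaves like Anthropic.
BUDGET_BASED = {"anthropic", "bedrock-anthropic", "cohere", "gemini"}
EFFORT_BASED = {"openai", "bedrock-nova"}


def select_reasoning_field(provider: str,
                           reasoning: dict) -> Optional[Tuple[str, Any, bool]]:
    """Return (field, value, needs_translation), or None if reasoning is off.

    needs_translation is True when only the non-native field was supplied,
    meaning its value must be converted to the provider's native style.
    """
    effort = reasoning.get("effort")
    max_tokens = reasoning.get("max_tokens")
    if effort is None and max_tokens is None:
        return None  # neither field present: reasoning is disabled

    if provider in BUDGET_BASED:
        native, other = ("max_tokens", max_tokens), ("effort", effort)
    elif provider in EFFORT_BASED:
        native, other = ("effort", effort), ("max_tokens", max_tokens)
    else:
        raise ValueError(f"unknown provider: {provider!r}")

    if native[1] is not None:
        return (native[0], native[1], False)
    return (other[0], other[1], True)
```

For instance, select_reasoning_field("anthropic", {"effort": "medium", "max_tokens": 2500}) picks max_tokens and ignores effort, while the same call with only effort set reports that a budget still has to be derived.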


Different providers enforce different minimum reasoning budgets:

| Provider | Minimum Budget |
| --- | --- |
| Anthropic | 1024 |
| Bedrock Anthropic | 1024 |
| Bedrock Nova | 1 |
| Cohere | 1 |
| Gemini | 1024 |

Requests below a provider’s minimum budget are clamped up to that minimum, except where a hard error applies (see the Anthropic constraint above).
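
A table-driven version of this rule might look like the Python sketch below. It assumes sentinel values (such as -1, or 0 on Gemini) have already been handled before the call, and it treats Anthropic and Bedrock Anthropic as the providers where a hard error applies. The provider keys and function name are hypothetical.

```python
# Minimum budgets copied from the table above; keys are hypothetical labels.
MIN_BUDGET = {
    "anthropic": 1024,
    "bedrock-anthropic": 1024,
    "bedrock-nova": 1,
    "cohere": 1,
    "gemini": 1024,
}
# Providers that reject a low budget instead of clamping it.
HARD_ERROR = {"anthropic", "bedrock-anthropic"}


def apply_min_budget(provider: str, budget: int) -> int:
    """Clamp a positive budget up to the provider minimum, or raise."""
    minimum = MIN_BUDGET[provider]
    if budget >= minimum:
        return budget
    if provider in HARD_ERROR:
        raise ValueError(f"reasoning.max_tokens must be >= {minimum}")
    return minimum
```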


You can always send the same reasoning object; the gateway applies it to the target provider for you.

Effort on a budget-based provider (Anthropic) - works even though Anthropic is budget-based:

{
  "model": "anthropic/claude-3-5-sonnet",
  "messages": [{"role": "user", "content": "..."}],
  "reasoning": {"effort": "high"}
}

Budget on an effort-based provider (Bedrock Nova) - works even though Nova is effort-based:

{
  "model": "bedrock/us.amazon.nova-pro-v1:0",
  "messages": [{"role": "user", "content": "..."}],
  "reasoning": {"max_tokens": 2000}
}

Both fields provided - the field native to the target provider wins. For Anthropic (budget-based), max_tokens is used and effort is ignored:

{
  "model": "anthropic/claude-3-5-sonnet",
  "messages": [{"role": "user", "content": "..."}],
  "reasoning": {"effort": "medium", "max_tokens": 2500}
}

All providers return reasoning in a normalized reasoning_details array:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Final response text",
      "reasoning_details": [
        {
          "index": 0,
          "type": "text",
          "text": "Step-by-step reasoning content...",
          "signature": "optional_signature_for_verification"
        }
      ]
    }
  }]
}

| Field | Type | Description | Present In |
| --- | --- | --- | --- |
| index | int | Position in reasoning sequence | All |
| type | string | Content type (text, encrypted, summary) | All |
| text | string | Reasoning content | Chat Completions |
| summary | string | Reasoning summary | Responses API |
| signature | string | Cryptographic signature for verification | Anthropic, Bedrock |

| Reasoning Type | When Used | Source |
| --- | --- | --- |
| reasoning.text | Direct thinking/reasoning content | Anthropic, Gemini, Bedrock |
| reasoning.encrypted | Signature-verified reasoning | Anthropic, Bedrock Nova |
| reasoning.summary | Summarized reasoning (Responses API) | All providers |
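
Reading the normalized array on the client side could look like the Python sketch below. It assumes a Chat Completions response shaped like the example above; the function name and the returned dictionary layout are illustrative choices, not part of the API.

```python
def extract_reasoning(response: dict) -> dict:
    """Split a Chat Completions response into answer, reasoning, signatures."""
    message = response["choices"][0]["message"]
    # Order entries by their position in the reasoning sequence.
    details = sorted(message.get("reasoning_details", []),
                     key=lambda detail: detail.get("index", 0))
    return {
        "answer": message.get("content"),
        # text is present for Chat Completions entries that carry reasoning.
        "reasoning": [d["text"] for d in details if "text" in d],
        # signature is only returned by some providers, so it may be absent.
        "signatures": [d["signature"] for d in details if "signature" in d],
    }
```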

| Provider | Reasoning Event | Signature Event |
| --- | --- | --- |
| OpenAI | reasoning (top-level) | N/A |
| Anthropic | thinking_delta | signature_delta |
| Bedrock | thinking_delta | signature_delta |
| Gemini | thought (in content) | thought_signature |

// Stream events
event: content_block_start
data: {"type": "content_block_start", "content_block": {"type": "thinking"}}
event: content_block_delta
data: {"type": "content_block_delta", "delta": {"type": "thinking_delta", "thinking": "Let me"}}
event: content_block_delta
data: {"type": "content_block_delta", "delta": {"type": "thinking_delta", "thinking": " analyze..."}}
event: content_block_delta
data: {"type": "content_block_delta", "delta": {"type": "signature_delta", "signature": "EqoB..."}}
event: content_block_stop
data: {"type": "content_block_stop"}

// Thinking delta
{
  "choices": [{
    "delta": {
      "reasoning_details": [{
        "index": 0,
        "type": "text",
        "text": "Let me analyze..."
      }]
    }
  }]
}

// Signature delta
{
  "choices": [{
    "delta": {
      "reasoning_details": [{
        "index": 0,
        "signature": "EqoB..."
      }]
    }
  }]
}
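
On the client, the normalized deltas can be folded back into complete entries. The Python sketch below assumes each chunk has already been parsed from the stream into a dictionary shaped like the two examples above. Concatenating signature fragments is an assumption that also works when the signature arrives in a single delta. The function name is hypothetical.

```python
def accumulate_reasoning(chunks: list) -> list:
    """Merge streamed reasoning_details deltas into one entry per index."""
    merged: dict = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            deltas = choice.get("delta", {}).get("reasoning_details", [])
            for detail in deltas:
                index = detail["index"]
                entry = merged.setdefault(index, {"index": index, "text": ""})
                if "type" in detail:
                    entry["type"] = detail["type"]
                # Text arrives in fragments and is appended in arrival order.
                entry["text"] += detail.get("text", "")
                if "signature" in detail:
                    entry["signature"] = (entry.get("signature", "")
                                          + detail["signature"])
    return [merged[index] for index in sorted(merged)]
```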

Minimum Budget (Anthropic/Bedrock)

Severity: High
Behavior: reasoning.max_tokens must be >= 1024
Impact: Requests with lower values fail with an error
Workaround: Always set max_tokens >= 1024 for Anthropic and Bedrock (Anthropic) models

Dynamic Budget Not Supported

Severity: Medium
Behavior: reasoning.max_tokens = -1 is converted to 1024
Impact: Dynamic budgeting is not available on Anthropic or Bedrock (Anthropic)
Workaround: Set an explicit token budget

Effort Level Normalization

Severity: Low
Behavior: OpenAI’s minimal is converted to low when routing to other providers
Impact: Slightly different reasoning behavior

Signature Field Provider-Specific

Severity: Low
Behavior: signature field only present in Anthropic/Bedrock responses
Impact: Signature-based verification only available for these providers

Thinking Type Always Enabled

Severity: Low
Behavior: Anthropic’s thinking.type always set to "enabled" regardless of effort
Impact: Cannot disable thinking once reasoning param is present

Gemini: Only One Parameter Used

Severity: Medium
Behavior: When both effort and max_tokens are provided, max_tokens is used and effort is ignored
Impact: Effort value has no effect when max_tokens is present
Workaround: Provide only the parameter you want to use

Gemini: Model Version Differences

Severity: Medium
Behavior: Gemini 2.5 is budget-based; 3.0+ supports both budgets and effort levels
Impact: On Gemini 2.5, effort-only requests behave as a budget; on 3.0+ they use native effort levels

Gemini Pro: Limited Effort Levels

Severity: Low
Behavior: Pro models support only low and high effort levels
Impact: minimal behaves as low, and medium behaves as high on Pro models
Note: Non-Pro models support all four levels: minimal, low, medium, high


| Provider | Model Type | Budget Type | Min Budget | Signature Support |
| --- | --- | --- | --- | --- |
| OpenAI | Effort-based | Effort-based | None | |
| Anthropic | Thinking blocks | Token budget | 1024 | |
| Bedrock (Anthropic) | Reasoning config | Token budget | 1024 | |
| Bedrock (Nova) | Reasoning config | Effort-based | None | |
| Gemini 2.5+ | Thinking config | Token budget | 1024 | |
| Gemini 3.0+ | Thinking config | Dual (budget + level) | 1024 | |

| Provider | effort | max_tokens | summary | Streaming |
| --- | --- | --- | --- | --- |
| OpenAI | ✅ (4 levels) | | | |
| Anthropic | ❌ (binary) | | | |
| Bedrock (Anthropic) | ❌ (binary) | | | |
| Bedrock (Nova) | ✅ (3 levels) | ⚠️ (ignored) | | |
| Gemini 2.5+ | ⚠️ (converts to budget) | | | |
| Gemini 3.0+ | ✅ (4 levels) | | | |

Anthropic: “reasoning.max_tokens must be >= 1024”

Cause: Attempting to use reasoning with max_tokens < 1024

Solution: Ensure reasoning.max_tokens >= 1024 for Anthropic/Bedrock Anthropic models

// ❌ Invalid
{"reasoning": {"effort": "high", "max_tokens": 500}}
// ✅ Valid
{"reasoning": {"effort": "high", "max_tokens": 1024}}

Cause: Using an older model that doesn’t support reasoning (e.g., gpt-4-turbo)

Solution: Use OpenAI reasoning models: o4-mini, o3, o1, or the gpt-5 series. gpt-4o and gpt-4o-mini are not reasoning models and will reject the reasoning parameter.

Bedrock Nova: max_tokens parameter being ignored

Expected Behavior: Bedrock Nova uses effort-based reasoning only

Solution: Provide effort parameter instead of max_tokens for Nova models

// ✅ Correct for Nova
{"reasoning": {"effort": "high"}}