Fallbacks

Automatic Provider Failover

Fallbacks provide automatic failover when your primary AI provider experiences issues. Whether it’s rate limiting, outages, or model unavailability, DeepIntShield automatically tries backup providers in the order you specify until one succeeds.

When a fallback is triggered, DeepIntShield treats it as a completely new request - all configured features (caching, governance, logging, etc.) apply again for the fallback provider, ensuring consistent behavior across all providers.

What You Get

List backup models, each written as provider/model, on a request, and DeepIntShield tries them in the order you specify when the primary fails (for example, a network error, rate limit, unavailable model, or timeout). It returns the first successful response, or the original primary error if every option fails. Each attempt is a fresh request, so caching, governance, and logging all apply to whichever provider ultimately handles it. You can always tell which provider served the response via extra_fields.provider.
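The ordering rule above can be sketched client-side as a simple loop. This is a minimal illustration, not DeepIntShield's implementation: complete_with_fallbacks, ProviderError, and the call function are hypothetical names, and the sketch treats every failure as fallback-eligible.

```python
from typing import Callable, List, Sequence


class ProviderError(Exception):
    """Hypothetical error type standing in for any failed provider call."""


def complete_with_fallbacks(
    primary: str,
    fallbacks: Sequence[str],
    call: Callable[[str], str],
) -> str:
    """Try the primary model, then each fallback in order; return the first success."""
    errors: List[ProviderError] = []
    for model in [primary, *fallbacks]:
        try:
            return call(model)  # the first successful response wins
        except ProviderError as err:
            errors.append(err)  # record the failure and move on to the next model
    # Every option failed: surface the primary's error, not the last fallback's.
    raise errors[0]


# Usage with a fake transport: the primary is rate limited, the first fallback succeeds.
def fake_call(model: str) -> str:
    if model.startswith("openai/"):
        raise ProviderError("429 rate limited")
    return f"answer from {model}"


result = complete_with_fallbacks(
    "openai/gpt-4o-mini",
    [
        "anthropic/claude-3-5-sonnet-20241022",
        "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
    ],
    fake_call,
)
print(result)  # answer from anthropic/claude-3-5-sonnet-20241022
```

Raising errors[0] rather than the last error mirrors the documented behavior of returning the primary's error when the whole chain fails.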

# Chat completion with multiple fallbacks
curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: sk-bf-your-virtual-key" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms"
      }
    ],
    "fallbacks": [
      "anthropic/claude-3-5-sonnet-20241022",
      "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0"
    ],
    "max_tokens": 1000,
    "temperature": 0.7
  }'

Response (from whichever provider succeeded):

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing is like having a super-powered calculator..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 150,
    "total_tokens": 162
  },
  "extra_fields": {
    "provider": "anthropic",
    "latency": 1.2
  }
}
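To confirm which provider answered, read extra_fields.provider from the parsed body. A minimal standard-library sketch; the helper name served_by is hypothetical, and comparing provider names only detects a fallback when the fallback uses a different provider than the primary.

```python
import json

# The example response above, trimmed to the fields this snippet reads.
raw_response = """
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Quantum computing is like having a super-powered calculator..."},
      "finish_reason": "stop"
    }
  ],
  "extra_fields": {"provider": "anthropic", "latency": 1.2}
}
"""


def served_by(body: dict) -> str:
    """Return extra_fields.provider, or "unknown" if the field is missing."""
    return body.get("extra_fields", {}).get("provider", "unknown")


primary_model = "openai/gpt-4o-mini"
primary_provider = primary_model.split("/", 1)[0]  # "openai"

body = json.loads(raw_response)
provider = served_by(body)
used_fallback = provider != primary_provider

print(provider, used_fallback)  # anthropic True
print(body["choices"][0]["message"]["content"])
```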

With the Python SDK, pass the fallback chain in extra_body; it is merged into the request body and reaches the gateway unchanged as the fallbacks field.

from deepintshield import DeepintShield

shield = DeepintShield.from_env()
openai = shield.openai()

response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms"},
    ],
    max_tokens=1000,
    temperature=0.7,
    extra_body={
        "fallbacks": [
            "anthropic/claude-3-5-sonnet-20241022",
            "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
        ],
    },
)

print(response.choices[0].message.content)
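If you call the HTTP endpoint without the SDK, you can assemble the same body as the curl example above yourself. This is a sketch: build_chat_body is a hypothetical helper, and its provider/model format check is a client-side convenience, not a documented gateway rule.

```python
import json
from typing import Any, Dict, List, Sequence


def build_chat_body(
    model: str,
    messages: List[Dict[str, str]],
    fallbacks: Sequence[str],
    **params: Any,
) -> str:
    """Serialize a chat request whose fallbacks are 'provider/model' strings."""
    for entry in fallbacks:
        provider, sep, name = entry.partition("/")
        if not (provider and sep and name):
            # Client-side guard (an assumption): catch typos before the request is sent.
            raise ValueError(f"fallback must look like 'provider/model', got {entry!r}")
    body: Dict[str, Any] = {"model": model, "messages": messages, "fallbacks": list(fallbacks)}
    body.update(params)  # e.g. max_tokens, temperature
    return json.dumps(body)


payload = build_chat_body(
    "openai/gpt-4o-mini",
    [{"role": "user", "content": "Explain quantum computing in simple terms"}],
    [
        "anthropic/claude-3-5-sonnet-20241022",
        "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
    ],
    max_tokens=1000,
    temperature=0.7,
)
print(payload)
```

The resulting string is the JSON you would POST to /v1/chat/completions with the x-bf-vk header, exactly as in the curl example.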

Real-World Scenarios

Scenario 1: Rate Limiting

Primary: OpenAI hits rate limit → Fallback: Anthropic succeeds
Your application continues without interruption

Scenario 2: Model Unavailability

Primary: Specific model unavailable → Fallback: Different provider with similar model
Seamless transition to equivalent capability

Scenario 3: Provider Outage

Primary: Provider experiencing downtime → Fallback: Alternative provider
Business continuity maintained

Scenario 4: Cost Optimization

Primary: Premium model for quality → Fallback: Cost-effective alternative if budget exceeded
Governance rules can trigger fallbacks based on usage

Fallback Behavior Details

What Triggers Fallbacks:

Network connectivity issues
Provider API errors (500, 502, 503, 504)
Rate limiting (429 errors)
Model unavailability
Request timeouts
Authentication failures

What Returns the Original Error (No Fallback):

Request validation errors (malformed requests)
Plugin-enforced blocks (governance violations)
Certain provider-specific errors marked as non-retryable
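The two lists above amount to a classification step. A minimal sketch, assuming errors have already been sorted into categories; the Failure enum and helper names are hypothetical, and only the status codes listed above are mapped, since the lists do not say how other codes are classified.

```python
from enum import Enum
from typing import Optional


class Failure(Enum):
    """Failure categories taken from the two lists above (names are hypothetical)."""
    NETWORK = "network"
    PROVIDER_API = "provider_api"          # 500, 502, 503, 504
    RATE_LIMIT = "rate_limit"              # 429
    MODEL_UNAVAILABLE = "model_unavailable"
    TIMEOUT = "timeout"
    AUTH = "auth"
    VALIDATION = "validation"              # malformed request
    GOVERNANCE_BLOCK = "governance_block"  # plugin-enforced block
    NON_RETRYABLE = "non_retryable"        # provider-specific, marked non-retryable


FALLBACK_TRIGGERS = frozenset({
    Failure.NETWORK,
    Failure.PROVIDER_API,
    Failure.RATE_LIMIT,
    Failure.MODEL_UNAVAILABLE,
    Failure.TIMEOUT,
    Failure.AUTH,
})


def failure_from_status(status: int) -> Optional[Failure]:
    """Map only the status codes named above; None means the code alone doesn't decide."""
    if status == 429:
        return Failure.RATE_LIMIT
    if status in (500, 502, 503, 504):
        return Failure.PROVIDER_API
    return None


def should_fallback(failure: Failure) -> bool:
    """True if the next model should be tried; False returns the original error."""
    return failure in FALLBACK_TRIGGERS


print(should_fallback(Failure.RATE_LIMIT))        # True
print(should_fallback(Failure.GOVERNANCE_BLOCK))  # False
```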

Consistent Behavior Across Providers: When a fallback is triggered, the fallback request is treated as completely new:

Semantic cache checks apply again (a different provider might have a cached response)
Governance rules apply to the new provider
Logging captures the fallback attempt
All configured features apply to the fallback provider
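The list above can be illustrated by running every attempt through the same request handler. A minimal sketch under stated simplifications: an exact-match dictionary keyed by (model, prompt) stands in for the semantic cache, governance is omitted, and handle_request, run_with_fallbacks, and ProviderDown are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


class ProviderDown(Exception):
    """Hypothetical error for a failed provider call."""


Cache = Dict[Tuple[str, str], str]


def handle_request(model: str, prompt: str, call: Callable[[str, str], str],
                   cache: Cache, log: List[str]) -> str:
    """One full pass through the pipeline for a single provider/model."""
    log.append(f"request {model}")  # logging sees every attempt
    key = (model, prompt)           # keyed per model, so each fallback gets its own lookup
    if key in cache:
        log.append(f"cache hit {model}")
        return cache[key]
    answer = call(model, prompt)    # may raise ProviderDown
    cache[key] = answer
    log.append(f"response {model}")
    return answer


def run_with_fallbacks(models: Sequence[str], prompt: str, call: Callable[[str, str], str],
                       cache: Cache, log: List[str]) -> str:
    first_error: Optional[ProviderDown] = None
    for model in models:
        try:
            return handle_request(model, prompt, call, cache, log)  # fresh request per model
        except ProviderDown as err:
            if first_error is None:
                first_error = err
            log.append(f"error {model}: {err}")
    if first_error is None:
        raise ValueError("no models given")
    raise first_error


def flaky_call(model: str, prompt: str) -> str:
    if model.startswith("openai/"):
        raise ProviderDown("503 service unavailable")
    return f"{model} says hi"


prompt = "Explain quantum computing in simple terms"
cache: Cache = {("anthropic/claude-3-5-sonnet-20241022", prompt): "cached answer"}
log: List[str] = []
answer = run_with_fallbacks(
    ["openai/gpt-4o-mini", "anthropic/claude-3-5-sonnet-20241022"],
    prompt, flaky_call, cache, log,
)
print(answer)  # cached answer
print(log)
```

Here the primary fails, and the fallback is served straight from the cache because the cache lookup runs again for the new model.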

Governance-Controlled Fallbacks: Your governance and guardrail policies can decide whether a fallback should be attempted. For example, a compliance-sensitive workload can be configured so that certain errors return immediately without trying a backup provider.
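A policy hook of this kind can be sketched as a check consulted before each fallback. This is an illustration only: FallbackPolicy, the error-kind strings, and the choice of auth_failure as the blocked kind are hypothetical and do not reflect DeepIntShield's governance configuration format.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class FallbackPolicy:
    """Hypothetical per-workload policy listing error kinds that must not fall back."""
    return_immediately_on: FrozenSet[str] = field(default_factory=frozenset)

    def next_step(self, error_kind: str, fallbacks_left: int) -> str:
        """Return "try_next" to attempt the next model, or "return_error" to stop."""
        if fallbacks_left > 0 and error_kind not in self.return_immediately_on:
            return "try_next"
        return "return_error"


default_policy = FallbackPolicy()
# Example: a compliance-sensitive workload that wants credential problems surfaced, not masked.
compliance_policy = FallbackPolicy(return_immediately_on=frozenset({"auth_failure"}))

print(default_policy.next_step("auth_failure", fallbacks_left=2))     # try_next
print(compliance_policy.next_step("auth_failure", fallbacks_left=2))  # return_error
print(compliance_policy.next_step("rate_limit", fallbacks_left=2))    # try_next
```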

Because every attempt is processed as a fresh request, behavior stays consistent regardless of which provider ultimately handles your request; check extra_fields.provider to see which one did.