Provider Routing

Overview

DeepIntShield offers two powerful methods for routing requests across AI providers, each serving different use cases:

Governance-based Routing: Explicit, user-defined routing rules configured via Virtual Keys
Adaptive Load Balancing: Automatic, performance-based routing powered by real-time metrics (Team plan and above)

When both methods are available, governance takes precedence because users have explicitly defined their routing preferences through provider configurations on Virtual Keys.

The Model Catalog

The Model Catalog is the registry of which models are available from which providers. It lets you route by model name (for example, gpt-4o) without manually listing every model for every provider, and it keeps per-model pricing current for cost and budget calculations.

What it does for you: when you request a plain model name like claude-3-5-sonnet, the catalog automatically resolves every configured provider that can serve it - including proxy and compatibility providers - so a single request can route to Anthropic, Vertex, Bedrock, or OpenRouter depending on which providers you have configured. You do not have to list each provider-specific variant:

Requested model	Also resolves through
`claude-3-5-sonnet`	Anthropic, Vertex, Bedrock, OpenRouter
`gpt-4o`	OpenAI, Azure
`gpt-3.5-turbo`	OpenAI, Groq

The catalog refreshes automatically (pricing/model data is re-synced periodically, and a provider’s model list is re-fetched whenever you add or update that provider), so newly released models become usable without manual maintenance. You can also trigger a manual refresh from the dashboard: open the provider on the Providers page and use the model-list refresh control to re-fetch its available models.
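As a rough illustration of the lookup described above, the sketch below models the catalog as a plain mapping from model name to providers and intersects it with the providers you have configured. The `CATALOG` dict and `resolve_providers` function are hypothetical names used only for illustration (they are not part of DeepIntShield's API), and the entries simply mirror the table above.

```python
# Hypothetical in-memory stand-in for the Model Catalog.
# Entries mirror the example table in this section.
CATALOG = {
    "claude-3-5-sonnet": ["anthropic", "vertex", "bedrock", "openrouter"],
    "gpt-4o": ["openai", "azure"],
    "gpt-3.5-turbo": ["openai", "groq"],
}


def resolve_providers(model, configured):
    """Return the configured providers that the catalog lists for `model`."""
    return [p for p in CATALOG.get(model, []) if p in configured]


# Only providers you have configured are candidates for routing.
candidates = resolve_providers("claude-3-5-sonnet", {"anthropic", "bedrock", "openai"})
print(candidates)
```

With Anthropic, Bedrock, and OpenAI configured, only Anthropic and Bedrock are candidates for claude-3-5-sonnet in this sketch; an unknown model name yields no candidates.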

Allowed Models Behavior with Examples

The allowed_models field in provider configs controls which models can be used with that provider. Understanding its behavior is crucial for governance routing.

Configuration (empty allowed_models, defer to the catalog):

{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": [],  // Empty = defer to catalog
      "weight": 1.0
    }
  ]
}

Behavior:

DeepIntShield looks up the models the catalog knows OpenAI can serve
The requested model is validated against that catalog list

Examples:

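# ✅ Allowed (in catalog)
curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'

# ✅ Allowed (in catalog)
curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "x-bf-vk: vk-123" -d '{"model": "gpt-3.5-turbo"}'

# ❌ Rejected (not in OpenAI catalog)
curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet"}'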

Use Cases:

Default behavior for most deployments
Automatically stays up-to-date with provider’s model offerings
No manual model list maintenance required

Configuration (explicit allowed_models list):

{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["gpt-4o", "gpt-4o-mini"],  // Only these two
      "weight": 1.0
    },
    {
      "provider": "anthropic",
      "allowed_models": ["claude-3-5-sonnet-20241022"],  // Specific version
      "weight": 1.0
    }
  ]
}

Behavior:

DeepIntShield validates request model against explicit list
Catalog is ignored for this provider
Supports both direct matches and provider-prefixed entries
Case-sensitive matching

Examples:

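# ✅ Allowed (in explicit list)
curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'

# ❌ Rejected (not in explicit list)
curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "x-bf-vk: vk-123" -d '{"model": "gpt-4-turbo"}'
# Even though gpt-4-turbo is in the OpenAI catalog!

# ✅ Allowed (exact match for Anthropic)
curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet-20241022"}'

# ❌ Rejected (version mismatch)
curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet-20240620"}'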

Provider-Prefixed Entries:

You can also use provider-prefixed model names in allowed_models. A provider-prefixed entry like openai/gpt-4o matches a request for the unprefixed model gpt-4o:

{
  "provider_configs": [
    {
      "provider": "openrouter",
      "allowed_models": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet"],
      "weight": 1.0
    }
  ]
}

# A request for "gpt-4o" matches the allowed entry "openai/gpt-4o"
# and is routed to OpenRouter
curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'

This is particularly useful for proxy providers (OpenRouter, Vertex) where you want to explicitly control which upstream models are accessible.

Use Cases:

Compliance requirements (only approved models)
Cost control (restrict to cheaper models)
Version pinning (prevent automatic updates)
Testing specific model versions
Explicit cross-provider routing (e.g., only allow OpenAI models via OpenRouter)
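The allowed_models behavior described in this section can be summarized in a few lines. The following is a minimal sketch, not DeepIntShield's actual matcher: `model_allowed` is a hypothetical helper, `catalog_models` stands in for whatever the catalog lists for the provider, and the prefixed-entry match is assumed to compare the part after the first slash.

```python
def model_allowed(model, allowed_models, catalog_models):
    """Check a requested model against a single provider config.

    allowed_models: the provider config's list (empty means defer to the catalog).
    catalog_models: models the catalog lists for this provider (assumed input).
    """
    if not allowed_models:
        # Empty list: validate against the catalog instead.
        return model in catalog_models
    for entry in allowed_models:
        # Direct match, or a provider-prefixed entry such as "openai/gpt-4o"
        # whose part after the first "/" equals the requested model.
        # String comparison is case-sensitive.
        if entry == model or entry.split("/", 1)[-1] == model:
            return True
    return False


openai_catalog = {"gpt-4o", "gpt-4-turbo", "gpt-3.5-turbo"}
print(model_allowed("gpt-4o", [], openai_catalog))
print(model_allowed("gpt-4-turbo", ["gpt-4o", "gpt-4o-mini"], openai_catalog))
print(model_allowed("gpt-4o", ["openai/gpt-4o"], set()))
```

The second call is rejected even though gpt-4-turbo is in the catalog, because an explicit list replaces the catalog check for that provider.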

Key Concept: Deployments are key-specific mappings that allow user-friendly model names to map to provider-specific deployment identifiers.

How Deployments Work:

Defined at the Key level, not Virtual Key level
Structure: deployments: {"alias": "deployment-id"}
Alias (left side): User-facing model name used in requests
Deployment ID (right side): Provider-specific identifier sent to the API

Azure OpenAI Example:

Provider configuration with deployment mapping:

{
  "providers": {
    "azure": {
      "keys": [
        {
          "name": "azure-prod-key",
          "value": "your-api-key",
          "models": [],  // Empty, so the deployment aliases below define the allowed models
          "azure_key_config": {
            "endpoint": "https://your-resource.openai.azure.com",
            "deployments": {
              "gpt-4o": "my-prod-gpt4o-deployment",
              "gpt-4o-mini": "my-mini-deployment"
            }
          }
        }
      ]
    }
  }
}

With this config, you request the friendly alias ({"model": "gpt-4o"}) and DeepIntShield sends the mapped deployment name (my-prod-gpt4o-deployment) to Azure. The aliases (gpt-4o, gpt-4o-mini) become the allowed model names for that key.

Bedrock Example with Inference Profiles:

{
  "providers": {
    "bedrock": {
      "keys": [
        {
          "name": "bedrock-key",
          "models": [],
          "bedrock_key_config": {
            "access_key": "your-access-key",
            "secret_key": "your-secret-key",
            "region": "us-east-1",
            "deployments": {
              "claude-sonnet": "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
              "claude-opus": "us.anthropic.claude-3-opus-20240229-v1:0"
            }
          }
        }
      ]
    }
  }
}

Here you request the short alias ({"model": "claude-sonnet"}) and DeepIntShield sends the full inference profile (us.anthropic.claude-3-5-sonnet-20241022-v2:0) to Bedrock. The aliases (claude-sonnet, claude-opus) are the allowed model names.

Priority of Model Restrictions:

The allowed models for a key are determined in this order:

1. If key.models is NOT empty → Use key.models
2. Else if deployments exist → Use deployment aliases (map keys)
3. Else → All models allowed (use Model Catalog)

Example with Both:

{
  "keys": [
    {
      "models": ["gpt-4o", "gpt-3.5-turbo"],  // Explicit restriction
      "azure_key_config": {
        "deployments": {
          "gpt-4o": "my-deployment",
          "gpt-4-turbo": "another-deployment"  // NOT accessible!
        }
      }
    }
  ]
}

Result: Only ["gpt-4o", "gpt-3.5-turbo"] allowed (models field takes priority)
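The priority order above reduces to a short conditional. This is an illustrative sketch only; `allowed_models_for_key` is a hypothetical name, and returning `None` to mean "unrestricted, defer to the Model Catalog" is an assumption made for the example.

```python
def allowed_models_for_key(models, deployments):
    """Return the allowed model names for a key, or None when unrestricted."""
    if models:
        # 1. A non-empty key.models list takes priority.
        return list(models)
    if deployments:
        # 2. Otherwise the deployment aliases (the map keys) are the allowed names.
        return list(deployments)
    # 3. Otherwise nothing is restricted at the key level (defer to the catalog).
    return None


# The "Example with Both" config: models wins, so gpt-4-turbo is not accessible.
print(allowed_models_for_key(
    ["gpt-4o", "gpt-3.5-turbo"],
    {"gpt-4o": "my-deployment", "gpt-4-turbo": "another-deployment"},
))
```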

Vertex Example (similar pattern):

{
  "keys": [
    {
      "vertex_key_config": {
        "project_id": "my-project",
        "region": "us-central1",
        "deployments": {
          "claude-3-5-sonnet": "anthropic/claude-3-5-sonnet@20241022",
          "gemini-pro": "google/gemini-2.5-pro"
        }
      }
    }
  ]
}

Use Cases for Deployments:

Azure: Map generic model names to specific deployment names in your Azure resource
Bedrock: Use short aliases for long inference profile identifiers
Vertex: Map to specific model versions or regional endpoints
Multi-environment: Different deployments per key (dev/staging/prod)

This allows user-friendly model names in requests while supporting provider-specific deployment patterns at the key level - you request the alias, and the mapped deployment ID is what reaches the provider.
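To make the alias-to-deployment mapping concrete, here is a small sketch of the lookup. `upstream_model` is a hypothetical helper, and passing an unmapped name through unchanged is an assumption for illustration; the document does not say what happens in that case.

```python
def upstream_model(model, deployments):
    """Map a requested alias to the deployment ID sent to the provider."""
    # Assumption: a name without a mapping is passed through unchanged.
    return deployments.get(model, model)


# Deployment maps copied from the Azure and Bedrock examples in this section.
azure = {
    "gpt-4o": "my-prod-gpt4o-deployment",
    "gpt-4o-mini": "my-mini-deployment",
}
bedrock = {
    "claude-sonnet": "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
    "claude-opus": "us.anthropic.claude-3-opus-20240229-v1:0",
}

print(upstream_model("gpt-4o", azure))
print(upstream_model("claude-sonnet", bedrock))
```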

Configuration:

{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["gpt-4o"],
      "weight": 0.5
    },
    {
      "provider": "azure",
      "allowed_models": ["gpt-4o"],
      "weight": 0.5
    }
  ]
}

Request:

curl -X POST https://app.deepintshield.com/v1/chat/completions \
     -H "x-bf-vk: vk-123" \
     -d '{"model": "gpt-4o"}'

Routing Behavior: Both providers allow gpt-4o, so requests are distributed 50/50 between them by weight. If the chosen provider fails, the other one is used as a fallback.

Special Cross-Provider Scenarios:

OpenRouter as Universal Proxy

{
  "provider_configs": [
    {
      "provider": "openrouter",
      "allowed_models": []  // Use catalog
    }
  ]
}

Request claude-3-5-sonnet:

DeepIntShield checks the models OpenRouter can serve
Finds: anthropic/claude-3-5-sonnet in the OpenRouter catalog
✅ Allowed, routes to OpenRouter

Weighted Routing via Proxy Provider

Use Case: Route 99% of OpenAI traffic through OpenRouter for cost savings, keep 1% direct for fallback

{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["gpt-4o"],
      "weight": 0.01  // 1% direct to OpenAI
    },
    {
      "provider": "openrouter",
      "allowed_models": ["openai/gpt-4o"],  // Provider-prefixed
      "weight": 0.99  // 99% via OpenRouter
    }
  ]
}

A request for gpt-4o is allowed on both providers (the provider-prefixed entry openai/gpt-4o matches the unprefixed request). By weight, ~99% of traffic goes to OpenRouter and ~1% goes directly to OpenAI, which also serves as the fallback.
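The percentages above follow from treating weights as relative shares. The sketch below assumes weights are normalized by their sum, which is consistent with the examples in this document (two providers at weight 1.0 or 0.5 each both give an even split) but is not confirmed as the exact implementation; `traffic_shares` is a hypothetical helper.

```python
def traffic_shares(provider_configs):
    """Normalize weights into the expected fraction of traffic per provider."""
    total = sum(c["weight"] for c in provider_configs)
    return {c["provider"]: c["weight"] / total for c in provider_configs}


shares = traffic_shares([
    {"provider": "openai", "weight": 0.01},
    {"provider": "openrouter", "weight": 0.99},
])
for provider, share in shares.items():
    print(f"{provider}: {share:.0%}")
```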

Vertex as Multi-Provider Gateway

{
  "provider_configs": [
    {
      "provider": "vertex",
      "allowed_models": ["claude-3-5-sonnet", "gemini-2.5-pro"]
    }
  ]
}

A request for claude-3-5-sonnet is allowed (it’s in allowed_models and the catalog knows Vertex can serve it) and is routed to Vertex.

Groq OpenAI Compatibility

{
  "provider_configs": [
    {
      "provider": "groq",
      "allowed_models": ["gpt-3.5-turbo"]
    }
  ]
}

A request for gpt-3.5-turbo is allowed (Groq exposes it through its OpenAI-compatible catalog) and is routed to Groq.

Governance-based Routing

Governance-based routing lets you explicitly define which providers and models should handle requests for a specific Virtual Key. Attach one or more provider_configs to a Virtual Key, and requests using that key are distributed across those providers by the weights you set. Providers that are over budget, rate-limited, or that don’t allow the requested model are automatically skipped, and the remaining providers (highest weight first) become fallbacks.

Configuration Example

{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["gpt-4o", "gpt-4o-mini"],
      "weight": 0.3,
      "budget": {
        "max_limit": 100.0,
        "current_usage": 45.0
      }
    },
    {
      "provider": "azure",
      "allowed_models": ["gpt-4o"],
      "weight": 0.7,
      "rate_limit": {
        "token_max_limit": 100000,
        "token_reset_duration": "1m"
      }
    }
  ]
}

Request Flow

Send a request with your Virtual Key and a plain model name:

curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "x-bf-vk: vk-prod-main" \
  -d '{"model": "gpt-4o", "messages": []}'

With the example config above, both OpenAI (weight 0.3) and Azure (weight 0.7) allow gpt-4o and are within their budget/rate limits, so requests are distributed by weight: roughly 70% go to Azure and 30% to OpenAI. If the selected provider fails, the other is used as a fallback.
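The flow described in this section (skip ineligible providers, pick one by weight, keep the rest as fallbacks in descending weight order) might be sketched as follows. This is a simplified model under stated assumptions: `plan_route` and the `over_budget` / `rate_limited` flags are hypothetical, and an empty `allowed_models` is treated as "allowed" here rather than checked against the catalog.

```python
import random


def plan_route(provider_configs, model, rng=random):
    """Pick a primary provider by weight and order the rest as fallbacks."""
    eligible = [
        c for c in provider_configs
        if not c.get("over_budget")        # hypothetical flag
        and not c.get("rate_limited")      # hypothetical flag
        # Simplification: an empty allowed_models list is treated as allowed.
        and (not c.get("allowed_models") or model in c["allowed_models"])
    ]
    if not eligible:
        return None, []
    # Weighted random choice of the primary provider.
    primary = rng.choices(eligible, weights=[c["weight"] for c in eligible], k=1)[0]
    # Remaining eligible providers become fallbacks, highest weight first.
    fallbacks = sorted(
        (c for c in eligible if c is not primary),
        key=lambda c: c["weight"],
        reverse=True,
    )
    return primary["provider"], [c["provider"] for c in fallbacks]


configs = [
    {"provider": "openai", "allowed_models": ["gpt-4o", "gpt-4o-mini"], "weight": 0.3},
    {"provider": "azure", "allowed_models": ["gpt-4o"], "weight": 0.7},
]
print(plan_route(configs, "gpt-4o"))
print(plan_route(configs, "gpt-4o-mini"))
```

For gpt-4o-mini only OpenAI is eligible, so it is always the primary and there are no fallbacks.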

Key Features

Feature	Description
Explicit Control	Define exactly which providers and models are accessible
Budget Enforcement	Automatically exclude providers exceeding budget limits
Rate Limit Protection	Skip providers that have hit rate limits
Weighted Distribution	Control traffic distribution with custom weights
Automatic Fallbacks	Requests that fail on one provider are automatically retried on the provider with the next-highest weight

Best Practices

Cost Optimization

Assign higher weights to cheaper providers for cost-sensitive workloads:

{
  "provider_configs": [
    {"provider": "groq", "weight": 0.7},
    {"provider": "openai", "weight": 0.3}
  ]
}

Environment Separation

Create different Virtual Keys for dev/staging/prod with different provider access:

{
  "virtual_keys": [
    {
      "id": "vk-dev",
      "provider_configs": [{"provider": "ollama"}]
    },
    {
      "id": "vk-prod",
      "provider_configs": [{"provider": "openai"}, {"provider": "azure"}]
    }
  ]
}

Compliance & Data Residency

Restrict specific Virtual Keys to compliant providers:

{
  "provider_configs": [
    {"provider": "azure", "allowed_models": ["gpt-4o"]},
    {"provider": "bedrock", "allowed_models": ["claude-3-5-sonnet"]}
  ]
}

Adaptive Load Balancing

Adaptive Load Balancing automatically routes each request to the best-performing provider and API key based on live metrics (error rates, latency, rate-limit pressure), so you get failover and optimal routing without tuning weights by hand. It works at two levels: it picks the best provider for the model (when you haven’t pinned one), and - even when the provider is already fixed by governance or by you - it still picks the best-performing API key within that provider.

To use it, enable adaptive load balancing for your deployment and send requests as usual. Send a plain model name (for example gpt-4o) to let it choose the provider, or a prefixed name (openai/gpt-4o) to fix the provider and let it optimize only the key:

# Let adaptive routing pick the provider and key
curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -d '{"model": "gpt-4o", "messages": [...]}'

# Fix the provider; adaptive routing still picks the best key
curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -d '{"model": "openai/gpt-4o", "messages": [...]}'
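The difference between the two requests above is whether the model string carries a provider prefix. A minimal sketch of that distinction, assuming the prefix is everything before the first slash (`split_model` is a hypothetical helper, not a documented function):

```python
def split_model(model):
    """Split "provider/model" into (provider, model); provider is None if absent."""
    provider, sep, name = model.partition("/")
    return (provider, name) if sep else (None, model)


print(split_model("gpt-4o"))          # no prefix: provider is left to routing
print(split_model("openai/gpt-4o"))   # prefix: provider is fixed to "openai"
```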

What You Get

Behavior	What it means for you
Automatic Optimization	No manual weight tuning required
Real-time Adaptation	Routing adjusts continuously as live latency and error rates change
Automatic Failover	Failing providers and keys are removed from rotation, then re-introduced once they recover
Per-key Optimization	Picks the best API key within a provider, even when the provider is fixed

Dashboard Visibility

Monitor load balancing performance from the dashboard, which shows:

Traffic distribution across provider-model-key routes
Performance metrics (error rates, latency, success rates)
Route health, so you can see which providers and keys are degraded or recovering
Actual vs expected traffic distribution

How Governance and Load Balancing Interact

Governance and adaptive load balancing complement each other. The practical rule is:

The provider is chosen by governance when the Virtual Key has provider_configs (or by you when you send a prefixed model like openai/gpt-4o). Otherwise adaptive load balancing picks the best-performing provider for the model.
The API key within the chosen provider is always optimized by adaptive load balancing (when enabled) - even when you or governance fixed the provider.

Example Scenarios

Setup:

Virtual Key has provider_configs defined
No adaptive load balancing enabled

Request:

curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "x-bf-vk: vk-prod-main" \
  -d '{"model": "gpt-4o", "messages": [...]}'

Behavior:

Governance applies weighted provider routing and selects a provider, for example Azure (70% weight); the model becomes azure/gpt-4o.
An Azure key is chosen by static weights and the request is forwarded to Azure.

Setup:

No Virtual Key, or a Virtual Key without provider_configs
Adaptive load balancing enabled

Request:

curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -d '{"model": "gpt-4o", "messages": [...]}'

Behavior:

Adaptive routing picks the best-performing provider for gpt-4o (say OpenAI), and the best-performing OpenAI key within it. The request is forwarded to OpenAI with that key.

Setup:

Virtual Key has provider_configs defined
Adaptive load balancing enabled
Azure has 3 keys configured

Request:

curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "x-bf-vk: vk-prod-main" \
  -d '{"model": "gpt-4o", "messages": [...]}'

Behavior:

Governance selects the provider (Azure), respecting your explicit config; the model becomes azure/gpt-4o.
Adaptive routing then selects the best-performing Azure key - skipping any key that is rate-limited or repeatedly failing.

Why? Governance controls the provider (your explicit intent), while load balancing still optimizes the key for you.

Setup:

Both governance and load balancing enabled
OpenAI has multiple keys available

Request:

curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -d '{"model": "openai/gpt-4o", "messages": [...]}'

Behavior:

You fixed the provider by prefixing the model (openai/gpt-4o), so provider selection is skipped.
Adaptive routing still selects the best-performing OpenAI key, then forwards the request.

Why? Even when you specify the provider, key-level optimization keeps picking the best-performing OpenAI key.

Provider vs Key Selection Rules

Scenario	Who picks the provider	Who picks the key
VK with `provider_configs`	Governance (weighted random)	Static weights, or adaptive if enabled
VK without `provider_configs`, adaptive enabled	Adaptive load balancing	Adaptive load balancing
Model sent with provider prefix	You (already specified)	Adaptive load balancing
No adaptive load balancing	Governance, your prefix, or the catalog	Static weights
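The table can also be read as a small decision function. The sketch below is one interpretation, not the product's code: `selectors` and its string labels are hypothetical, and the ordering (prefix first, then governance) follows the scenario above in which a prefixed model skips provider selection even with governance enabled.

```python
def selectors(has_provider_configs, adaptive_enabled, model_has_prefix):
    """Return (who picks the provider, who picks the key)."""
    if model_has_prefix:
        provider = "you"
    elif has_provider_configs:
        provider = "governance"
    elif adaptive_enabled:
        provider = "adaptive"
    else:
        provider = "catalog"
    # The key is optimized adaptively whenever adaptive load balancing is on.
    key = "adaptive" if adaptive_enabled else "static weights"
    return provider, key


print(selectors(has_provider_configs=True, adaptive_enabled=True, model_has_prefix=False))
```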

Routing Rules (Dynamic Expression-Based Routing)

Routing Rules give you dynamic, expression-based control over routing. You write a CEL expression (for example, headers["x-tier"] == "premium") and a target provider/model; when the expression matches a request, the rule overrides whatever governance or default routing would have chosen. Rules are highest-priority: a matching rule wins over a Virtual Key’s provider_configs. If no rule matches, governance and load balancing decide as usual.

Available CEL Variables

Routing rules access request context through CEL variables:

// Request context
model                      // Requested model
provider                   // Current provider

// Headers and parameters (case-insensitive)
headers["x-tier"]          // Request header
params["region"]           // Query parameter

// Organization context
virtual_key_id             // VirtualKey ID
team_name                  // Team name
customer_id                // Customer ID

// Capacity metrics (0-100 percentage)
budget_used                // Budget usage %
tokens_used                // Token rate limit usage %
request                    // Request rate limit usage %

Examples

Route based on user tier

headers["x-tier"] == "premium"   // → openai/gpt-4o

Route to fallback when budget high

budget_used > 85                 // → groq/llama-2 (cheaper)

Route by team

team_name == "ml-research"       // → anthropic/claude-3-opus

Complex multi-condition routing

headers["x-environment"] == "production" &&
tokens_used < 75 &&
team_name == "ai-platform"       // → openai/gpt-4o

Scope Hierarchy

Rules are evaluated in organizational precedence order (first-match-wins):

1. VirtualKey scope (highest priority)
2. Team scope
3. Customer scope
4. Global scope (lowest priority)

Within each scope, rules are sorted by priority (ascending: 0 before 10).
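The evaluation order can be illustrated with a short sketch. Python lambdas stand in for CEL expressions here, and the rule dictionary shape, the scope strings, and `first_matching_rule` are all hypothetical; only the ordering logic (scope first, then ascending priority, first match wins) comes from the description above.

```python
# Lower number = evaluated earlier. Scope names are illustrative strings.
SCOPE_ORDER = {"virtual_key": 0, "team": 1, "customer": 2, "global": 3}


def first_matching_rule(rules, ctx):
    """Return the target of the first matching rule, or None if none match."""
    ordered = sorted(rules, key=lambda r: (SCOPE_ORDER[r["scope"]], r["priority"]))
    for rule in ordered:
        if rule["when"](ctx):  # Python predicate standing in for a CEL expression
            return rule["target"]
    return None


rules = [
    {"scope": "global", "priority": 0, "target": "groq/llama-2",
     "when": lambda c: c["budget_used"] > 85},
    {"scope": "team", "priority": 10, "target": "anthropic/claude-3-opus",
     "when": lambda c: c["team_name"] == "ml-research"},
    {"scope": "team", "priority": 0, "target": "openai/gpt-4o",
     "when": lambda c: c["headers"].get("x-tier") == "premium"},
]

ctx = {"budget_used": 90, "team_name": "ml-research", "headers": {"x-tier": "premium"}}
print(first_matching_rule(rules, ctx))
```

All three rules match this context, but the team-scope rule with priority 0 is evaluated first, so its target is returned.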

Key Features

Feature	Description
CEL Expressions	Powerful, composable condition language with multiple operators
Scope Hierarchy	Rules at VirtualKey/Team/Customer/Global levels with proper precedence
Dynamic Override	Override provider and/or model based on runtime conditions
Fallback Chains	Define multiple fallback providers for automatic failover
Priority Ordering	Rules with lower priority numbers are evaluated first within the same scope
Capacity Awareness	Access real-time budget and rate limit usage percentages

How Routing Rules Combine With Governance and Load Balancing

A matching rule takes precedence over governance. If a rule matches (for example budget_used > 85 routing to a cheaper provider), its target provider/model wins and the Virtual Key’s provider_configs are bypassed.
No match falls back to governance. When no rule matches, governance provider selection runs as normal.
Key optimization still applies. Whichever way the provider was chosen, adaptive load balancing still selects the best-performing API key within that provider.
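Put together, provider selection behaves like a chain of fallbacks between the three mechanisms. The following sketch only illustrates that precedence; `choose_provider` and its arguments are hypothetical, and each argument is assumed to be the provider that mechanism would pick (or `None` if it has no opinion). Key selection within the chosen provider is a separate step and is not modeled here.

```python
def choose_provider(rule_target=None, governance_pick=None, adaptive_pick=None):
    """Apply the precedence: matching rule, then governance, then adaptive."""
    candidates = (
        ("rule", rule_target),
        ("governance", governance_pick),
        ("adaptive", adaptive_pick),
    )
    for source, pick in candidates:
        if pick is not None:
            return source, pick
    return None, None


# A matching rule bypasses the Virtual Key's provider_configs.
print(choose_provider(rule_target="groq", governance_pick="azure", adaptive_pick="openai"))
# With no matching rule, governance decides.
print(choose_provider(governance_pick="azure", adaptive_pick="openai"))
```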

Use Cases

Tier-based routing: Premium users → fast providers
Capacity failover: High budget usage → cheaper providers
Team preferences: Different teams → different providers
A/B testing: Route subset of traffic to test models
Regional routing: EU users → EU providers (data residency)
Complex logic: Combine multiple conditions for sophisticated routing

Dashboard

Routing rules are configured in the dashboard:

Visual rule builder with a CEL expression editor
Scope: Create rules at global, customer, team, or virtual key levels
Priority: Order rules within scope with numeric priority

For complete documentation, see Routing Rules Documentation.

Choosing the Right Approach

Use Governance When:

✅ Compliance requirements: Need to ensure data stays in specific regions or providers
✅ Cost optimization: Want explicit control over traffic distribution to cheaper providers
✅ Budget enforcement: Need hard limits on spending per provider
✅ Environment separation: Different teams/apps need different provider access
✅ Rate limit management: Need to respect provider-specific rate limits

Use Routing Rules When:

✅ Dynamic routing: Route based on runtime request context (headers, parameters)
✅ Capacity-aware routing: Switch to fallback when budget/rate limits high
✅ Organization-based routing: Different rules for teams/customers
✅ A/B testing: Route subset of traffic to test new models
✅ Complex conditions: Multiple criteria (e.g., tier + capacity + team)

Use Load Balancing When:

✅ Performance optimization: Want automatic routing to best-performing providers
✅ Minimal configuration: Prefer hands-off operation with intelligent defaults
✅ Dynamic workloads: Traffic patterns change frequently
✅ Automatic failover: Need instant adaptation to provider issues
✅ Multi-provider redundancy: Want seamless provider switching based on availability

Use All Three Together:

✅ Complete solution: Governance provides base routing, routing rules add dynamic override, load balancing optimizes keys
✅ Maximum flexibility: Different Virtual Keys use different strategies (governance vs routing rules vs load balancing)
✅ Enterprise deployments: Complex organizations with multiple requirements per layer

Additional Resources

Governance Routing: Configuration instructions for setting up governance routing via Virtual Keys in the Web UI
Routing Rules: Dynamic, expression-based routing using CEL expressions for runtime conditions
Adaptive Load Balancing: Enable and configure performance-based provider and key routing (Team plan and above)
Virtual Keys: Learn how to create and configure Virtual Keys
Fallbacks: Understand how automatic fallbacks work across providers