
Provider Routing

DeepIntShield offers two powerful methods for routing requests across AI providers, each serving different use cases:

  1. Governance-based Routing: Explicit, user-defined routing rules configured via Virtual Keys
  2. Adaptive Load Balancing: Automatic, performance-based routing powered by real-time metrics (Team plan and above)

When both methods are available, governance takes precedence because users have explicitly defined their routing preferences through provider configurations on Virtual Keys.


The Model Catalog is the registry of which models are available from which providers. It lets you route by model name (for example, gpt-4o) without manually listing every model for every provider, and it keeps per-model pricing current for cost and budget calculations.

What it does for you: when you request a plain model name like claude-3-5-sonnet, the catalog automatically resolves every configured provider that can serve it - including proxy and compatibility providers - so a single request can route to Anthropic, Vertex, Bedrock, or OpenRouter depending on which providers you have configured. You do not have to list each provider-specific variant:

| Requested model | Also resolves through |
| --- | --- |
| claude-3-5-sonnet | Anthropic, Vertex, Bedrock, OpenRouter |
| gpt-4o | OpenAI, Azure |
| gpt-3.5-turbo | OpenAI, Groq |

The catalog refreshes automatically (pricing/model data is re-synced periodically, and a provider’s model list is re-fetched whenever you add or update that provider), so newly released models become usable without manual maintenance. You can also trigger a manual refresh from the dashboard: open the provider on the Providers page and use the model-list refresh control to re-fetch its available models.
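The catalog lookup described above can be pictured as a simple mapping from model name to providers, filtered by the providers you have configured. This is a minimal sketch only: the `CATALOG` dictionary and the `resolve_providers` helper are hypothetical names used for illustration, not part of DeepIntShield's API, and the entries simply mirror the table above.

```python
# Hypothetical in-memory stand-in for the Model Catalog (illustration only).
CATALOG = {
    "claude-3-5-sonnet": ["anthropic", "vertex", "bedrock", "openrouter"],
    "gpt-4o": ["openai", "azure"],
    "gpt-3.5-turbo": ["openai", "groq"],
}


def resolve_providers(model: str, configured: set[str]) -> list[str]:
    """Return the configured providers that the catalog lists for `model`."""
    return [p for p in CATALOG.get(model, []) if p in configured]


# Only providers that are both in the catalog and configured are candidates.
print(resolve_providers("claude-3-5-sonnet", {"anthropic", "bedrock", "openai"}))
# prints ['anthropic', 'bedrock']
```

A model the catalog does not know resolves to an empty list in this sketch, which corresponds to a request that no configured provider can serve.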

The allowed_models field in provider configs controls which models can be used with that provider. Understanding its behavior is crucial for governance routing.

Configuration:

{
"provider_configs": [
{
"provider": "openai",
"allowed_models": [], // Empty = defer to catalog
"weight": 1.0
}
]
}

Behavior:

  • DeepIntShield looks up the models the catalog knows OpenAI can serve
  • The requested model is validated against that catalog list
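The validation step above might look roughly like the following. It is a sketch under stated assumptions: `is_model_allowed` is a hypothetical helper, the catalog set is an illustrative subset, and the treatment of a non-empty `allowed_models` as an explicit allow-list is inferred from the governance examples on this page rather than taken from the implementation.

```python
def is_model_allowed(model: str, allowed_models: list[str], catalog_models: set[str]) -> bool:
    """Check a requested model against one provider config.

    An empty allowed_models list defers to the catalog's list for the provider.
    A non-empty list is assumed to act as an explicit allow-list.
    """
    if not allowed_models:
        return model in catalog_models
    return model in allowed_models


# Illustrative subset of what the catalog might list for OpenAI.
openai_catalog = {"gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo"}

print(is_model_allowed("gpt-4o", [], openai_catalog))             # prints True
print(is_model_allowed("claude-3-5-sonnet", [], openai_catalog))  # prints False
```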

Examples:

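# ✅ Allowed (in catalog)
curl -X POST https://app.deepintshield.com/v1/chat/completions \
-H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'
# ✅ Allowed (in catalog)
curl -X POST https://app.deepintshield.com/v1/chat/completions \
-H "x-bf-vk: vk-123" -d '{"model": "gpt-3.5-turbo"}'
# ❌ Rejected (not in OpenAI catalog)
curl -X POST https://app.deepintshield.com/v1/chat/completions \
-H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet"}'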

Use Cases:

  • Default behavior for most deployments
  • Automatically stays up-to-date with provider’s model offerings
  • No manual model list maintenance required

Governance-based routing lets you explicitly define which providers and models should handle requests for a specific Virtual Key. Attach one or more provider_configs to a Virtual Key, and requests using that key are distributed across those providers by the weights you set. Providers that are over budget, rate-limited, or that don’t allow the requested model are automatically skipped, and the remaining providers (highest weight first) become fallbacks.

{
"provider_configs": [
{
"provider": "openai",
"allowed_models": ["gpt-4o", "gpt-4o-mini"],
"weight": 0.3,
"budget": {
"max_limit": 100.0,
"current_usage": 45.0
}
},
{
"provider": "azure",
"allowed_models": ["gpt-4o"],
"weight": 0.7,
"rate_limit": {
"token_max_limit": 100000,
"token_reset_duration": "1m"
}
}
]
}

Send a request with your Virtual Key and a plain model name:

curl -X POST https://app.deepintshield.com/v1/chat/completions \
-H "x-bf-vk: vk-prod-main" \
-d '{"model": "gpt-4o", "messages": []}'

With the example config above, both OpenAI (weight 0.3) and Azure (weight 0.7) allow gpt-4o and are within their budget/rate limits, so the request is split by weight - roughly 70% to Azure and 30% to OpenAI. If the selected provider fails, the other is used as a fallback.
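The selection behavior described above (skip ineligible providers, pick by weight, keep the rest as fallbacks ordered by weight) can be sketched as follows. This is an illustration, not the actual implementation: `eligible`, `plan_route`, and the `rate_limited` flag are hypothetical, treating a provider as over budget once usage reaches the limit is an assumption, and an empty `allowed_models` is simply treated as "no restriction" here instead of consulting the catalog.

```python
import random

CONFIGS = [
    {"provider": "openai", "allowed_models": ["gpt-4o", "gpt-4o-mini"], "weight": 0.3,
     "budget": {"max_limit": 100.0, "current_usage": 45.0}},
    {"provider": "azure", "allowed_models": ["gpt-4o"], "weight": 0.7},
]


def eligible(cfg: dict, model: str) -> bool:
    """A provider is skipped if it disallows the model, is over budget, or is rate-limited."""
    allowed = cfg.get("allowed_models") or []
    if allowed and model not in allowed:
        return False
    budget = cfg.get("budget")
    if budget and budget["current_usage"] >= budget["max_limit"]:  # assumed threshold
        return False
    return not cfg.get("rate_limited", False)  # hypothetical flag for this sketch


def plan_route(configs: list[dict], model: str, rng=random):
    """Return (primary provider, fallback providers ordered by descending weight)."""
    candidates = [c for c in configs if eligible(c, model)]
    if not candidates:
        return None, []
    weights = [c["weight"] for c in candidates]
    primary = rng.choices(candidates, weights=weights, k=1)[0]
    fallbacks = sorted((c for c in candidates if c is not primary),
                       key=lambda c: c["weight"], reverse=True)
    return primary["provider"], [c["provider"] for c in fallbacks]


# Azure is picked about 70% of the time for gpt-4o; gpt-4o-mini can only go to OpenAI.
print(plan_route(CONFIGS, "gpt-4o"))
print(plan_route(CONFIGS, "gpt-4o-mini"))
```

Because the pick is random per request, the 70/30 split only emerges over many requests; any single request can land on either provider.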

| Feature | Description |
| --- | --- |
| Explicit Control | Define exactly which providers and models are accessible |
| Budget Enforcement | Automatically exclude providers exceeding budget limits |
| Rate Limit Protection | Skip providers that have hit rate limits |
| Weighted Distribution | Control traffic distribution with custom weights |
| Automatic Fallbacks | Requests that fail on one provider are retried on the provider with the next-highest weight |

Cost Optimization

Assign higher weights to cheaper providers for cost-sensitive workloads:

{
"provider_configs": [
{"provider": "groq", "weight": 0.7},
{"provider": "openai", "weight": 0.3}
]
}

Environment Separation

Create different Virtual Keys for dev/staging/prod with different provider access:

{
"virtual_keys": [
{
"id": "vk-dev",
"provider_configs": [{"provider": "ollama"}]
},
{
"id": "vk-prod",
"provider_configs": [{"provider": "openai"}, {"provider": "azure"}]
}
]
}

Compliance & Data Residency

Restrict specific Virtual Keys to compliant providers:

{
"provider_configs": [
{"provider": "azure", "allowed_models": ["gpt-4o"]},
{"provider": "bedrock", "allowed_models": ["claude-3-5-sonnet"]}
]
}

Adaptive Load Balancing automatically routes each request to the best-performing provider and API key based on live metrics (error rates, latency, rate-limit pressure), so you get failover and optimal routing without tuning weights by hand. It works at two levels: it picks the best provider for the model (when you haven’t pinned one), and - even when the provider is already fixed by governance or by you - it still picks the best-performing API key within that provider.

To use it, enable adaptive load balancing for your deployment and send requests as usual. Send a plain model name (for example gpt-4o) to let it choose the provider, or a prefixed name (openai/gpt-4o) to fix the provider and let it optimize only the key:

# Let adaptive routing pick the provider and key
curl -X POST https://app.deepintshield.com/v1/chat/completions \
-H "Authorization: Bearer sk-bf-your-virtual-key" \
-d '{"model": "gpt-4o", "messages": [...]}'
# Fix the provider; adaptive routing still picks the best key
curl -X POST https://app.deepintshield.com/v1/chat/completions \
-H "Authorization: Bearer sk-bf-your-virtual-key" \
-d '{"model": "openai/gpt-4o", "messages": [...]}'

| Behavior | What it means for you |
| --- | --- |
| Automatic Optimization | No manual weight tuning required |
| Real-time Adaptation | Routing adjusts continuously as live latency and error rates change |
| Automatic Failover | Failing providers and keys are removed from rotation, then re-introduced once they recover |
| Per-key Optimization | Picks the best API key within a provider, even when the provider is fixed |

Monitor load balancing performance from the dashboard:

[Figure: Adaptive Load Balancing Dashboard]

The dashboard shows:

  • Traffic distribution across provider-model-key routes
  • Performance metrics (error rates, latency, success rates)
  • Route health, so you can see which providers and keys are degraded or recovering
  • Actual vs expected traffic distribution

How Governance and Load Balancing Interact


Governance and adaptive load balancing complement each other. The practical rule is:

  • The provider is chosen by governance when the Virtual Key has provider_configs (or by you when you send a prefixed model like openai/gpt-4o). Otherwise adaptive load balancing picks the best-performing provider for the model.
  • The API key within the chosen provider is always optimized by adaptive load balancing (when enabled) - even when you or governance fixed the provider.
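The practical rule above can be condensed into a small decision function. This is a sketch for reasoning about the rule, not product code: `split_model` and `who_decides` are hypothetical helpers, checking the request prefix before governance is an assumption (the page does not say which wins when both are present), and splitting on the first `/` is a simplification that ignores model names that themselves contain slashes.

```python
def split_model(model: str):
    """Split 'openai/gpt-4o' into ('openai', 'gpt-4o'); plain names have no provider."""
    provider, sep, name = model.partition("/")
    return (provider, name) if sep else (None, model)


def who_decides(model: str, has_provider_configs: bool, adaptive_enabled: bool):
    """Return (who picks the provider, who picks the API key) for a request."""
    prefix, _ = split_model(model)
    if prefix is not None:
        provider_picker = "request prefix"  # assumed to be checked first
    elif has_provider_configs:
        provider_picker = "governance"
    elif adaptive_enabled:
        provider_picker = "adaptive load balancing"
    else:
        provider_picker = "catalog"
    key_picker = "adaptive load balancing" if adaptive_enabled else "static weights"
    return provider_picker, key_picker


print(who_decides("gpt-4o", has_provider_configs=True, adaptive_enabled=False))
# prints ('governance', 'static weights')
print(who_decides("openai/gpt-4o", has_provider_configs=False, adaptive_enabled=True))
# prints ('request prefix', 'adaptive load balancing')
```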

Setup:

  • Virtual Key has provider_configs defined
  • No adaptive load balancing enabled

Request:

curl -X POST https://app.deepintshield.com/v1/chat/completions \
-H "x-bf-vk: vk-prod-main" \
-d '{"model": "gpt-4o", "messages": [...]}'

Behavior:

  • Governance applies weighted provider routing. In this example it selects Azure (70% weight), so the model becomes azure/gpt-4o.
  • An Azure key is chosen by static weights and the request is forwarded to Azure.

| Scenario | Who picks the provider | Who picks the key |
| --- | --- | --- |
| VK with provider_configs | Governance (weighted random) | Static weights, or adaptive if enabled |
| VK without provider_configs, adaptive enabled | Adaptive load balancing | Adaptive load balancing |
| Model sent with provider prefix | You (already specified) | Adaptive load balancing |
| No adaptive load balancing | Governance, your prefix, or the catalog | Static weights |

Routing Rules (Dynamic Expression-Based Routing)


Routing Rules give you dynamic, expression-based control over routing. You write a CEL expression (for example, headers["x-tier"] == "premium") and a target provider/model; when the expression matches a request, the rule overrides whatever governance or default routing would have chosen. Rules are highest-priority: a matching rule wins over a Virtual Key’s provider_configs. If no rule matches, governance and load balancing decide as usual.

Routing rules access request context through CEL variables:

// Request context
model // Requested model
provider // Current provider
// Headers and parameters (case-insensitive)
headers["x-tier"] // Request header
params["region"] // Query parameter
// Organization context
virtual_key_id // VirtualKey ID
team_name // Team name
customer_id // Customer ID
// Capacity metrics (0-100 percentage)
budget_used // Budget usage %
tokens_used // Token rate limit usage %
request // Request rate limit usage %

Example expressions (the routing target is shown after the arrow):

headers["x-tier"] == "premium" // → openai/gpt-4o
budget_used > 85 // → groq/llama-2 (cheaper)
team_name == "ml-research" // → anthropic/claude-3-opus
headers["x-environment"] == "production" &&
tokens_used < 75 &&
team_name == "ai-platform" // → openai/gpt-4o

Rules are evaluated in organizational precedence order (first-match-wins):

1. VirtualKey scope (highest priority)
2. Team scope
3. Customer scope
4. Global scope (lowest priority)

Within each scope, rules are sorted by priority (ascending: 0 before 10).
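The evaluation order above (scope precedence first, then ascending priority, first match wins) can be sketched like this. It is illustrative only: Python lambdas stand in for CEL expressions, the rule dictionaries and `first_matching_rule` are hypothetical, and the sketch skips the step of filtering rules down to the ones that actually apply to the request's Virtual Key, team, and customer.

```python
# Lower number = evaluated earlier, mirroring the scope order listed above.
SCOPE_ORDER = {"virtual_key": 0, "team": 1, "customer": 2, "global": 3}


def first_matching_rule(rules: list[dict], ctx: dict):
    """Return the target of the first rule that matches, or None if none match."""
    ordered = sorted(rules, key=lambda r: (SCOPE_ORDER[r["scope"]], r["priority"]))
    for rule in ordered:
        if rule["when"](ctx):
            return rule["target"]
    return None  # no match: governance and load balancing decide as usual


rules = [
    {"scope": "global", "priority": 0,
     "when": lambda c: c["budget_used"] > 85, "target": "groq/llama-2"},
    {"scope": "team", "priority": 10,
     "when": lambda c: c["team_name"] == "ml-research", "target": "anthropic/claude-3-opus"},
    {"scope": "team", "priority": 0,
     "when": lambda c: c["headers"].get("x-tier") == "premium", "target": "openai/gpt-4o"},
]

ctx = {"headers": {"x-tier": "premium"}, "team_name": "ml-research", "budget_used": 90}
print(first_matching_rule(rules, ctx))  # prints openai/gpt-4o
```

All three rules match this request, but the team-scope rule with priority 0 is evaluated first, so it wins over both the priority-10 team rule and the global rule.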

| Feature | Description |
| --- | --- |
| CEL Expressions | Powerful, composable condition language with multiple operators |
| Scope Hierarchy | Rules at VirtualKey/Team/Customer/Global levels with proper precedence |
| Dynamic Override | Override provider and/or model based on runtime conditions |
| Fallback Chains | Define multiple fallback providers for automatic failover |
| Priority Ordering | Rules with a lower priority number are evaluated first within the same scope |
| Capacity Awareness | Access real-time budget and rate limit usage percentages |

How Routing Rules Combine With Governance and Load Balancing

  • A matching rule takes precedence over governance. If a rule matches (for example budget_used > 85 routing to a cheaper provider), its target provider/model wins and the Virtual Key’s provider_configs are bypassed.
  • No match falls back to governance. When no rule matches, governance provider selection runs as normal.
  • Key optimization still applies. Whichever way the provider was chosen, adaptive load balancing still selects the best-performing API key within that provider.

Common use cases for routing rules:

  • Tier-based routing: Premium users → fast providers
  • Capacity failover: High budget usage → cheaper providers
  • Team preferences: Different teams → different providers
  • A/B testing: Route subset of traffic to test models
  • Regional routing: EU users → EU providers (data residency)
  • Complex logic: Combine multiple conditions for sophisticated routing

Routing rules are configured in the dashboard:

  • Visual rule builder with a CEL expression editor
  • Scope: Create rules at global, customer, team, or virtual key levels
  • Priority: Order rules within scope with numeric priority

For complete documentation, see Routing Rules Documentation.


  1. Use Governance When:

    ✅ Compliance requirements: Need to ensure data stays in specific regions or providers
    ✅ Cost optimization: Want explicit control over traffic distribution to cheaper providers
    ✅ Budget enforcement: Need hard limits on spending per provider
    ✅ Environment separation: Different teams/apps need different provider access
    ✅ Rate limit management: Need to respect provider-specific rate limits

  2. Use Routing Rules When:

    ✅ Dynamic routing: Route based on runtime request context (headers, parameters)
    ✅ Capacity-aware routing: Switch to a fallback when budget or rate limit usage is high
    ✅ Organization-based routing: Different rules for teams/customers
    ✅ A/B testing: Route a subset of traffic to test new models
    ✅ Complex conditions: Multiple criteria (e.g., tier + capacity + team)

  3. Use Load Balancing When:

    ✅ Performance optimization: Want automatic routing to best-performing providers
    ✅ Minimal configuration: Prefer hands-off operation with intelligent defaults
    ✅ Dynamic workloads: Traffic patterns change frequently
    ✅ Automatic failover: Need instant adaptation to provider issues
    ✅ Multi-provider redundancy: Want seamless provider switching based on availability

  4. Use All Three Together:

    ✅ Complete solution: Governance provides base routing, routing rules add dynamic override, load balancing optimizes keys
    ✅ Maximum flexibility: Different Virtual Keys use different strategies (governance vs routing rules vs load balancing)
    ✅ Enterprise deployments: Complex organizations with multiple requirements per layer


Governance Routing

Configuration instructions for setting up governance routing via Virtual Keys in the Web UI


Routing Rules

Dynamic, expression-based routing using CEL expressions for runtime conditions


Adaptive Load Balancing

Enable and configure performance-based provider and key routing (Team plan and above)


Virtual Keys

Learn how to create and configure Virtual Keys


Fallbacks

Understand how automatic fallbacks work across providers
