Model Governance and Custom Pricing
Overview
DeepIntShield keeps a live, priced inventory of every model your gateway can reach, then lets you govern and price those models on your own terms. The Model Hub is where you control which models traffic may use, replace catalog list prices with your negotiated rates, layer percentage markups or discounts on top, and cap spend per model - without forking the catalog or editing pricing data by hand.
This page covers the governance and pricing surface. For the read-only inventory of which models each provider exposes, see List of supported models.
Key benefits
- Spend on your real numbers. Reflect contract pricing, internal chargeback markups, or negotiated discounts so every prompt run and production call carries cost attribution that matches your invoices.
- Govern model access per key. Restrict which models a provider key (and the virtual keys routing through it) may reach.
- Cap per-model abuse and runaway spend. Set budget and rate ceilings on individual models before a workload hits your bill.
- No manual upkeep. Pricing data stays current automatically; your overrides ride on top and survive every refresh.
When to use
- You have a negotiated rate with a provider that differs from their published list price.
- You apply an internal margin or chargeback markup when billing teams for AI usage.
- You want to restrict a key to an approved set of models for compliance or cost reasons.
- You need a hard ceiling on what a specific model can spend or how fast it can be called.
- You self-host a custom pricing datasheet and want the gateway to sync from it.
The Model Hub pages
The workspace groups these capabilities under Model Hub:
| Page | What it controls |
|---|---|
| Model Catalog | Read-only view of providers and the priced models seen per provider, with traffic and cost over a chosen window. |
| Model Overrides | Per-model absolute cost overrides in USD. |
| Pricing Adjustments | Per-integration multipliers (markup or discount) applied on top of catalog cost. |
| Custom Pricing | The pricing datasheet URL and sync interval the catalog pulls from. |
| Model Limits | Per-model budget and rate-limit ceilings. |
Governing which models a key may reach
Each provider configuration carries an Allowed models list. Leave it empty to allow every model the provider exposes, or list specific model names to restrict the provider key - and the virtual keys that route through it - to exactly that set. Requests for a model outside the list are rejected before they reach the upstream.
- Open Workspace → AI Providers and edit the provider.
- In the provider’s key configuration, add the model names you want to permit to Allowed models.
- Save. An empty list means all models are allowed.
Allowed models is a list of model names. An empty list (or leaving the field blank) allows all models the provider exposes.
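The allow-list rule above can be sketched in a few lines. This is an illustration only: `is_model_allowed` is a hypothetical helper, not a DeepIntShield API, and it assumes names are compared as exact strings.

```python
from typing import List


def is_model_allowed(model: str, allowed_models: List[str]) -> bool:
    """Return True if a request for `model` should pass the allow-list.

    Assumption: names are compared as exact, case-sensitive strings.
    """
    # An empty (or blank) list means every model the provider exposes is allowed.
    if not allowed_models:
        return True
    return model in allowed_models


# Hypothetical provider key restricted to two models.
approved = ["gpt-4o", "gpt-4o-mini"]
print(is_model_allowed("gpt-4o", approved))
print(is_model_allowed("o1", approved))
print(is_model_allowed("o1", []))
```

A request that fails this check corresponds to the case the page describes as rejected before it reaches the upstream.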
Custom pricing: overrides and adjustments
DeepIntShield offers two complementary ways to price models your way. They stack: an override sets the base cost, and an adjustment then scales it.
- Model Overrides - replace a model’s per-token cost with an absolute USD value.
- Pricing Adjustments - multiply an integration’s cost by a factor (e.g. `0.8` for a 20% discount, `1.2` for a 20% markup).
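To make the stacking order concrete, here is a minimal arithmetic sketch. The function name and all dollar figures are hypothetical; only the order (override sets the base, adjustment scales it) comes from the description above.

```python
from typing import Optional


def effective_rate(catalog_rate: float,
                   override_rate: Optional[float],
                   multiplier: float = 1.0) -> float:
    """USD per 1M tokens after an optional override and an adjustment.

    Hypothetical helper: the override (if any) replaces the catalog rate,
    then the adjustment multiplier scales whichever base was chosen.
    """
    base = catalog_rate if override_rate is None else override_rate
    return base * multiplier


# Illustrative numbers only: catalog 5.00, negotiated override 4.00.
print(round(effective_rate(5.00, None, 1.0), 4))   # catalog price, untouched
print(round(effective_rate(5.00, 4.00, 1.0), 4))   # override only
print(round(effective_rate(5.00, 4.00, 0.8), 4))   # override, then 20% discount
print(round(effective_rate(5.00, None, 1.2), 4))   # catalog price with 20% markup
```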
Model Overrides
Use a model override when your team has a negotiated rate that differs from the provider’s published list price. You pick a model (optionally scoped to one provider), then set the input and/or output cost in USD per 1,000,000 tokens. When the override is enabled, the gateway accounts for cost on that model using your value instead of the catalog price.
- Go to Model Hub → Model Overrides and click New override.
- Provider (optional) - choose All providers to apply across every provider that serves the model, or scope to one provider.
- Model - select the model from the catalog.
- Input cost ($/1M tokens) and Output cost ($/1M tokens) - set at least one. Leave a field blank to inherit the catalog price for that direction.
- Notes - record the contract or ticket reference (optional).
- Leave Enabled on. When off, the catalog cost is used unmodified.
- Click Create override.
Model override fields
| Field | Description |
|---|---|
| `model` | Model name the override applies to. Required. |
| `provider` | Provider to scope to, or empty for All providers. |
| `input_cost_per_million_tokens` | Input cost in USD per 1M tokens. Blank inherits the catalog price. |
| `output_cost_per_million_tokens` | Output cost in USD per 1M tokens. Blank inherits the catalog price. |
| `notes` | Free-text annotation (e.g. contract reference). |
| `enabled` | When off, the catalog cost is used unmodified. |
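The fields above imply a per-direction fallback: a blank cost inherits the catalog price, and a disabled override is ignored. The sketch below models that behavior under stated assumptions. `ModelOverride` and `request_cost_usd` are hypothetical names that mirror the field table, and the catalog rates are made-up figures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelOverride:
    # Field names mirror the table above; the class itself is illustrative.
    model: str
    input_cost_per_million_tokens: Optional[float] = None   # None = blank
    output_cost_per_million_tokens: Optional[float] = None  # None = blank
    enabled: bool = True


def request_cost_usd(input_tokens: int, output_tokens: int,
                     catalog_input: float, catalog_output: float,
                     override: Optional[ModelOverride] = None) -> float:
    """Cost of one request; rates are in USD per 1M tokens."""
    in_rate, out_rate = catalog_input, catalog_output
    if override is not None and override.enabled:
        # A blank field inherits the catalog price for that direction.
        if override.input_cost_per_million_tokens is not None:
            in_rate = override.input_cost_per_million_tokens
        if override.output_cost_per_million_tokens is not None:
            out_rate = override.output_cost_per_million_tokens
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Hypothetical catalog rates of 5.00 (input) and 15.00 (output) per 1M tokens.
negotiated = ModelOverride(model="gpt-4o", input_cost_per_million_tokens=4.00)
print(request_cost_usd(1_000_000, 500_000, 5.00, 15.00, negotiated))
```

Here only the input direction is overridden, so the output tokens are still charged at the catalog rate.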
Pricing Adjustments
A pricing adjustment applies a decimal multiplier to an integration’s (provider’s) costs - ideal for a blanket negotiated discount or an internal margin. 1.0 leaves cost unchanged, 0.8 is a 20% discount, 1.2 is a 20% markup.
In Default mode you set one overall multiplier plus optional per-token-type multipliers (request, response, cache read, cache write). In Custom (JSON) mode you supply a JSON object keyed by token type or modality-prefixed name for finer control.
- Go to Model Hub → Pricing Adjustments and click New adjustment.
- Name - a label for the adjustment (e.g. “Acme negotiated 20% discount”).
- Integration - the provider this adjustment applies to.
- Mode - choose Default for numeric multipliers, or Custom (JSON) for granular control.
- In Default mode, set the Default multiplier and, optionally, multipliers for Request tokens (input), Response tokens (output), Cache read tokens, and Cache write tokens. Blank per-type fields inherit the default.
- In Custom (JSON) mode, supply a JSON object of multipliers. Object keys are token types or modality-prefixed names; values are decimal multipliers. For example:

  ```json
  {
    "default": 1.0,
    "request_token": 0.8,
    "response_token": 0.8,
    "reasoning_token": 0.6,
    "image.default": 1.2
  }
  ```

- Leave Enabled on, then click Create adjustment.
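One way to read a Custom (JSON) multiplier object is as a lookup with fallbacks. The sketch below uses the example object from the steps above. The lookup order (modality-specific key, then the modality's default, then the plain token type, then `default`) is an assumption made for illustration; this page does not specify how the gateway resolves overlapping keys, and `resolve_multiplier` is a hypothetical helper.

```python
import json
from typing import Dict, Optional

# The example object from the steps above.
CUSTOM_JSON = """
{"default": 1.0, "request_token": 0.8, "response_token": 0.8,
 "reasoning_token": 0.6, "image.default": 1.2}
"""


def resolve_multiplier(multipliers: Dict[str, float], token_type: str,
                       modality: Optional[str] = None) -> float:
    """Pick a multiplier for a token type, optionally within a modality.

    ASSUMED lookup order (not confirmed by the docs):
    "<modality>.<token_type>" -> "<modality>.default" -> "<token_type>" -> "default".
    """
    candidates = []
    if modality is not None:
        candidates += [f"{modality}.{token_type}", f"{modality}.default"]
    candidates += [token_type, "default"]
    for key in candidates:
        if key in multipliers:
            return float(multipliers[key])
    return 1.0  # no matching key at all: leave the cost unchanged


multipliers = json.loads(CUSTOM_JSON)
print(resolve_multiplier(multipliers, "reasoning_token"))
print(resolve_multiplier(multipliers, "cache_read_token"))
print(resolve_multiplier(multipliers, "request_token", modality="image"))
```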
Pricing adjustment fields
| Field | Description |
|---|---|
| `name` | Label for the adjustment. Required. |
| `integration` | Provider the adjustment applies to. |
| `mode` | `default` (numeric fields) or `custom` (JSON). |
| `default_multiplier` | Overall multiplier. 1.0 = unchanged. |
| `request_token_multiplier` | Multiplier for input tokens. Blank inherits the default. |
| `response_token_multiplier` | Multiplier for output tokens. Blank inherits the default. |
| `cache_read_multiplier` | Multiplier for cache-read tokens. Blank inherits the default. |
| `cache_write_multiplier` | Multiplier for cache-write tokens. Blank inherits the default. |
| `custom_json` | JSON multiplier object (Custom mode only). |
| `enabled` | When off, the catalog cost is used unmodified. |
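For Default mode, the table reduces to a simple rule: a blank per-type field inherits `default_multiplier`, and a disabled adjustment leaves cost unchanged. A minimal sketch follows. The `PricingAdjustment` class and the short token-type labels (`request`, `response`, `cache_read`, `cache_write`) are hypothetical and chosen only to mirror the field names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PricingAdjustment:
    # Field names follow the table above; None stands for a blank field.
    default_multiplier: float = 1.0
    request_token_multiplier: Optional[float] = None
    response_token_multiplier: Optional[float] = None
    cache_read_multiplier: Optional[float] = None
    cache_write_multiplier: Optional[float] = None
    enabled: bool = True

    def multiplier_for(self, token_type: str) -> float:
        """Multiplier for 'request', 'response', 'cache_read' or 'cache_write'."""
        if not self.enabled:
            return 1.0  # disabled: catalog cost is used unmodified
        specific = {
            "request": self.request_token_multiplier,
            "response": self.response_token_multiplier,
            "cache_read": self.cache_read_multiplier,
            "cache_write": self.cache_write_multiplier,
        }.get(token_type)
        # Blank per-type fields inherit the default multiplier.
        return self.default_multiplier if specific is None else specific


# Hypothetical adjustment: 20% discount overall, 20% markup on output tokens.
adj = PricingAdjustment(default_multiplier=0.8, response_token_multiplier=1.2)
print(adj.multiplier_for("request"), adj.multiplier_for("response"))
```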
Provider-level override rules
For advanced cases you can attach pricing override rules to a provider - pattern-based rules that match models by a chosen match type and supply replacement per-unit costs. This is the most precise way to override a whole family of models at once. Manage them on the provider in Workspace → AI Providers.
Match types are evaluated with deterministic precedence - an exact model match wins over a wildcard, which in turn wins over a regex match - so you can pin a specific model while still covering everything else with a broad pattern. Each rule on a provider looks like:
```json
{
  "pricing_overrides": [
    {
      "model_pattern": "gpt-4o",
      "match_type": "exact",
      "input_cost_per_token": 0.0000045,
      "output_cost_per_token": 0.0000135
    },
    {
      "model_pattern": "gpt-4o-*",
      "match_type": "wildcard",
      "input_cost_per_token": 0.0000040
    }
  ]
}
```

Override-rule fields

| Field | Description |
|---|---|
| `model_pattern` | The model name or pattern to match. Required. |
| `match_type` | One of `exact`, `wildcard`, or `regex`. Precedence is exact → wildcard → regex. Required. |
| `request_types` | Optional list of request types the rule applies to (e.g. `chat_completion`, `embedding`, `rerank`, `speech`, `image_generation`). |
| `input_cost_per_token` / `output_cost_per_token` | Replacement per-token costs (note: per single token, not per million). |
| Tiered, cache, batch and per-unit fields | Additional fields are available for 128k/200k token tiers, cache read/write, batch rates, and per-character/image/audio/video pricing. |
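The exact → wildcard → regex precedence can be sketched as follows, using the two example rules shown earlier in this section. This is not the gateway's implementation: `pick_rule` is a hypothetical helper, and it assumes that wildcards follow shell-style globbing (Python's `fnmatch`) and that a regex must match the whole model name.

```python
import re
from fnmatch import fnmatchcase
from typing import Dict, List, Optional

PRECEDENCE = ("exact", "wildcard", "regex")


def _matches(rule: Dict, model: str) -> bool:
    pattern, kind = rule["model_pattern"], rule["match_type"]
    if kind == "exact":
        return model == pattern
    if kind == "wildcard":
        return fnmatchcase(model, pattern)  # assumption: shell-style globbing
    if kind == "regex":
        return re.fullmatch(pattern, model) is not None  # assumption: full match
    return False


def pick_rule(rules: List[Dict], model: str) -> Optional[Dict]:
    """Return the winning rule: exact beats wildcard, wildcard beats regex."""
    for kind in PRECEDENCE:
        for rule in rules:
            if rule["match_type"] == kind and _matches(rule, model):
                return rule
    return None


rules = [
    {"model_pattern": "gpt-4o-*", "match_type": "wildcard",
     "input_cost_per_token": 0.0000040},
    {"model_pattern": "gpt-4o", "match_type": "exact",
     "input_cost_per_token": 0.0000045, "output_cost_per_token": 0.0000135},
]

winner = pick_rule(rules, "gpt-4o")
print(winner["match_type"])
# Rule costs are per single token; multiply by 1,000,000 to compare with $/1M rates.
print(round(winner["input_cost_per_token"] * 1_000_000, 6))
```

Because the rule costs are per single token, converting to a per-million figure is a quick sanity check against the Model Overrides page, which uses USD per 1M tokens.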
Where catalog pricing comes from
The catalog keeps model and pricing data current on its own. Under Model Hub → Custom Pricing (Pricing Configuration) you can point it at your own datasheet and control how often it refreshes.
- Go to Model Hub → Custom Pricing.
- Pricing Datasheet URL - a URL to a custom pricing datasheet. Leave empty to use default pricing.
- Pricing Sync Interval (hours) - how often to sync from the URL (1–8760 hours).
- Click Save Changes. Use Force Sync Now to pull immediately.
Leave the datasheet URL empty to use the default pricing source.
Per-model limits
Model limits put a hard ceiling on an individual model: a spend budget and/or token and request rate limits, each resetting on a chosen period. Use them to cap a model before a runaway workload runs up your bill.
- Go to Model Hub → Model Limits and create a limit.
- Provider - scope to one provider or All Providers.
- Model Name - search and select the model.
- Budget → Maximum Spend (USD) with a reset period (Hourly, Daily, Weekly, Monthly, …).
- Rate Limits → Maximum Tokens and Maximum Requests, each with a reset period. Reset periods accept values such as `1m`, `5m`, `15m`, `30m`, `1h`, `6h`, `1d`, `1w`, and `1M`.
- At least one budget or rate limit is required. Save.
When editing, the sheet shows current usage against each ceiling.
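The reset-period strings are case-sensitive: lowercase `m` reads as minutes and uppercase `M` as a month. A small parsing sketch is shown below. `parse_reset_duration` is a hypothetical helper, and treating a month as a fixed 30 days is a simplifying assumption; the gateway may reset on calendar boundaries instead.

```python
import re
from datetime import timedelta

# Unit suffixes as they appear in the examples above.
# Assumption: "M" (month) is approximated as 30 days for illustration.
_UNITS = {
    "m": timedelta(minutes=1),
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
    "w": timedelta(weeks=1),
    "M": timedelta(days=30),
}


def parse_reset_duration(value: str) -> timedelta:
    """Parse strings such as '15m', '6h', '1d', '1w' or '1M'."""
    match = re.fullmatch(r"(\d+)([mhdwM])", value)
    if match is None:
        raise ValueError(f"unrecognized reset duration: {value!r}")
    count, unit = int(match.group(1)), match.group(2)
    return count * _UNITS[unit]


print(parse_reset_duration("15m"))
print(parse_reset_duration("1M"))
```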
Model limit fields
| Field | Description |
|---|---|
| `model_name` | Model the limit applies to. Required. |
| `provider` | Provider to scope to, or omit for all providers. |
| `budget.max_limit` | Maximum spend in USD per period. |
| `budget.reset_duration` | Budget reset period (e.g. `1M` for monthly). |
| `rate_limit.token_max_limit` | Maximum tokens per period. |
| `rate_limit.request_max_limit` | Maximum requests per period. |
| `rate_limit.token_reset_duration` / `request_reset_duration` | Rate-limit reset periods. |
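The sketch below shows how a limit with these fields might be checked against usage in the current period. It is a simplified model, not the gateway's logic: `ModelLimit`, `Usage`, and `ceilings_reached` are hypothetical names, the nested fields are flattened for brevity, and treating a ceiling as reached at "greater than or equal" is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelLimit:
    # Flattened, illustrative version of the fields in the table above.
    model_name: str
    budget_max_limit: Optional[float] = None   # USD per period
    token_max_limit: Optional[int] = None      # tokens per period
    request_max_limit: Optional[int] = None    # requests per period

    def __post_init__(self) -> None:
        ceilings = (self.budget_max_limit, self.token_max_limit,
                    self.request_max_limit)
        # The UI requires at least one budget or rate limit.
        if all(c is None for c in ceilings):
            raise ValueError("at least one budget or rate limit is required")


@dataclass
class Usage:
    spend_usd: float = 0.0
    tokens: int = 0
    requests: int = 0


def ceilings_reached(limit: ModelLimit, usage: Usage) -> List[str]:
    """Names of the ceilings that current-period usage has reached."""
    reached = []
    if limit.budget_max_limit is not None and usage.spend_usd >= limit.budget_max_limit:
        reached.append("budget")
    if limit.token_max_limit is not None and usage.tokens >= limit.token_max_limit:
        reached.append("tokens")
    if limit.request_max_limit is not None and usage.requests >= limit.request_max_limit:
        reached.append("requests")
    return reached


# Hypothetical limit: 100 USD budget and 1,000 requests per period.
limit = ModelLimit("gpt-4o", budget_max_limit=100.0, request_max_limit=1000)
print(ceilings_reached(limit, Usage(spend_usd=42.0, tokens=90_000, requests=1000)))
```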
The Model Catalog view
Model Hub → Model Catalog is the read-only governance dashboard: it lists each configured provider, the models seen running per provider, and traffic plus cost over a window you pick (Last hour through Last 30 days, or a custom range). Use it to see at a glance what is actually being called before deciding where to add an override, adjustment, or limit. It does not duplicate the full provider model inventory - for that, see List of supported models.
Next steps
- Virtual keys - scope models and apply budgets per key.
- Budget and limits - hierarchical spend control across customers, teams, and keys.
- List of supported models - the full per-provider model inventory.
- Provider routing - pin and weight which providers and keys serve a request.