
Model Governance and Custom Pricing

DeepIntShield keeps a live, priced inventory of every model your gateway can reach, then lets you govern and price those models on your own terms. The Model Hub is where you control which models traffic may use, replace catalog list prices with your negotiated rates, layer percentage markups or discounts on top, and cap spend per model - without forking the catalog or editing pricing data by hand.

This page covers the governance and pricing surface. For the read-only inventory of which models each provider exposes, see List of supported models.

Key benefits

  • Spend on your real numbers. Reflect contract pricing, internal chargeback markups, or negotiated discounts so every prompt run and production call carries cost attribution that matches your invoices.
  • Govern model access per key. Restrict which models a provider key (and the virtual keys routing through it) may reach.
  • Cap per-model abuse and runaway spend. Set budget and rate ceilings on individual models before a workload hits your bill.
  • No manual upkeep. Pricing data stays current automatically; your overrides ride on top and survive every refresh.

When to use it

  • You have a negotiated rate with a provider that differs from their published list price.
  • You apply an internal margin or chargeback markup when billing teams for AI usage.
  • You want to restrict a key to an approved set of models for compliance or cost reasons.
  • You need a hard ceiling on what a specific model can spend or how fast it can be called.
  • You self-host a custom pricing datasheet and want the gateway to sync from it.

The workspace groups these capabilities under Model Hub:

| Page | What it controls |
| --- | --- |
| Model Catalog | Read-only view of providers and the priced models seen per provider, with traffic and cost over a chosen window. |
| Model Overrides | Per-model absolute cost overrides in USD. |
| Pricing Adjustments | Per-integration multipliers (markup or discount) applied on top of catalog cost. |
| Custom Pricing | The pricing datasheet URL and sync interval the catalog pulls from. |
| Model Limits | Per-model budget and rate-limit ceilings. |

Each provider configuration carries an Allowed models list. Leave it empty to allow every model the provider exposes, or list specific model names to restrict the provider key - and the virtual keys that route through it - to exactly that set. Requests for a model outside the list are rejected before they reach the upstream provider.

  1. Open Workspace → AI Providers and edit the provider.
  2. In the provider’s key configuration, add the model names you want to permit to Allowed models.
  3. Save. An empty list means all models are allowed.

Allowed models is a list of model names. An empty list (or leaving the field blank) allows all models the provider exposes.
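
The empty-list rule can be sketched as a small check. This is an illustration only: the function name `is_model_allowed` is hypothetical, the model names are examples, and exact string comparison is an assumption, since the gateway's actual matching rules (case sensitivity, aliases) are not described here.

```python
from typing import Iterable


def is_model_allowed(model: str, allowed_models: Iterable[str]) -> bool:
    """Return True if `model` passes an Allowed models list.

    Assumption: names are compared as exact strings.
    """
    # Ignore blank entries so a field left empty behaves like an empty list.
    allowed = [name for name in allowed_models if name.strip()]
    if not allowed:
        # Empty list or blank field: every model the provider exposes is allowed.
        return True
    return model in allowed


print(is_model_allowed("gpt-4o", []))                          # True
print(is_model_allowed("gpt-4o", ["gpt-4o", "gpt-4o-mini"]))   # True
print(is_model_allowed("o1", ["gpt-4o"]))                      # False
```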

DeepIntShield offers two complementary ways to price models your way. They stack: an override sets the base cost, and an adjustment then scales it.

  • Model Overrides - replace a model’s per-token cost with an absolute USD value.
  • Pricing Adjustments - multiply an integration’s cost by a factor (e.g. 0.8 for a 20% discount, 1.2 for a 20% markup).
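
As a rough sketch of how the two mechanisms stack, the arithmetic below takes the override (when present) as the base cost and then scales it by the adjustment multiplier. The function name and the sample prices are hypothetical; only the order of operations comes from the description above.

```python
from typing import Optional


def effective_cost_per_million(catalog_cost: float,
                               override_cost: Optional[float] = None,
                               multiplier: float = 1.0) -> float:
    """Base cost comes from the override if set, else the catalog; then scale it."""
    base = catalog_cost if override_cost is None else override_cost
    return base * multiplier


# Hypothetical numbers: catalog lists $5.00 per 1M tokens, the contract rate
# is $4.50, and a 20% discount adjustment (0.8) is applied on top.
print(f"{effective_cost_per_million(5.00, 4.50, 0.8):.2f}")   # 3.60

# No override, 20% markup (1.2) applied to the catalog price.
print(f"{effective_cost_per_million(5.00, None, 1.2):.2f}")   # 6.00
```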

Use a model override when your team has a negotiated rate that differs from the provider’s published list price. You pick a model (optionally scoped to one provider), then set the input and/or output cost in USD per 1,000,000 tokens. When the override is enabled, the gateway accounts for cost on that model using your value instead of the catalog price.

  1. Go to Model Hub → Model Overrides and click New override.
  2. Provider (optional) - choose All providers to apply across every provider that serves the model, or scope to one provider.
  3. Model - select the model from the catalog.
  4. Input cost ($/1M tokens) and Output cost ($/1M tokens) - set at least one. Leave a field blank to inherit the catalog price for that direction.
  5. Notes - record the contract or ticket reference (optional).
  6. Leave Enabled on. When off, the catalog cost is used unmodified.
  7. Click Create override.

| Field | Description |
| --- | --- |
| model | Model name the override applies to. Required. |
| provider | Provider to scope to, or empty for All providers. |
| input_cost_per_million_tokens | Input cost in USD per 1M tokens. Blank inherits the catalog price. |
| output_cost_per_million_tokens | Output cost in USD per 1M tokens. Blank inherits the catalog price. |
| notes | Free-text annotation (e.g. contract reference). |
| enabled | When off, the catalog cost is used unmodified. |
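
To make the blank-inherits and Enabled behaviour concrete, here is a minimal sketch of cost accounting for one request. The `ModelOverride` class mirrors the field names in the table, but the class itself, the `request_cost` helper, and the catalog prices are hypothetical and not the gateway's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelOverride:
    model: str
    provider: Optional[str] = None                          # None = All providers
    input_cost_per_million_tokens: Optional[float] = None   # None = blank
    output_cost_per_million_tokens: Optional[float] = None  # None = blank
    notes: str = ""
    enabled: bool = True


def request_cost(override: Optional[ModelOverride],
                 catalog_input: float, catalog_output: float,
                 input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request; rates are USD per 1,000,000 tokens."""
    in_rate, out_rate = catalog_input, catalog_output
    if override is not None and override.enabled:
        # A blank field inherits the catalog price for that direction.
        if override.input_cost_per_million_tokens is not None:
            in_rate = override.input_cost_per_million_tokens
        if override.output_cost_per_million_tokens is not None:
            out_rate = override.output_cost_per_million_tokens
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Hypothetical catalog prices: $2.50 input, $10.00 output per 1M tokens.
ov = ModelOverride(model="gpt-4o", input_cost_per_million_tokens=2.0)
print(request_cost(ov, 2.5, 10.0, 1_000_000, 500_000))   # 7.0 (input overridden)

ov.enabled = False
print(request_cost(ov, 2.5, 10.0, 1_000_000, 500_000))   # 7.5 (catalog only)
```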

A pricing adjustment applies a decimal multiplier to an integration’s (provider’s) costs - ideal for a blanket negotiated discount or an internal margin. 1.0 leaves cost unchanged, 0.8 is a 20% discount, 1.2 is a 20% markup.

In Default mode you set one overall multiplier plus optional per-token-type multipliers (request, response, cache read, cache write). In Custom (JSON) mode you supply a JSON object keyed by token type or modality-prefixed name for finer control.

  1. Go to Model Hub → Pricing Adjustments and click New adjustment.

  2. Name - a label for the adjustment (e.g. “Acme negotiated 20% discount”).

  3. Integration - the provider this adjustment applies to.

  4. Mode - choose Default for numeric multipliers, or Custom (JSON) for granular control.

  5. In Default mode, set the Default multiplier and, optionally, multipliers for Request tokens (input), Response tokens (output), Cache read tokens, and Cache write tokens. Blank per-type fields inherit the default.

  6. In Custom (JSON) mode, supply a JSON object of multipliers. Object keys are token types or modality-prefixed names; values are decimal multipliers. For example:

    ```json
    {
      "default": 1.0,
      "request_token": 0.8,
      "response_token": 0.8,
      "reasoning_token": 0.6,
      "image.default": 1.2
    }
    ```

  7. Leave Enabled on, then click Create adjustment.

| Field | Description |
| --- | --- |
| name | Label for the adjustment. Required. |
| integration | Provider the adjustment applies to. |
| mode | default (numeric fields) or custom (JSON). |
| default_multiplier | Overall multiplier. 1.0 = unchanged. |
| request_token_multiplier | Multiplier for input tokens. Blank inherits the default. |
| response_token_multiplier | Multiplier for output tokens. Blank inherits the default. |
| cache_read_multiplier | Multiplier for cache-read tokens. Blank inherits the default. |
| cache_write_multiplier | Multiplier for cache-write tokens. Blank inherits the default. |
| custom_json | JSON multiplier object (Custom mode only). |
| enabled | When off, the catalog cost is used unmodified. |
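
The Default-mode inheritance rule (blank per-type fields fall back to the default multiplier) might be resolved along these lines. The parameter names follow the field table; the function `resolve_multipliers` and the keys of the returned dictionary are hypothetical.

```python
from typing import Dict, Optional


def resolve_multipliers(default_multiplier: float = 1.0,
                        request_token_multiplier: Optional[float] = None,
                        response_token_multiplier: Optional[float] = None,
                        cache_read_multiplier: Optional[float] = None,
                        cache_write_multiplier: Optional[float] = None
                        ) -> Dict[str, float]:
    """Return the multiplier used for each token type in Default mode."""
    per_type = {
        "request": request_token_multiplier,
        "response": response_token_multiplier,
        "cache_read": cache_read_multiplier,
        "cache_write": cache_write_multiplier,
    }
    # None stands for a blank field, which inherits the default multiplier.
    return {name: default_multiplier if value is None else value
            for name, value in per_type.items()}


# 20% discount overall, but only 10% off output tokens.
print(resolve_multipliers(default_multiplier=0.8, response_token_multiplier=0.9))
# {'request': 0.8, 'response': 0.9, 'cache_read': 0.8, 'cache_write': 0.8}
```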

For advanced cases you can attach pricing override rules to a provider - pattern-based rules that match models by a chosen match type and supply replacement per-unit costs. This is the most precise way to override a whole family of models at once. Manage them on the provider in Workspace → AI Providers.

Match types are evaluated with deterministic precedence - an exact model match wins over a wildcard, which in turn wins over a regex match - so you can pin a specific model while still covering everything else with a broad pattern. Each rule on a provider looks like:

```json
{
  "pricing_overrides": [
    {
      "model_pattern": "gpt-4o",
      "match_type": "exact",
      "input_cost_per_token": 0.0000045,
      "output_cost_per_token": 0.0000135
    },
    {
      "model_pattern": "gpt-4o-*",
      "match_type": "wildcard",
      "input_cost_per_token": 0.0000040
    }
  ]
}
```

| Field | Description |
| --- | --- |
| model_pattern | The model name or pattern to match. Required. |
| match_type | One of exact, wildcard, or regex. Precedence is exact → wildcard → regex. Required. |
| request_types | Optional list of request types the rule applies to (e.g. chat_completion, embedding, rerank, speech, image_generation). |
| input_cost_per_token / output_cost_per_token | Replacement per-token costs (note: per single token, not per million). |
| Tiered, cache, batch and per-unit fields | Additional fields are available for 128k/200k token tiers, cache read/write, batch rates, and per-character/image/audio/video pricing. |
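
The exact → wildcard → regex precedence can be illustrated with a short rule selector. This is a sketch under several assumptions: wildcards are treated as shell-style globs via Python's `fnmatch`, regex rules must match the whole model name, and the third rule (`gpt-.*`) is a hypothetical addition to the two rules shown above. The gateway's own matching engine may differ on each point.

```python
import fnmatch
import re
from typing import Dict, List, Optional

# Lower number wins: exact beats wildcard, wildcard beats regex.
PRECEDENCE = {"exact": 0, "wildcard": 1, "regex": 2}


def rule_matches(rule: Dict, model: str) -> bool:
    pattern, match_type = rule["model_pattern"], rule["match_type"]
    if match_type == "exact":
        return model == pattern
    if match_type == "wildcard":
        # Assumption: glob semantics, case-sensitive.
        return fnmatch.fnmatchcase(model, pattern)
    if match_type == "regex":
        # Assumption: the pattern must match the entire model name.
        return re.fullmatch(pattern, model) is not None
    return False


def select_rule(rules: List[Dict], model: str) -> Optional[Dict]:
    """Pick the matching rule with the highest-precedence match type."""
    candidates = [r for r in rules if rule_matches(r, model)]
    if not candidates:
        return None
    return min(candidates, key=lambda r: PRECEDENCE[r["match_type"]])


rules = [
    {"model_pattern": "gpt-.*", "match_type": "regex",
     "input_cost_per_token": 0.0000050},          # hypothetical catch-all
    {"model_pattern": "gpt-4o-*", "match_type": "wildcard",
     "input_cost_per_token": 0.0000040},
    {"model_pattern": "gpt-4o", "match_type": "exact",
     "input_cost_per_token": 0.0000045, "output_cost_per_token": 0.0000135},
]

for name in ("gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo"):
    rule = select_rule(rules, name)
    # Per-token costs multiplied by 1,000,000 give the familiar $/1M figure.
    per_million = round(rule["input_cost_per_token"] * 1_000_000, 6)
    print(name, rule["match_type"], per_million)
# gpt-4o exact 4.5
# gpt-4o-mini wildcard 4.0
# gpt-3.5-turbo regex 5.0
```

Because these rules use per-token units while Model Overrides use USD per 1M tokens, converting between the two (multiply or divide by 1,000,000) is worth doing explicitly when copying a contract rate from one place to the other.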

The catalog keeps model and pricing data current on its own. Under Model Hub → Custom Pricing (Pricing Configuration) you can point it at your own datasheet and control how often it refreshes.

  1. Go to Model Hub → Custom Pricing.
  2. Pricing Datasheet URL - a URL to a custom pricing datasheet. Leave empty to use default pricing.
  3. Pricing Sync Interval (hours) - how often to sync from the URL (1–8760 hours).
  4. Click Save Changes. Use Force Sync Now to pull immediately.

Leave the datasheet URL empty to use the default pricing source.
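
If you manage these settings from a script, a pre-flight check like the following can catch out-of-range values before they are submitted. It is only a sketch: the function name, the returned key names, the example URL, and the requirement that the URL be http or https are all assumptions; the 1–8760 hour range (8760 hours is 365 days) comes from the steps above.

```python
from urllib.parse import urlparse


def validate_pricing_config(datasheet_url: str, sync_interval_hours: int) -> dict:
    """Validate the two Custom Pricing settings and return a normalised dict."""
    if not 1 <= sync_interval_hours <= 8760:
        raise ValueError("sync interval must be between 1 and 8760 hours")
    url = datasheet_url.strip()
    if url:
        parsed = urlparse(url)
        # Assumption: only http(s) URLs with a host are acceptable.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not a usable datasheet URL: {url!r}")
    # An empty URL means the default pricing source is used.
    return {"pricing_datasheet_url": url or None,
            "pricing_sync_interval_hours": sync_interval_hours}


print(validate_pricing_config("", 24))
print(validate_pricing_config("https://pricing.example.com/datasheet.json", 6))
```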

Model limits put a hard ceiling on an individual model: a spend budget and/or token and request rate limits, each resetting on a chosen period. Use them to cap a model before a runaway workload runs up your bill.

  1. Go to Model Hub → Model Limits and create a limit.
  2. Provider - scope to one provider or All Providers.
  3. Model Name - search and select the model.
  4. Budget → Maximum Spend (USD) with a reset period (Hourly, Daily, Weekly, Monthly, …).
  5. Rate Limits → Maximum Tokens and Maximum Requests, each with a reset period. Reset periods accept values such as 1m, 5m, 15m, 30m, 1h, 6h, 1d, 1w, and 1M. The unit is case-sensitive: lowercase m means minutes, uppercase M means months.
  6. At least one budget or rate limit is required. Save.

When editing, the sheet shows current usage against each ceiling.

| Field | Description |
| --- | --- |
| model_name | Model the limit applies to. Required. |
| provider | Provider to scope to, or omit for all providers. |
| budget.max_limit | Maximum spend in USD per period. |
| budget.reset_duration | Budget reset period (e.g. 1M for monthly). |
| rate_limit.token_max_limit | Maximum tokens per period. |
| rate_limit.request_max_limit | Maximum requests per period. |
| rate_limit.token_reset_duration / rate_limit.request_reset_duration | Rate-limit reset periods. |
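
The reset-period strings and the ceiling checks can be sketched as follows. The helper names are hypothetical, a month is approximated as 30 days purely for illustration, and treating a ceiling as reached when usage is greater than or equal to the limit is an assumption about enforcement, not documented behaviour.

```python
import re
from typing import List

# Seconds per unit. Lowercase m = minutes, uppercase M = months.
# Assumption: a month is approximated as 30 days.
_UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400, "w": 604800, "M": 30 * 86400}


def parse_reset_duration(value: str) -> int:
    """Convert a reset period such as '15m', '1d' or '1M' to seconds."""
    match = re.fullmatch(r"(\d+)([mhdwM])", value)
    if match is None:
        raise ValueError(f"unrecognised reset duration: {value!r}")
    count, unit = match.groups()
    return int(count) * _UNIT_SECONDS[unit]


def reached_ceilings(limit: dict, spend_usd: float,
                     tokens: int, requests: int) -> List[str]:
    """Return which ceilings the current period's usage has reached."""
    reached = []
    budget = limit.get("budget") or {}
    rate = limit.get("rate_limit") or {}
    if "max_limit" in budget and spend_usd >= budget["max_limit"]:
        reached.append("budget")
    if "token_max_limit" in rate and tokens >= rate["token_max_limit"]:
        reached.append("tokens")
    if "request_max_limit" in rate and requests >= rate["request_max_limit"]:
        reached.append("requests")
    return reached


limit = {
    "model_name": "gpt-4o",
    "budget": {"max_limit": 500.0, "reset_duration": "1M"},
    "rate_limit": {"request_max_limit": 1000, "request_reset_duration": "1h"},
}

print(parse_reset_duration("1m"), parse_reset_duration("1M"))   # 60 2592000
print(reached_ceilings(limit, spend_usd=512.40, tokens=0, requests=120))
# ['budget']
```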

Model Hub → Model Catalog is the read-only governance dashboard: it lists each configured provider, the models seen running per provider, and traffic plus cost over a window you pick (Last hour through Last 30 days, or a custom range). Use it to see at a glance what is actually being called before deciding where to add an override, adjustment, or limit. It does not duplicate the full provider model inventory - for that, see List of supported models.