
Performance & Cost

DeepintShield ships five optimizations that move the safety-check tax off your critical path. Three are on by default; two are one-flag opt-ins for teams that want the largest latency or cost reductions.

Guardrail latency: <5ms p50
LLM cost saved: up to 90%
Allow-path latency: max(g, m)
p99 reduction: ~30–50%

Embedded guard runtime

Default: On. Guards evaluate locally with no extra network hop, saving ~20–300ms per request in single-binary deployments. Nothing to configure.


Speculative dispatch

Default: Off (opt-in). Run the provider call alongside input guards so allow-path latency becomes max(guards, model) instead of guards + model.
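
The idea can be sketched with asyncio. This is a minimal illustration, not DeepintShield's implementation: `run_input_guards`, `call_model`, and `speculative_dispatch` are hypothetical names, and the sleep durations are stand-ins for real guard and provider latency.

```python
import asyncio
from typing import Optional

GUARD_SECONDS = 0.2   # simulated guard latency (assumption)
MODEL_SECONDS = 0.3   # simulated provider latency (assumption)

async def run_input_guards(prompt: str) -> bool:
    """Hypothetical guard stage: returns True when the prompt is allowed."""
    await asyncio.sleep(GUARD_SECONDS)
    return "forbidden" not in prompt

async def call_model(prompt: str) -> str:
    """Hypothetical provider call."""
    await asyncio.sleep(MODEL_SECONDS)
    return f"echo: {prompt}"

async def speculative_dispatch(prompt: str) -> Optional[str]:
    # Start the provider call immediately instead of waiting for the guards.
    model_task = asyncio.create_task(call_model(prompt))
    allowed = await run_input_guards(prompt)
    if not allowed:
        # Block path: cancel the speculative call so its output is never returned.
        model_task.cancel()
        try:
            await model_task
        except asyncio.CancelledError:
            pass
        return None
    # Allow path: the wait is roughly max(guards, model), not guards + model.
    return await model_task
```

With these simulated numbers an allowed request completes in about 0.3s rather than 0.5s, while a blocked request returns `None` as soon as the guards finish.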


Async post-guards

Default: On (auto). When no output policy needs to block or redact, the response ships immediately and output checks run in the background.
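
A rough sketch of that decision, assuming a hypothetical `respond` function with an `enforce` flag that says whether any output policy can block or redact. The check itself (`output_flagged`) is a toy substring match used only for illustration.

```python
import asyncio

async def output_flagged(text: str) -> bool:
    """Hypothetical output check; a toy substring match stands in for a classifier."""
    await asyncio.sleep(0.05)  # simulated check latency
    return "secret" in text

async def audit(text: str, audit_log: list) -> None:
    # Monitor-only path: record the finding, never alter the response.
    if await output_flagged(text):
        audit_log.append(text)

async def respond(text: str, enforce: bool, audit_log: list, background: set) -> str:
    if enforce:
        # A blocking or redacting policy needs the verdict before the response ships.
        return "[blocked]" if await output_flagged(text) else text
    # No policy can change the response, so ship it now and check in the background.
    task = asyncio.create_task(audit(text, audit_log))
    background.add(task)                        # hold a reference until the task finishes
    task.add_done_callback(background.discard)
    return text
```

Keeping the task in `background` matters because the event loop holds only weak references to tasks; without it, a pending check could be garbage-collected before it runs.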


Per-category timeouts

Default: Off (opt-in). Set a separate time budget per check class (PII <150ms, toxicity ~600ms, jailbreak ~1200ms) so one slow classifier no longer pulls p99 up to a flat 1500ms ceiling.
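
One way to picture per-category budgets is a timeout applied to each check independently. The sketch below is illustrative only: `BUDGETS`, `run_check`, and `run_all` are hypothetical names, the budget values are the approximate figures quoted above, and what to do with a `"timeout"` verdict (fail open or fail closed) is a policy choice this example leaves to the caller.

```python
import asyncio

# Budgets in seconds, based on the approximate figures quoted above.
BUDGETS = {"pii": 0.150, "toxicity": 0.600, "jailbreak": 1.200}

async def run_check(category, check, text):
    """Run one check under its own budget instead of a single global ceiling."""
    try:
        verdict = await asyncio.wait_for(check(text), timeout=BUDGETS[category])
        return category, verdict
    except asyncio.TimeoutError:
        # The check overran its budget; report that rather than waiting longer.
        return category, "timeout"

async def run_all(checks, text):
    """Run every category concurrently; each one is cut off at its own budget."""
    pairs = await asyncio.gather(
        *(run_check(category, fn, text) for category, fn in checks.items())
    )
    return dict(pairs)
```

Here `checks` maps a category name to an async callable, so a stalled PII check is cut off after 150ms even though the jailbreak check is still allowed up to 1200ms.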


Semantic cache short-circuit

Default: On. A fuzzy cache hit returns the answer immediately, skipping both guard and provider calls. Up to 60% cost reduction on chatbot-style workloads.
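
The short-circuit can be sketched as a similarity lookup that runs before anything else. Everything here is an assumption for illustration: the bag-of-words `embed` function stands in for a real embedding model, the 0.8 threshold is arbitrary, and `SemanticCache` and `handle` are hypothetical names.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a real deployment would use an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold   # arbitrary similarity cutoff (assumption)
        self.entries = []            # list of (vector, answer) pairs

    def lookup(self, prompt):
        query = embed(prompt)
        best_score, best_answer = 0.0, None
        for vector, answer in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def store(self, prompt, answer):
        self.entries.append((embed(prompt), answer))

def handle(prompt, cache, guards_and_model):
    cached = cache.lookup(prompt)
    if cached is not None:
        return cached                # hit: neither guards nor the provider run
    answer = guards_and_model(prompt)  # miss: full pipeline, then remember the answer
    cache.store(prompt, answer)
    return answer
```

A near-duplicate prompt returns the stored answer without calling `guards_and_model`, which is where both the latency and cost savings come from.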


| Metric | DeepintShield default | Why |
| --- | --- | --- |
| Guardrail latency (p50) | <5ms | Embedded runtime + decision cache + local-rule fast path |
| Allow-path total latency | max(guards, model) | Speculative dispatch (non-streaming requests) |
| LLM cost saved | Up to 90% | Full 12-layer cost-optimization stack (cache, compression, cascade, throttle) |
| Tail latency (p99) | ~30–50% lower | Per-category timeouts replace the flat 1500ms ceiling |