Performance & Cost

DeepintShield ships five optimizations that move the safety-check tax off your critical path. Three are on by default; the other two are one-flag opt-ins for teams that want the largest latency or cost reduction.

Guardrail latency: <5ms p50

LLM cost saved: up to 90%

Allow-path latency: max(g, m)

p99 reduction: ~30–50%

The five knobs

Embedded guard runtime

Default: On. Guards evaluate locally with no extra network hop, saving ~20–300ms per request in single-binary deployments. Nothing to configure.

Speculative dispatch

Default: Off (opt-in). Run the provider call alongside input guards so allow-path latency becomes max(guards, model) instead of guards + model.
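The overlap described above can be sketched with asyncio. This is only an illustration of the pattern, not DeepintShield's implementation: `run_input_guards`, `call_model`, and `speculative_dispatch` are hypothetical names, and the sleeps stand in for real guard and provider latency.

```python
import asyncio
from typing import Optional

# Hypothetical stand-ins: neither function is part of DeepintShield's API.
async def run_input_guards(prompt: str) -> bool:
    await asyncio.sleep(0.05)           # simulated guard latency (g)
    return "forbidden" not in prompt    # True means "allow"

async def call_model(prompt: str) -> str:
    await asyncio.sleep(0.10)           # simulated provider latency (m)
    return f"echo: {prompt}"

async def speculative_dispatch(prompt: str) -> Optional[str]:
    # Start the provider call first so it overlaps with the guards.
    model_task = asyncio.create_task(call_model(prompt))
    try:
        allowed = await run_input_guards(prompt)
    except BaseException:
        model_task.cancel()             # never leak the speculative call
        raise
    if not allowed:
        # Block path: discard the speculative work and return nothing.
        model_task.cancel()
        await asyncio.gather(model_task, return_exceptions=True)
        return None
    # Allow path: total wait is roughly max(g, m), not g + m.
    return await model_task
```

With these simulated numbers the allow path waits about 100ms rather than 150ms. The trade-off in a pattern like this is that a blocked request may already have started a provider call, which is one plausible reason for the feature being opt-in.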

Async post-guards

Default: On (auto). When no output policy needs to block or redact, the response ships immediately and output checks run in the background.
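A minimal sketch of that decision, assuming each output policy carries a flag saying whether it can block or redact. The `OutputPolicy` class, the `can_block_or_redact` field, and the `deliver` function are hypothetical names used for illustration only.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class OutputPolicy:
    name: str
    can_block_or_redact: bool  # hypothetical flag: False means monitor-only

async def check_output(policy: OutputPolicy, text: str) -> bool:
    await asyncio.sleep(0.01)          # simulated classifier latency
    return "secret" not in text        # True means the text passed

# Keep references to background tasks so they are not garbage-collected.
background: set = set()

async def deliver(text: str, policies: List[OutputPolicy],
                  audit_log: List[Tuple[str, bool]]) -> str:
    async def run_checks() -> bool:
        results = []
        for policy in policies:
            ok = await check_output(policy, text)
            audit_log.append((policy.name, ok))
            results.append(ok)
        return all(results)

    if any(p.can_block_or_redact for p in policies):
        # Some policy may change the response, so checks must finish first.
        return text if await run_checks() else "[blocked]"

    # No policy can alter the response: ship it now, check in the background.
    task = asyncio.create_task(run_checks())
    background.add(task)
    task.add_done_callback(background.discard)
    return text
```

In the background branch the caller gets the response before any output check has run; the results only reach the audit log afterwards.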

Per-category timeouts

Default: Off (opt-in). Set a separate time budget per check class (PII <150ms, toxicity ~600ms, jailbreak ~1200ms) so one slow classifier no longer pulls p99 up to a flat 1500ms ceiling.
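One way to picture per-category budgets is a lookup table fed into `asyncio.wait_for`. The sketch below uses the budgets quoted above; the `BUDGETS_S` table, the `run_with_budget` helper, and the two sample checks are hypothetical, and what to do on a timeout (fail open or fail closed) is left to the caller because the text above does not specify it.

```python
import asyncio
from typing import Awaitable, Callable

# Budgets in seconds, taken from the figures quoted above.
BUDGETS_S = {"pii": 0.150, "toxicity": 0.600, "jailbreak": 1.200}

Check = Callable[[str], Awaitable[bool]]

async def run_with_budget(category: str, check: Check, text: str) -> str:
    """Run one check under its category's budget.

    Returns "pass", "fail", or "timeout"; the caller decides how to
    treat a timeout.
    """
    try:
        passed = await asyncio.wait_for(check(text), timeout=BUDGETS_S[category])
    except asyncio.TimeoutError:
        return "timeout"
    return "pass" if passed else "fail"

# Hypothetical checks used only to exercise the helper.
async def slow_pii_check(text: str) -> bool:
    await asyncio.sleep(0.3)           # exceeds the 150ms PII budget
    return True

async def fast_toxicity_check(text: str) -> bool:
    await asyncio.sleep(0.01)          # well inside the 600ms budget
    return "idiot" not in text
```

Here the slow PII check is cut off at 150ms instead of holding the request until a shared 1500ms ceiling.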

Semantic cache short-circuit

Default: On. A fuzzy cache hit returns the answer immediately, skipping both guard and provider calls. Up to 60% cost reduction on chatbot-style workloads.
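The short-circuit can be illustrated with a toy fuzzy cache. Everything here is an assumption made for the example: the bag-of-words `embed` function stands in for a real embedding model, the 0.8 threshold is arbitrary, and `SemanticCache` and `handle` are hypothetical names rather than DeepintShield APIs.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Tuple

def embed(text: str) -> Counter:
    # Toy embedding: word counts. A real system would use a vector model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # Counter yields 0 for missing words
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []

    def get(self, prompt: str) -> Optional[str]:
        query = embed(prompt)
        best_score, best_answer = 0.0, None
        for vector, answer in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, prompt: str, answer: str) -> None:
        self.entries.append((embed(prompt), answer))

def handle(prompt: str, cache: SemanticCache,
           guarded_call: Callable[[str], str]) -> str:
    hit = cache.get(prompt)
    if hit is not None:
        return hit                     # short-circuit: no guards, no provider
    answer = guarded_call(prompt)      # full path: guards plus provider call
    cache.put(prompt, answer)
    return answer
```

A near-duplicate prompt is answered from the cache without invoking `guarded_call`, which is where both the latency and the cost savings come from in this pattern.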

What you can expect

Metric	DeepintShield default	Why
Guardrail latency (p50)	<5ms	Embedded runtime + decision cache + local-rule fast path
Allow-path total latency	max(guards, model)	Speculative dispatch (non-streaming requests)
LLM cost saved	Up to 90%	Full 12-layer cost-optimization stack (cache, compression, cascade, throttle)
Tail latency (p99)	~30–50% lower	Per-category timeouts replace the flat 1500ms ceiling
