DeepintShield ships five optimizations that move the safety-check tax off your critical path. Three are on by default; two are one-flag opt-ins for teams that want the largest latency or cost reductions.
Embedded guard runtime
Default: On. Guards evaluate locally with no extra network hop, saving ~20–300ms per request in single-binary deployments. Nothing to configure.
Speculative dispatch
Default: Off (opt-in). Run the provider call alongside input guards so allow-path latency becomes max(guards, model) instead of guards + model.
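A minimal sketch of the idea, not DeepintShield's implementation: the guard and the provider call start together, and the model result is released only if the guard allows the request. The function names `run_input_guards`, `call_model`, and `speculative_dispatch`, and the simulated latencies, are assumptions made for illustration.

```python
import asyncio
import time
from typing import Optional

GUARD_SECONDS = 0.1  # simulated guard latency (illustrative value)
MODEL_SECONDS = 0.2  # simulated provider latency (illustrative value)

async def run_input_guards(prompt: str) -> bool:
    """Hypothetical input guard: True means the prompt is allowed."""
    await asyncio.sleep(GUARD_SECONDS)
    return "forbidden" not in prompt

async def call_model(prompt: str) -> str:
    """Hypothetical provider call."""
    await asyncio.sleep(MODEL_SECONDS)
    return f"echo: {prompt}"

async def speculative_dispatch(prompt: str) -> Optional[str]:
    # Start both tasks at once instead of awaiting the guards first.
    guard_task = asyncio.create_task(run_input_guards(prompt))
    model_task = asyncio.create_task(call_model(prompt))
    if not await guard_task:
        # Deny path: discard the speculative provider call.
        model_task.cancel()
        try:
            await model_task
        except asyncio.CancelledError:
            pass
        return None
    # Allow path: total wait is roughly max(guards, model).
    return await model_task

def timed(prompt: str):
    """Run one request and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = asyncio.run(speculative_dispatch(prompt))
    return result, time.perf_counter() - start
```

With these simulated latencies, an allowed request takes about as long as the model call alone (around 0.2s) rather than the 0.3s sum. On a denied request the provider call is cancelled, although a real provider may still bill for work already started.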
Async post-guards
Default: On (auto). When no output policy needs to block or redact, the response ships immediately and output checks run in the background.
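The decision described above can be sketched as follows. This is an illustration under assumptions, not the product's code: `OutputPolicy`, `run_output_check`, `deliver`, and the `audit_log` list are hypothetical names, and the sketch simply runs every check inline whenever any policy is able to block or redact.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class OutputPolicy:
    name: str
    blocking: bool  # True when the policy may block or redact the response

audit_log = []  # stand-in for wherever background findings are recorded

async def run_output_check(policy: OutputPolicy, response: str) -> bool:
    """Hypothetical output check: True means the response passes."""
    await asyncio.sleep(0.01)  # simulated classifier latency
    audit_log.append(f"{policy.name}: checked")
    return "secret" not in response

async def check_all(policies, response):
    return await asyncio.gather(
        *(run_output_check(p, response) for p in policies)
    )

async def deliver(response: str, policies):
    """Return (response_or_None, background_task_or_None)."""
    if any(p.blocking for p in policies):
        # A policy may block or redact, so wait for the verdicts first.
        results = await check_all(policies, response)
        return (response if all(results) else None), None
    # Nothing can change the response: ship it now, check in the background.
    task = asyncio.create_task(check_all(policies, response))
    return response, task
```

The caller keeps the returned task so that the background checks are not garbage-collected before they finish and so that their findings can be awaited or logged later.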
Per-category timeouts
Default: Off (opt-in). Set a separate time budget per check class (PII under 150ms, toxicity ~600ms, jailbreak ~1200ms) so that one slow classifier no longer pulls p99 up to a flat 1500ms ceiling.
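A sketch of per-category budgets using `asyncio.wait_for`. The budget values come from the figures quoted above; the dictionary keys, the `run_with_budget` and `run_checks` helpers, and the `"timeout"` status are assumptions. The source does not say whether a timed-out check fails open or closed, so the sketch only reports the timeout and leaves that decision to the caller.

```python
import asyncio

# Budgets in seconds, from the figures above; key names are hypothetical.
CATEGORY_TIMEOUTS = {
    "pii": 0.150,
    "toxicity": 0.600,
    "jailbreak": 1.200,
}

async def run_with_budget(category: str, check, text: str) -> str:
    """Run one check under its own budget instead of a shared ceiling."""
    try:
        return await asyncio.wait_for(
            check(text), timeout=CATEGORY_TIMEOUTS[category]
        )
    except asyncio.TimeoutError:
        # Report the timeout; fail-open vs fail-closed is a policy choice.
        return "timeout"

async def run_checks(text: str, checks: dict) -> dict:
    """Run all checks concurrently; `checks` maps category -> coroutine fn."""
    names = list(checks)
    results = await asyncio.gather(
        *(run_with_budget(name, checks[name], text) for name in names)
    )
    return dict(zip(names, results))
```

Because each check is cut off at its own budget, a stalled PII classifier costs at most about 150ms here, while the slower jailbreak class still gets its full 1200ms.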
Semantic cache short-circuit
Default: On. A fuzzy cache hit returns the answer immediately, skipping both guard and provider calls. Up to 60% cost reduction on chatbot-style workloads.
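To make the short-circuit concrete, here is a toy sketch. It assumes a fuzzy hit means "cosine similarity above a threshold", which the source does not specify. The bag-of-words `embed` function stands in for a real embedding model, and `SemanticCache`, `handle`, and the 0.8 threshold are illustrative choices.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # illustrative similarity cutoff
        self.entries = []           # list of (embedding, answer) pairs

    def get(self, prompt: str):
        query = embed(prompt)
        best_answer, best_score = None, 0.0
        for vector, answer in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_answer, best_score = answer, score
        return best_answer if best_score >= self.threshold else None

    def put(self, prompt: str, answer: str) -> None:
        self.entries.append((embed(prompt), answer))

def handle(prompt: str, cache: SemanticCache, guarded_model_call) -> str:
    cached = cache.get(prompt)
    if cached is not None:
        return cached  # short-circuit: no guard or provider call is made
    answer = guarded_model_call(prompt)  # full path: guards plus provider
    cache.put(prompt, answer)
    return answer
```

A linear scan is fine for a sketch. A production cache would more likely use a vector index and would need to account for whether the cached answer was produced under the same guard policy.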
| Metric | DeepintShield default | Why |
|---|---|---|
| Guardrail latency (p50) | <5ms | Embedded runtime + decision cache + local-rule fast path |
| Allow-path total latency | max(guards, model) | Speculative dispatch (opt-in; non-streaming requests) |
| LLM cost saved | Up to 90% | Full 12-layer cost-optimization stack (cache, compression, cascade, throttle) |
| Tail latency (p99) | ~30–50% lower | Per-category timeouts (opt-in) replace the flat 1500ms ceiling |