Embedded guard runtime

In single-binary deployments, the gateway can evaluate guards locally instead of calling out to a separate guard service. This removes a network round-trip, saving roughly 20–300 ms per request. It’s the default, so for most deployments there’s nothing to configure.

The embedded runtime is on by default. When neither DEEPINTSHIELD_GUARD_URL nor DEEPINTSHIELD_GUARD_GRPC_TARGET is set, the gateway uses the embedded runtime automatically. When either is set, the gateway uses that remote runtime unless you also set DEEPINTSHIELD_GUARD_USE_EMBEDDED_RUNTIME=true.
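The selection rule above can be sketched as a small function. This is an illustration, not the gateway's actual code: resolve_guard_mode is a hypothetical name, and two details are assumptions. The first is that only the value "true" (case-insensitive) enables the override. The second is that gRPC wins when both a URL and a gRPC target are set, inferred from the startup log's note that gRPC is preferred.

```python
import os
from typing import Mapping


def resolve_guard_mode(env: Mapping[str, str] = os.environ) -> str:
    """Return "embedded", "grpc", or "http" for the given environment."""
    # Assumption: only the literal "true" (any case) enables the override.
    override = env.get("DEEPINTSHIELD_GUARD_USE_EMBEDDED_RUNTIME", "")
    if override.strip().lower() == "true":
        return "embedded"

    # Assumption: when both remote settings are present, gRPC takes precedence.
    if env.get("DEEPINTSHIELD_GUARD_GRPC_TARGET"):
        return "grpc"
    if env.get("DEEPINTSHIELD_GUARD_URL"):
        return "http"

    # Neither remote setting is present: embedded is the default.
    return "embedded"
```

Passing a plain dict instead of os.environ makes it easy to check a planned configuration before deploying it.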

# Force embedded mode (overrides any URL/GRPC target).
DEEPINTSHIELD_GUARD_USE_EMBEDDED_RUNTIME=true
# Switch back to a remote runtime: leave the override above unset and set one of these.
DEEPINTSHIELD_GUARD_URL="https://guard.internal:8443"
# or
DEEPINTSHIELD_GUARD_GRPC_TARGET="guard.internal:9443"

The gateway logs the chosen mode at startup. Look for the runtime mode= line:

[Guardrails] runtime mode=embedded (lowest-overhead path; nothing else to configure)
[Guardrails] runtime mode=grpc (remote runtime - set DEEPINTSHIELD_GUARD_USE_EMBEDDED_RUNTIME=true to use the embedded one in single-binary deploys)
[Guardrails] runtime mode=http (remote runtime - gRPC is preferred when available)
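To check the mode in a deployment script or smoke test, you can pull it out of the captured startup log. A minimal sketch, assuming the log lines keep the format shown above; runtime_mode is a hypothetical helper, not part of the gateway.

```python
import re
from typing import Optional

# Matches the startup line format shown above and captures the mode name.
MODE_LINE = re.compile(r"\[Guardrails\] runtime mode=(embedded|grpc|http)\b")


def runtime_mode(log_text: str) -> Optional[str]:
    """Return the first runtime mode found in the log text, or None."""
    match = MODE_LINE.search(log_text)
    return match.group(1) if match else None
```

A None result means the line was not found, for example because the log was captured before the guardrails plugin started.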

Embedded is the right default for almost every deployment. Consider the remote runtime only when:

  • You need to scale guard evaluation independently of the gateway (e.g. a dedicated GPU pool for heavy hallucination classifiers).
  • You’re running multiple gateway replicas that must share a single guard tenant cache.
  • A security boundary (PCI/HIPAA) requires guard evaluation in a separate network zone.

You can pass tuning to the embedded runtime via plugin config:

{
  "name": "guardrails",
  "enabled": true,
  "config": {
    "embedded_adapter_timeout_ms": 1500,
    "embedded_rag_chunk_parallelism": 8,
    "embedded_timeouts_by_category": {
      "pii": 150,
      "toxicity": 600,
      "jailbreak": 1200
    }
  }
}

See Per-category timeouts for the embedded_timeouts_by_category map.
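One way to read the config above is that a category listed in embedded_timeouts_by_category gets its own timeout and every other category gets embedded_adapter_timeout_ms. That fallback is an assumption here, not something this page states, and timeout_ms_for is a hypothetical helper. Under that assumption, the lookup could be sketched as:

```python
import json

# The plugin config from the example above.
PLUGIN_CONFIG = json.loads("""
{
  "name": "guardrails",
  "enabled": true,
  "config": {
    "embedded_adapter_timeout_ms": 1500,
    "embedded_rag_chunk_parallelism": 8,
    "embedded_timeouts_by_category": {
      "pii": 150,
      "toxicity": 600,
      "jailbreak": 1200
    }
  }
}
""")


def timeout_ms_for(category: str, config: dict) -> int:
    """Return the timeout in milliseconds that would apply to a guard category."""
    per_category = config.get("embedded_timeouts_by_category", {})
    # Assumption: categories without an entry use the adapter-wide timeout.
    return per_category.get(category, config["embedded_adapter_timeout_ms"])
```

With the example values, "pii" resolves to 150 ms, while a category with no entry, such as a hypothetical "hallucination", resolves to the adapter-wide 1500 ms.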