Guardrails

Overview

Guardrails in DeepIntShield provide content safety, security validation, and policy enforcement for LLM requests and responses. The system validates inputs and outputs in real time against your policies, protecting against harmful content, prompt injection, PII leakage, and policy violations.

What you configure

You set up guardrails with two building blocks. You create Profiles first, then write Rules that point at them:

Building block	What you do with it
Profiles	Connect an external guardrail provider (AWS Bedrock, Azure Content Safety, GraySwan, or Patronus AI) once with its credentials and detection settings. A profile is reusable - link it from as many rules as you like.
Rules	Decide when and what to check, using a CEL (Common Expression Language) condition. A rule applies to inputs, outputs, or both, and links to one or more profiles. Link several profiles to one rule for layered (defense-in-depth) protection.

Key Features

Feature	Description
Multi-Provider Support	AWS Bedrock, Azure Content Safety, GraySwan, and Patronus AI integration
Dual-Stage Validation	Guard both inputs (prompts) and outputs (responses)
Real-Time Processing	Synchronous and asynchronous validation modes
CEL-Based Rules	Define custom policies using Common Expression Language
Reusable Profiles	Configure providers once, use across multiple rules
Sampling Control	Apply rules to a percentage of requests for performance tuning
Automatic Remediation	Block, redact, or modify content based on policy
Comprehensive Logging	Detailed audit trails for compliance

Navigating Guardrails in the UI

Access Guardrails from the DeepIntShield dashboard:

Page	Path	Description
Configuration	Guardrails > Configuration	Manage guardrail rules and their settings
Providers	Guardrails > Providers	Configure and manage guardrail profiles

When a request comes in, your input rules check the prompt before it reaches the provider, and your output rules check the response before it returns. A check that fails can block, redact, or modify the content based on the matching profile and policy.
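
The flow described above can be sketched roughly as follows. This is an illustrative Python model only, not DeepIntShield's implementation: `Verdict`, `run_checks`, and `handle_request` are hypothetical names, the action labels are assumptions, and each check is a stand-in callable rather than a real provider call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    action: str   # "pass", "block", or "redact" (hypothetical labels)
    content: str  # content after any redaction


Check = Callable[[str], Verdict]


def run_checks(content: str, checks: List[Check]) -> Verdict:
    """Run checks in order; stop at the first block, carry redactions forward."""
    action = "pass"
    for check in checks:
        verdict = check(content)
        if verdict.action == "block":
            return Verdict("block", "")
        if verdict.action == "redact":
            action = "redact"
            content = verdict.content
    return Verdict(action, content)


def handle_request(prompt: str, input_checks: List[Check],
                   output_checks: List[Check],
                   call_provider: Callable[[str], str]) -> Verdict:
    # Input rules run before the prompt reaches the provider.
    checked_prompt = run_checks(prompt, input_checks)
    if checked_prompt.action == "block":
        return checked_prompt
    response = call_provider(checked_prompt.content)
    # Output rules run on the response before it is returned.
    return run_checks(response, output_checks)
```

A blocked prompt never reaches `call_provider`, which is the property the input stage exists to guarantee.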

Supported Guardrail Providers

DeepIntShield integrates with leading guardrail providers to offer comprehensive protection:

AWS Bedrock Guardrails

Amazon Bedrock Guardrails provides enterprise-grade content filtering and safety features with deep AWS integration.

Capabilities:

Content Filters: Hate speech, insults, sexual content, violence, misconduct
Denied Topics: Block specific topics or categories
Word Filters: Custom profanity and sensitive word blocking
PII Protection: Detect and redact 50+ PII entity types
Contextual Grounding: Verify responses against source documents
Prompt Attack Detection: Identify injection and jailbreak attempts
Image Content Support: Analyze images in addition to text (PNG, JPEG)

Configuration Fields:

Field	Type	Required	Default	Description
`access_key`	string	No*	-	AWS Access Key ID
`secret_key`	string	No*	-	AWS Secret Access Key
`bedrock_api_key`	string	No*	-	Alternative Bedrock API key (Bearer token)
`guardrail_arn`	string	Yes	-	ARN of the Bedrock guardrail
`guardrail_version`	string	Yes	-	Version of the guardrail (e.g., “1”, “DRAFT”)
`region`	string	Yes	-	AWS region

Authentication Methods (fields marked No* are required only by the method that uses them):

Access Key + Secret Key
Bedrock API Key

Access Key + Secret Key uses the AWS SDK with static credentials:

{
  "access_key": "AKIAXXXXXXXXXXXXXXXXXX",
  "secret_key": "your-secret-access-key",
  "guardrail_arn": "arn:aws:bedrock:us-east-1:123456789:guardrail/abc123",
  "guardrail_version": "1",
  "region": "us-east-1"
}

Bedrock API Key uses the HTTP REST API with a Bearer token:

{
  "bedrock_api_key": "your-bedrock-api-key",
  "guardrail_arn": "arn:aws:bedrock:us-east-1:123456789:guardrail/abc123",
  "guardrail_version": "1",
  "region": "us-east-1"
}
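
As a rough illustration of how the required fields and the two authentication methods fit together, the sketch below validates a Bedrock profile config before it is saved. The function name `bedrock_auth_method` and its return labels are hypothetical. Two behaviours are assumptions rather than documented gateway behaviour: rejecting configs that set both methods, and falling back to environment (IAM role) credentials when neither is set.

```python
REQUIRED_FIELDS = ("guardrail_arn", "guardrail_version", "region")


def bedrock_auth_method(config: dict) -> str:
    """Return which authentication method a Bedrock profile config implies."""
    missing = [field for field in REQUIRED_FIELDS if not config.get(field)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))

    has_access = bool(config.get("access_key"))
    has_secret = bool(config.get("secret_key"))
    has_api_key = bool(config.get("bedrock_api_key"))

    # The static-credential method needs both halves of the key pair.
    if has_access != has_secret:
        raise ValueError("access_key and secret_key must be provided together")
    # Assumption: mixing both methods is treated as a configuration error.
    if has_access and has_api_key:
        raise ValueError("choose one authentication method, not both")

    if has_access:
        return "static_credentials"
    if has_api_key:
        return "bedrock_api_key"
    # Assumption: with no explicit credentials, environment credentials apply.
    return "environment"
```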

Supported AWS Regions:

Region Code	Region Name
`us-east-1`	US East (N. Virginia)
`us-east-2`	US East (Ohio)
`us-west-1`	US West (N. California)
`us-west-2`	US West (Oregon)
`ap-south-1`	Asia Pacific (Mumbai)
`ap-northeast-1`	Asia Pacific (Tokyo)
`ap-northeast-2`	Asia Pacific (Seoul)
`ap-southeast-1`	Asia Pacific (Singapore)
`ap-southeast-2`	Asia Pacific (Sydney)
`eu-central-1`	Europe (Frankfurt)
`eu-west-1`	Europe (Ireland)
`eu-west-2`	Europe (London)
`eu-west-3`	Europe (Paris)

Supported Content Types:

Text content
Images (PNG, JPEG formats)

Usage Metrics Returned:

Bedrock guardrails return detailed usage metrics for cost tracking and monitoring:

Metric	Description
`content_policy_units`	Units consumed by content policy evaluation
`contextual_grounding_policy_units`	Units for grounding checks
`sensitive_information_policy_units`	Units for PII detection
`topic_policy_units`	Units for topic filtering
`word_policy_units`	Units for word filtering
`automated_reasoning_policy_units`	Units for reasoning checks
`content_policy_image_units`	Units for image content analysis

Supported PII Types:

Personal identifiers (SSN, passport, driver’s license)
Financial information (credit cards, bank accounts)
Contact information (email, phone, address)
Medical information (health records, insurance)
Device identifiers (IP addresses, MAC addresses)

Azure Content Safety

Azure AI Content Safety provides multi-modal content moderation powered by Microsoft’s advanced AI models.

Capabilities:

Severity-Based Filtering: 4-level severity classification (Safe, Low, Medium, High)
Multi-Category Detection: Hate, sexual, violence, self-harm content
Prompt Shield: Advanced jailbreak and injection detection
Indirect Attack Detection: Identify hidden malicious instructions
Protected Material: Detect copyrighted content (output only)
Custom Blocklists: Define organization-specific blocked terms

Configuration Fields:

Field	Type	Required	Default	Description
`endpoint`	string	Yes	-	Azure Content Safety endpoint URL
`api_key`	string	Yes	-	Azure subscription key
`analyze_enabled`	boolean	No	true	Enable content analysis for Hate, Sexual, Violence, SelfHarm
`analyze_severity_threshold`	enum	No	"medium"	Severity level to trigger: `low`, `medium`, or `high`
`jailbreak_shield_enabled`	boolean	No	false	Enable jailbreak detection (input only)
`indirect_attack_shield_enabled`	boolean	No	false	Enable indirect prompt attack detection (input only)
`copyright_enabled`	boolean	No	false	Enable copyrighted content detection (output only)
`text_blocklist_enabled`	boolean	No	false	Enable custom blocklist filtering
`blocklist_names`	array	No	-	List of Azure blocklist names to apply

Severity Threshold Levels:

Threshold	Numeric Value	Behavior
`low`	2	Most strict - blocks severity 2 and above
`medium`	4	Balanced - blocks severity 4 and above
`high`	6	Least strict - blocks only severity 6
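
The threshold table can be read as a simple comparison. The sketch below encodes it directly; `SEVERITY_THRESHOLDS` and `is_blocked` are hypothetical names, and it assumes the provider reports an integer severity per category (0, 2, 4, or 6 for the four levels).

```python
# Numeric values taken from the threshold table above.
SEVERITY_THRESHOLDS = {"low": 2, "medium": 4, "high": 6}


def is_blocked(severity: int, threshold: str = "medium") -> bool:
    """True when a detected severity meets or exceeds the configured threshold."""
    return severity >= SEVERITY_THRESHOLDS[threshold]
```

For example, a severity of 4 is blocked under `low` and `medium` but allowed under `high`.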

Detection Categories:

Hate and fairness
Sexual content
Violence
Self-harm

Patronus AI

Patronus AI specializes in LLM security and safety with advanced evaluation capabilities.

Capabilities:

Hallucination Detection: Identify factually incorrect responses
PII Detection: Comprehensive personal data identification
Toxicity Screening: Multi-language toxic content detection
Prompt Injection Defense: Advanced attack pattern recognition
Custom Evaluators: Build organization-specific safety checks
Real-Time Monitoring: Continuous safety validation

Advanced Features:

Context-aware evaluation
Multi-turn conversation analysis
Custom policy templates
Integration with existing safety workflows

GraySwan Cygnal

GraySwan Cygnal Monitor provides AI safety monitoring with natural language rule definitions and advanced threat detection capabilities.

Capabilities:

Violation Scoring: Continuous 0-1 scale violation detection with configurable thresholds
Custom Natural Language Rules: Define safety rules in plain English without code
Policy Management: Use pre-built policies from GraySwan platform or create custom ones
Indirect Prompt Injection (IPI) Detection: Identify hidden instructions in user inputs
Mutation Detection: Detect attempts to manipulate or alter content
Reasoning Modes: Choose from fast (“off”), balanced (“hybrid”), or thorough (“thinking”) analysis

Configuration Fields:

Field	Type	Required	Default	Description
`api_key`	string	Yes	-	GraySwan API key
`violation_threshold`	number	No	0.5	Score threshold (0-1) for triggering intervention. Lower values are more strict.
`reasoning_mode`	enum	No	"off"	Analysis depth: `off` (fastest), `hybrid` (balanced), or `thinking` (most thorough)
`policy_id`	string	No	-	Single custom policy ID from GraySwan platform
`policy_ids`	array	No	-	Multiple policy IDs for aggregated rule evaluation
`rules`	object	No	-	Custom natural language rules as key-value pairs
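
To make the effect of `violation_threshold` concrete, here is a minimal sketch of the comparison it implies. `should_intervene` is a hypothetical helper, and treating a score equal to the threshold as a violation is an assumption; the table only states that lower thresholds are stricter.

```python
def should_intervene(score: float, violation_threshold: float = 0.5) -> bool:
    """Decide whether a 0-1 violation score triggers intervention."""
    for name, value in (("score", score), ("violation_threshold", violation_threshold)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Assumption: the comparison is inclusive of the threshold itself.
    return score >= violation_threshold
```

A score of 0.4 passes with the default threshold of 0.5, but triggers intervention once the threshold is lowered to 0.3.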

Custom Rules Example:

Rules are defined as key-value pairs where the key is the rule name and the value is a natural language description:

{
  "rules": {
    "no_profanity": "Do not allow profanity or vulgar language",
    "no_pii": "Do not allow personally identifiable information",
    "professional_tone": "Ensure all responses maintain a professional tone"
  }
}

Detection Features:

Real-time violation scoring
Multi-rule evaluation
IPI attack detection
Content mutation monitoring
Detailed violation descriptions with rule attribution

Guardrail Rules

Guardrail Rules are custom policies that define when and how content validation occurs. Rules use CEL (Common Expression Language) expressions to evaluate requests and can be linked to one or more profiles for execution.

Rule Properties

Property	Type	Required	Description
`id`	integer	Yes	Unique identifier for the rule
`name`	string	Yes	Descriptive name for the rule
`description`	string	No	Explanation of what the rule does
`enabled`	boolean	Yes	Whether the rule is active
`cel_expression`	string	Yes	CEL expression for rule evaluation
`apply_to`	enum	Yes	When to apply: `input`, `output`, or `both`
`sampling_rate`	integer	No	Percentage of requests to evaluate (0-100)
`timeout`	integer	No	Execution timeout in milliseconds
`provider_config_ids`	array	No	IDs of profiles to use for evaluation

Navigate to Rules
- Go to Guardrails > Configuration
- Click Add Rule

Configure Rule Settings

Basic Information:

Name: Enter a descriptive name (e.g., “Block PII in Prompts”)
Description: Explain the rule’s purpose
Enabled: Toggle to activate the rule

Evaluation Settings:

Apply To: Select when to apply the rule
- input - Validate incoming prompts only
- output - Validate LLM responses only
- both - Validate both inputs and outputs
CEL Expression: Define the validation logic
Sampling Rate: Set percentage of requests to evaluate (default: 100%)
Timeout: Set maximum execution time in milliseconds

Link Profiles
- Select one or more profiles to use for evaluation
- Rules will execute all linked profiles in sequence

Save and Test
- Click Save Rule
- Use the Test button to validate with sample content

The equivalent rules defined in configuration:

guardrails_config:
  guardrail_rules:
    - id: 1
      name: "Block PII in Prompts"
      description: "Prevent PII from being sent to LLM providers"
      enabled: true
      cel_expression: "request.messages.exists(m, m.role == 'user')"
      apply_to: "input"
      sampling_rate: 100
      timeout: 5000
      provider_config_ids: [1, 2]
    - id: 2
      name: "Content Filter for Responses"
      description: "Filter harmful content from LLM responses"
      enabled: true
      cel_expression: "true"
      apply_to: "output"
      sampling_rate: 100
      timeout: 3000
      provider_config_ids: [2]
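
The `enabled`, `apply_to`, and `sampling_rate` properties together decide whether a rule is even considered for a request. The sketch below shows one plausible way to combine them; `rule_applies` is a hypothetical helper, the random-draw approach to sampling is an assumption, and evaluation of the CEL condition itself is not shown.

```python
import random


def rule_applies(rule: dict, stage: str, rng: random.Random) -> bool:
    """Check a rule's enabled flag, stage, and sampling rate for one request.

    `stage` is "input" or "output".
    """
    if not rule.get("enabled", False):
        return False
    if rule["apply_to"] not in (stage, "both"):
        return False
    rate = rule.get("sampling_rate", 100)
    # rng.random() is in [0.0, 1.0), so a rate of 100 always passes
    # and a rate of 0 never does.
    return rng.random() * 100 < rate
```

With `sampling_rate: 25`, roughly one request in four would reach the linked profiles, which is how sampling trades coverage for latency on high-traffic endpoints.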

CEL Expression Examples

CEL (Common Expression Language) provides a powerful way to define rule conditions. Here are common patterns:

Always Apply Rule:

true

Apply to User Messages Only:

request.messages.exists(m, m.role == "user")

Apply to Messages Containing Keywords:

request.messages.exists(m, m.content.contains("confidential"))

Apply Based on Model:

request.model.startsWith("gpt-4")

Apply to Long Prompts:

request.messages.filter(m, m.role == "user").map(m, m.content.size()).sum() > 1000

Combine Multiple Conditions:

request.model.startsWith("gpt-4") && request.messages.exists(m, m.role == "user" && m.content.size() > 500)
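
For readers less familiar with CEL macros such as `exists`, `filter`, and `map`, the Python predicates below mirror the expressions above. They are a reading aid only, not how the gateway evaluates rules. The function names are hypothetical, and the sketch assumes `request` is a dict whose message `content` values are plain strings.

```python
def has_user_message(request: dict) -> bool:
    # request.messages.exists(m, m.role == "user")
    return any(m["role"] == "user" for m in request["messages"])


def mentions_keyword(request: dict, keyword: str = "confidential") -> bool:
    # request.messages.exists(m, m.content.contains("confidential"))
    return any(keyword in m["content"] for m in request["messages"])


def model_is_gpt4(request: dict) -> bool:
    # request.model.startsWith("gpt-4")
    return request["model"].startswith("gpt-4")


def is_long_prompt(request: dict, limit: int = 1000) -> bool:
    # Total size of all user messages compared against the limit.
    total = sum(len(m["content"]) for m in request["messages"] if m["role"] == "user")
    return total > limit


def gpt4_with_long_user_message(request: dict) -> bool:
    # Both conditions must hold; the size check applies per message.
    return model_is_gpt4(request) and any(
        m["role"] == "user" and len(m["content"]) > 500 for m in request["messages"]
    )
```

Note the difference between the last two: the long-prompt check sums across user messages, while the combined condition requires a single user message over 500 characters.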

Linking Rules to Profiles

Rules can be linked to multiple profiles for comprehensive validation.

Best Practices:

Link PII detection rules to profiles with PII capabilities (Bedrock, Patronus)
Link content filtering rules to profiles with content safety features (Azure, Bedrock, GraySwan)
Use GraySwan for custom natural language rules when you need flexible, readable policies
Use multiple profiles for defense-in-depth (e.g., Bedrock + Patronus for PII, Azure + GraySwan for content)
Set appropriate timeouts when using multiple profiles

Managing Profiles

Profiles are reusable configurations for external guardrail providers. Each profile contains provider-specific settings including credentials, endpoints, and detection thresholds.

Profile Properties

Property	Type	Required	Description
`id`	integer	Yes	Unique identifier for the profile
`provider_name`	string	Yes	Provider type: `bedrock`, `azure`, `grayswan`, `patronus_ai`
`policy_name`	string	Yes	Descriptive name for the policy
`enabled`	boolean	Yes	Whether the profile is active
`config`	object	No	Provider-specific configuration

Creating Profiles

Profiles can be created in the Web UI or with Helm. In the Web UI:

Navigate to Providers
- Go to Guardrails > Providers
- Click Add Profile

Select Provider Type
- Choose from: AWS Bedrock, Azure Content Safety, GraySwan, or Patronus AI

Configure Provider Settings
- Enter credentials and endpoint information
- Configure detection thresholds and actions
- See the provider-specific sections above for detailed configuration

Save Profile
- Click Save Profile
- The profile is now available for linking to rules

With Helm, the equivalent profiles are defined in configuration:

guardrails_config:
  guardrail_providers:
    - id: 1
      provider_name: "bedrock"
      policy_name: "PII Detection Profile"
      enabled: true
      config:
        guardrail_arn: "arn:aws:bedrock:us-east-1:123456789:guardrail/abc123"
        guardrail_version: "1"
        region: "us-east-1"
        # AWS Authentication (choose one method):
        # Option 1: Explicit credentials
        access_key: "${AWS_ACCESS_KEY_ID}"
        secret_key: "${AWS_SECRET_ACCESS_KEY}"
        # Option 2: IAM Role - omit access_key and secret_key
        # (DeepIntShield will use IAM credentials from the environment)
    - id: 2
      provider_name: "azure"
      policy_name: "Content Safety Profile"
      enabled: true
      config:
        endpoint: "https://your-resource.cognitiveservices.azure.com/"
        api_key: "${AZURE_CONTENT_SAFETY_API_KEY}"
        analyze_enabled: true
        analyze_severity_threshold: "medium"
        jailbreak_shield_enabled: true
    - id: 3
      provider_name: "grayswan"
      policy_name: "Custom Safety Rules"
      enabled: true
      config:
        api_key: "${GRAYSWAN_API_KEY}"
        violation_threshold: 0.5
        reasoning_mode: "hybrid"
        rules:
          no_pii: "Do not allow personally identifiable information"
          professional_tone: "Ensure responses maintain a professional tone"
    - id: 4
      provider_name: "patronus_ai"
      policy_name: "Hallucination Detection"
      enabled: true
      config:
        api_endpoint: "https://api.patronus.ai/v1"
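
Because rules refer to profiles only by ID through `provider_config_ids`, a typo or a disabled profile can leave a rule with nothing to run. The sketch below is a hypothetical pre-deployment check, not a DeepIntShield feature. It assumes the configuration has already been parsed into a dict and that `guardrail_rules` and `guardrail_providers` sit under the same `guardrails_config` key.

```python
def find_dangling_profile_links(config: dict) -> list:
    """Return (rule_id, profile_id, reason) for links that cannot be executed."""
    guardrails = config["guardrails_config"]
    profiles = {p["id"]: p for p in guardrails.get("guardrail_providers", [])}

    problems = []
    for rule in guardrails.get("guardrail_rules", []):
        for profile_id in rule.get("provider_config_ids", []):
            profile = profiles.get(profile_id)
            if profile is None:
                problems.append((rule["id"], profile_id, "missing"))
            elif not profile.get("enabled", False):
                problems.append((rule["id"], profile_id, "disabled"))
    return problems
```

An empty result means every linked profile exists and is enabled.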

Provider Capabilities

Each provider offers different capabilities. Choose profiles based on your validation needs:

Capability	AWS Bedrock	Azure Content Safety	GraySwan	Patronus AI
PII Detection	Yes	No	No	Yes
Content Filtering	Yes	Yes	Yes	Yes
Prompt Injection	Yes	Yes	Yes	Yes
Hallucination Detection	No	No	No	Yes
Toxicity Screening	Yes	Yes	Yes	Yes
Custom Policies	Yes	Yes	Yes	Yes
Custom Natural Language Rules	No	No	Yes	No
Image Support	Yes	No	No	No
IPI Detection	No	Yes	Yes	No
Mutation Detection	No	No	Yes	No
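
When a rule needs several capabilities at once, the matrix above can be queried programmatically. The sketch below transcribes the table as data; `PROVIDERS`, `CAPABILITIES`, and `providers_for` are hypothetical names, and the provider keys reuse the `provider_name` values from the profile properties.

```python
# Column order matches the table: AWS Bedrock, Azure Content Safety, GraySwan, Patronus AI.
PROVIDERS = ("bedrock", "azure", "grayswan", "patronus_ai")

CAPABILITIES = {
    "PII Detection": (True, False, False, True),
    "Content Filtering": (True, True, True, True),
    "Prompt Injection": (True, True, True, True),
    "Hallucination Detection": (False, False, False, True),
    "Toxicity Screening": (True, True, True, True),
    "Custom Policies": (True, True, True, True),
    "Custom Natural Language Rules": (False, False, True, False),
    "Image Support": (True, False, False, False),
    "IPI Detection": (False, True, True, False),
    "Mutation Detection": (False, False, True, False),
}


def providers_for(*capabilities: str) -> list:
    """List providers that support every requested capability."""
    return [
        provider
        for column, provider in enumerate(PROVIDERS)
        if all(CAPABILITIES[name][column] for name in capabilities)
    ]
```

An empty list signals that no single provider covers the combination, so the rule would need to link more than one profile.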

Best Practices

Profile Organization:

Create separate profiles for different use cases (PII, content filtering, etc.)
Use descriptive policy names that indicate the profile’s purpose
Keep credentials secure using environment variables

Performance Considerations:

Enable only the profiles you need to minimize latency
Use sampling rates on rules for high-traffic endpoints
Set appropriate timeouts to prevent slow requests

Security:

Store API keys and credentials in environment variables or secrets managers
Regularly rotate credentials
Use least-privilege IAM roles for AWS Bedrock

Using Guardrails in Requests

Attaching guardrails to a virtual key

In production, guardrails are attached to a virtual key in the dashboard (Config → Guardrails) and run automatically on every request made with that key - there is no per-request header to set. Just call the gateway as usual:

curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Authorization: Bearer sk-bf-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      { "role": "user", "content": "Help me with this task" }
    ]
  }'

The input guardrails bound to that key run before the prompt reaches the provider, and any output guardrails run on the response before it is returned.

Note: Passing raw guardrail definitions inline on a request (x-bf-input-guardrails / x-bf-output-guardrails) is restricted to the dashboard test-lab simulation flow. For live traffic, bind guardrails to the virtual key as shown above.

Reading the guardrail outcome

DeepIntShield reports the guardrail result on response headers, not in the response body:

Header	Values	Meaning
`x-deepintshield-guardrail-status`	`pass`, `blocked`, `redacted`, `flagged`	Outcome of the guardrail evaluation
`x-deepintshield-guardrail-mode`	`sync`, `async`, `shadow`	Execution mode the guardrail ran in

Passed (HTTP 200) - the normal completion is returned, with x-deepintshield-guardrail-status: pass:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "I'd be happy to help..." },
      "finish_reason": "stop"
    }
  ]
}

Blocked (HTTP 403) - the request is rejected with a guardrail_blocked error and x-deepintshield-guardrail-status: blocked:

{
  "error": {
    "type": "guardrail_blocked",
    "code": "guardrail_blocked",
    "message": "Request blocked by guardrail policy"
  }
}

When a guardrail is configured to redact rather than block, the completion is returned (HTTP 200) with the offending content removed and x-deepintshield-guardrail-status: redacted. In monitor / shadow mode the request is never blocked: the violation is only recorded, and the status header is set to flagged.
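
Since the verdict travels in headers, client code has to look beyond the response body. The sketch below shows one way to do that with only the Python standard library. `chat` and `guardrail_outcome` are hypothetical helper names, and parsing every error body as JSON is an assumption that may not hold for errors unrelated to guardrails.

```python
import json
import urllib.error
import urllib.request

GATEWAY_URL = "https://app.deepintshield.com/v1/chat/completions"


def chat(virtual_key: str, payload: dict, url: str = GATEWAY_URL):
    """POST a chat completion and return (status_code, headers, body)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {virtual_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, dict(response.headers.items()), json.loads(response.read())
    except urllib.error.HTTPError as error:
        # A blocked request arrives as HTTP 403; headers and body are still readable.
        return error.code, dict(error.headers.items()), json.loads(error.read())


def guardrail_outcome(status_code: int, headers: dict) -> dict:
    """Summarize the guardrail verdict from the status code and headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    status = lowered.get("x-deepintshield-guardrail-status")
    return {
        "status": status,
        "mode": lowered.get("x-deepintshield-guardrail-mode"),
        "blocked": status_code == 403 and status == "blocked",
        "redacted": status == "redacted",
        "flagged": status == "flagged",
    }
```

Header names are matched case-insensitively because HTTP clients and proxies may change their casing.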

Multimodal guardrails

By default the guardrail engine evaluates text, chat, responses and passthrough requests. Multimodal guardrails extend the same policy engine, decision path and x-deepintshield-guardrail-* headers to non-text endpoints - so image, audio and video traffic is governed exactly like text, with one consistent verdict model.

What gets guarded

Endpoint	Request type	What is evaluated
Image generation	`image_generation`	The prompt (and negative prompt)
Image edit	`image_edit`	The prompt + text embedded in the source images (PNG/JPEG metadata, OCR-style strings)
Speech / TTS	`speech`	The text to synthesize + voice instructions
Transcription	`transcription`	The produced transcript (output side)
Video generation	`video_generation`	The prompt (and negative prompt)
Embeddings	`embedding`	The input text(s)
Rerank	`rerank`	The query + candidate documents

The text these requests already carry is guarded by the existing text detectors. Binary artifacts (source images, generated images, audio, video) are forwarded to the guard runtime as attachments, each fingerprinted with a content hash for deduplication. There, a modality-extraction stage resolves them to text (document/OCR text today; OCR/STT/keyframe extractors are pluggable), and optional modality detectors (vision/audio safety models) score them natively.
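
Content-hash deduplication can be illustrated in a few lines. This is a sketch under stated assumptions: the hash algorithm is not specified above, so SHA-256 is used as a stand-in, and `fingerprint` and `dedupe_attachments` are hypothetical names.

```python
import hashlib
from typing import Dict, Iterable


def fingerprint(data: bytes) -> str:
    """Content hash for an attachment (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def dedupe_attachments(attachments: Iterable[bytes]) -> Dict[str, bytes]:
    """Keep the first attachment seen for each distinct content hash."""
    unique: Dict[str, bytes] = {}
    for blob in attachments:
        unique.setdefault(fingerprint(blob), blob)
    return unique
```

Identical bytes always hash to the same fingerprint, so an image attached twice is extracted and scored once.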

Enabling it

Multimodal guarding is off by default - when disabled, behavior is identical to text-only guarding and there is no added latency on any request. Enable it with environment variables (Helm: deepintshield.guardrails.*):

Variable	Component	Effect
`GUARDRAILS_MULTIMODAL`	gateway	Gate image/audio/video/embedding/rerank requests into the engine and guard the text they carry; forward artifacts as attachments
`GUARDRAILS_MODALITY_INLINE_BYTES`	gateway	Inline raw artifact bytes into the guard request so the extraction stage can process them (otherwise only metadata + content hash are sent)
`GUARDRAILS_STREAM_ACCUMULATE`	gateway	Guard streamed output on a growing window so a violation split across chunks is caught
`DEEPINTSHIELD_GUARD_MODALITY_EXTRACT`	guard runtime	Run the modality-extraction / detector stage over forwarded attachments
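
The growing-window idea behind `GUARDRAILS_STREAM_ACCUMULATE` is easiest to see with a toy detector. In the sketch below, `first_violation` is a hypothetical helper and the substring check stands in for a real guardrail; it only illustrates why accumulated text catches what per-chunk checks miss.

```python
from typing import Callable, Iterable, Optional


def first_violation(chunks: Iterable[str],
                    is_violation: Callable[[str], bool]) -> Optional[int]:
    """Return the index of the chunk at which the accumulated text first violates."""
    window = ""
    for index, chunk in enumerate(chunks):
        window += chunk  # the window grows with every streamed chunk
        if is_violation(window):
            return index
    return None
```

With chunks `["the sec", "ret is", " out"]` and a detector for the word "secret", no single chunk matches, but the accumulated window matches once the second chunk arrives.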

Performance & scale

Zero added latency when off, and near-zero when on for the common case: the text already present (prompts, transcripts) reuses the existing text path.
Bounded work - per-request attachment count, per-asset byte size and extracted-text length are all capped, and identical assets are deduplicated by content hash, so a large image or video can never tie up a worker.
Scales horizontally - heavy extraction/detection runs in the separately deployable guard runtime, off the gateway hot path.
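
The caps described under "Bounded work" might look like the following. Every limit value here is a made-up placeholder, and `Limits`, `admit_attachments`, and `clip_extracted_text` are hypothetical names. Skipping oversized assets rather than rejecting the whole request is also an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Limits:
    max_attachments: int = 8                    # placeholder value
    max_bytes_per_asset: int = 5 * 1024 * 1024  # placeholder value
    max_extracted_chars: int = 20_000           # placeholder value


def admit_attachments(attachments: List[bytes], limits: Limits) -> List[bytes]:
    """Apply the per-request count cap and the per-asset size cap."""
    admitted: List[bytes] = []
    for blob in attachments:
        if len(admitted) >= limits.max_attachments:
            break
        if len(blob) > limits.max_bytes_per_asset:
            continue  # assumption: oversized assets are skipped, not fatal
        admitted.append(blob)
    return admitted


def clip_extracted_text(text: str, limits: Limits) -> str:
    """Cap the amount of extracted text handed to the detectors."""
    return text[: limits.max_extracted_chars]
```

Because each cap is a constant, the work per request has a fixed upper bound regardless of how large the uploaded media is.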

Analytics

Multimodal activity appears on the Guardrail Metrics → Multimodal tab (workspace analytics): a decision timeline by modality, request distribution, decision breakdown, and attachment-level findings - using the same filters and date range as the other guardrail dashboards.