
Telemetry

DeepIntShield provides built-in telemetry and monitoring through Prometheus metrics. The telemetry system tracks both HTTP-level performance and upstream provider interactions, giving you visibility into your AI gateway’s performance and usage patterns.

Key Features:

  • Prometheus Integration - Native metrics collection at /metrics endpoint
  • Comprehensive Tracking - Success/error rates, token usage, costs, and cache performance
  • Custom Labels - Configurable dimensions for detailed analysis
  • Dynamic Headers - Runtime label injection via x-bf-prom-* headers
  • Cost Monitoring - Real-time tracking of AI provider costs in USD
  • Cache Analytics - Direct and semantic cache hit tracking
  • Async Collection - Metrics are recorded asynchronously, off the request path
  • Multi-Level Tracking - HTTP transport + upstream provider metrics

The telemetry plugin operates asynchronously to ensure metrics collection doesn’t impact request latency or connection performance.
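DeepIntShield's plugin internals aren't described here, but the general pattern behind asynchronous collection can be sketched in a few lines of Python: the request path only enqueues an event, and a background thread does the aggregation. The class name, queue size, and drop-when-full policy below are illustrative assumptions, not the plugin's actual design.

```python
import queue
import threading
from collections import Counter


class AsyncMetricsRecorder:
    """Hypothetical sketch: handlers enqueue metric events, a worker thread aggregates them."""

    def __init__(self, maxsize: int = 10_000) -> None:
        self._events: queue.Queue = queue.Queue(maxsize=maxsize)
        self.counters: Counter = Counter()
        self.dropped = 0  # events discarded because the queue was full
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, name: str, labels: dict, value: float = 1.0) -> None:
        # Called on the request path and never blocks: a full queue drops the
        # event instead of making the request wait.
        try:
            self._events.put_nowait((name, tuple(sorted(labels.items())), value))
        except queue.Full:
            self.dropped += 1

    def _run(self) -> None:
        # Runs off the request path and applies events in arrival order.
        while True:
            event = self._events.get()
            if event is None:  # shutdown sentinel
                return
            name, labels, value = event
            self.counters[(name, labels)] += value

    def close(self) -> None:
        # Queue a sentinel behind any pending events, then wait for the worker to drain them.
        self._events.put(None)
        self._worker.join()


recorder = AsyncMetricsRecorder()
labels = {"provider": "openai", "model": "gpt-4o-mini"}
recorder.record("deepintshield_upstream_requests_total", labels)
recorder.record("deepintshield_upstream_requests_total", labels)
recorder.close()
print(recorder.counters[("deepintshield_upstream_requests_total", tuple(sorted(labels.items())))])  # 2.0
```

Dropping events when the queue is full trades metric completeness for predictable request latency; a production collector would typically also export the drop count.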


These metrics track all incoming HTTP requests to DeepIntShield:

Metric | Type | Description
http_requests_total | Counter | Total number of HTTP requests
http_request_duration_seconds | Histogram | Duration of HTTP requests, in seconds
http_request_size_bytes | Histogram | Size of incoming HTTP requests, in bytes
http_response_size_bytes | Histogram | Size of outgoing HTTP responses, in bytes

Labels:

  • path: HTTP endpoint path
  • method: HTTP verb (e.g., GET, POST, PUT, DELETE)
  • status: HTTP status code
  • custom labels: Custom labels configured in the DeepIntShield configuration
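Prometheus exports each Histogram above as cumulative buckets (a _bucket series per le upper bound), plus _sum and _count series. A rough sketch of that bookkeeping, assuming made-up bucket bounds (DeepIntShield's real boundaries aren't documented here):

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Histogram:
    """Minimal sketch of Prometheus histogram bookkeeping (cumulative "le" buckets, _sum, _count)."""

    bounds: list  # sorted upper bounds; illustrative, not DeepIntShield's real buckets
    counts: list = field(init=False)
    total: float = 0.0  # exported as <name>_sum
    count: int = 0      # exported as <name>_count

    def __post_init__(self) -> None:
        # One slot per bound plus a final slot for the implicit +Inf bucket.
        self.counts = [0] * (len(self.bounds) + 1)

    def observe(self, value: float) -> None:
        # bisect_left finds the first bound >= value, i.e. the smallest bucket with value <= le.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value
        self.count += 1

    def buckets(self) -> dict:
        # Prometheus buckets are cumulative: each "le" bucket includes all smaller ones.
        result, running = {}, 0
        les = [str(b) for b in self.bounds] + ["+Inf"]
        for le, n in zip(les, self.counts):
            running += n
            result[le] = running
        return result


latency = Histogram(bounds=[0.1, 0.5, 1.0])
for seconds in (0.05, 0.1, 0.3, 2.0):
    latency.observe(seconds)
print(latency.buckets())  # {'0.1': 2, '0.5': 3, '1.0': 3, '+Inf': 4}
```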

These metrics track requests forwarded to AI providers:

Metric | Type | Description | Labels
deepintshield_upstream_requests_total | Counter | Total requests forwarded to upstream providers | Base Labels, custom labels
deepintshield_success_requests_total | Counter | Total successful requests to upstream providers | Base Labels, custom labels
deepintshield_error_requests_total | Counter | Total failed requests to upstream providers | Base Labels, reason, custom labels
deepintshield_upstream_latency_seconds | Histogram | Latency of upstream provider requests | Base Labels, is_success, custom labels
deepintshield_input_tokens_total | Counter | Total input tokens sent to upstream providers | Base Labels, custom labels
deepintshield_output_tokens_total | Counter | Total output tokens received from upstream providers | Base Labels, custom labels
deepintshield_cache_hits_total | Counter | Total cache hits by type (direct/semantic) | Base Labels, cache_type, custom labels
deepintshield_cost_total | Counter | Total cost in USD for upstream provider requests | Base Labels, custom labels

Base Labels:

  • provider: AI provider name (e.g., openai, anthropic, azure)
  • model: Model name (e.g., gpt-4o-mini, claude-3-sonnet)
  • method: Request type (chat, text, embedding, speech, transcription)
  • virtual_key_id: Virtual key ID
  • virtual_key_name: Virtual key name
  • routing_engines_used: Comma-separated routing engines used (“routing-rule”, “governance”, “loadbalancing”)
  • routing_rule_id: Routing rule ID that matched the request
  • routing_rule_name: Routing rule name that matched the request
  • selected_key_id: Selected key ID
  • selected_key_name: Selected key name
  • number_of_retries: Number of retries
  • fallback_index: Fallback index (0 for first attempt, 1 for second attempt, etc.)
  • custom labels: Custom labels configured in the DeepIntShield configuration
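To inspect these labels without a Prometheus server, you can parse the text that /metrics returns. The parser below is a minimal sketch that handles plain sample lines only (no escape decoding, no exemplars); the sample scrape is made up, and real output carries every base label plus your custom labels.

```python
import re
from collections import defaultdict

# Metric name, optional {label set}, then the sample value (an optional timestamp is ignored).
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)")
# name="value" pairs; escaped quotes are tolerated but escape sequences are not decoded.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str):
    """Yield (metric_name, labels, value) for each sample line in Prometheus text format."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        yield name, dict(LABEL_RE.findall(raw_labels or "")), float(value)


def sum_by(text: str, metric: str, label: str) -> dict:
    """Sum one metric's samples grouped by a single label, like PromQL sum by (label) (metric)."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)


scrape = """
# TYPE deepintshield_upstream_requests_total counter
deepintshield_upstream_requests_total{provider="openai",model="gpt-4o-mini"} 10
deepintshield_upstream_requests_total{provider="openai",model="gpt-4o"} 5
deepintshield_upstream_requests_total{provider="anthropic",model="claude-3-sonnet"} 7
"""
print(sum_by(scrape, "deepintshield_upstream_requests_total", "provider"))
# {'openai': 15.0, 'anthropic': 7.0}
```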

These metrics capture latency characteristics specific to streaming responses:

Metric | Type | Description | Labels
deepintshield_stream_first_token_latency_seconds | Histogram | Time from request start to first streamed token | Base Labels
deepintshield_stream_inter_token_latency_seconds | Histogram | Latency between subsequent streamed tokens | Base Labels
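Both streaming histograms derive from chunk arrival times. A minimal sketch of that arithmetic, assuming one first-token observation per stream and one inter-token observation per gap between consecutive chunks (how DeepIntShield actually samples these isn't specified here):

```python
def stream_latencies(request_start: float, chunk_times: list) -> tuple:
    """Return (first_token_latency, inter_token_latencies) in seconds.

    chunk_times are arrival times of streamed chunks, on the same clock as request_start.
    """
    if not chunk_times:
        raise ValueError("stream produced no chunks")
    first_token = chunk_times[0] - request_start
    # One gap per pair of consecutive chunks, so n chunks yield n - 1 inter-token observations.
    gaps = [later - earlier for earlier, later in zip(chunk_times, chunk_times[1:])]
    return first_token, gaps


# Timestamps as you might capture them with time.monotonic() while reading a stream.
first, gaps = stream_latencies(0.0, [0.5, 0.75, 1.0])
print(first, gaps)  # 0.5 [0.25, 0.25]
```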

Track the success rate of requests to different providers:

# Success rate (%) by provider
sum by (provider) (rate(deepintshield_success_requests_total[5m])) /
sum by (provider) (rate(deepintshield_upstream_requests_total[5m])) * 100
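The success-rate query divides two rates over the same window, so the window length cancels and the result equals the ratio of the two counters' increases. The sketch below computes the same per-provider figure from two hypothetical counter snapshots; it handles counter resets but skips Prometheus's extrapolation at the window edges, so results can differ slightly from rate().

```python
from collections import defaultdict


def success_rate_by_provider(before: dict, after: dict) -> dict:
    """Success rate (%) per provider from counter snapshots taken at the window edges.

    Each snapshot maps (provider, model) -> (success_requests_total, upstream_requests_total).
    """
    success, upstream = defaultdict(float), defaultdict(float)
    for (provider, model), (s1, u1) in after.items():
        s0, u0 = before.get((provider, model), (0.0, 0.0))
        # A counter that went down was reset (e.g. a restart), so count from zero again.
        success[provider] += s1 - s0 if s1 >= s0 else s1
        upstream[provider] += u1 - u0 if u1 >= u0 else u1
    # Aggregate per provider first, then divide, mirroring sum by (provider) on both sides.
    return {p: 100 * success[p] / upstream[p] for p in upstream if upstream[p] > 0}


before = {
    ("openai", "gpt-4o-mini"): (90, 100),
    ("openai", "gpt-4o"): (40, 50),
    ("anthropic", "claude-3-sonnet"): (10, 10),
}
after = {
    ("openai", "gpt-4o-mini"): (180, 200),
    ("openai", "gpt-4o"): (130, 150),
    ("anthropic", "claude-3-sonnet"): (30, 30),
}
print(success_rate_by_provider(before, after))  # {'openai': 90.0, 'anthropic': 100.0}
```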

Monitor token consumption across different models:

# Input tokens per minute by model
sum by (model) (rate(deepintshield_input_tokens_total[5m])) * 60
# Output tokens per minute by model
sum by (model) (rate(deepintshield_output_tokens_total[5m])) * 60
# Token efficiency (output/input ratio) by model
sum by (model) (rate(deepintshield_output_tokens_total[5m])) /
sum by (model) (rate(deepintshield_input_tokens_total[5m]))

Monitor spending across providers and models:

# Cost per second (USD) by provider
sum by (provider) (rate(deepintshield_cost_total[5m]))
# Cost over the last 24 hours (USD) by provider
sum by (provider) (increase(deepintshield_cost_total[1d]))
# Average cost per request (USD) by provider and model
sum by (provider, model) (rate(deepintshield_cost_total[5m])) /
sum by (provider, model) (rate(deepintshield_upstream_requests_total[5m]))

Track cache effectiveness:

# Cache hit rate (%) by cache type
sum by (cache_type) (rate(deepintshield_cache_hits_total[5m])) /
ignoring(cache_type) group_left sum(rate(deepintshield_upstream_requests_total[5m])) * 100
# Direct vs semantic cache hits per second
sum by (cache_type) (rate(deepintshield_cache_hits_total[5m]))
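PromQL divides two vectors by pairing series whose label sets match. Because deepintshield_cache_hits_total carries a cache_type label that deepintshield_upstream_requests_total lacks (and deepintshield_error_requests_total likewise adds reason), the two sides only line up once that extra label is aggregated away, or ignored with ignoring(...) group_left. This small emulation of the matching rule, using made-up values, shows the difference:

```python
def divide(lhs: dict, rhs: dict, ignoring: tuple = ()) -> dict:
    """Emulate PromQL lhs / ignoring(...) group_left rhs on instant vectors.

    Each vector maps a frozenset of (label, value) pairs to a sample value. Series are
    matched on their labels minus `ignoring`; unmatched left-hand series are dropped.
    With ignoring=() this reduces to exact label-set matching.
    Assumes right-hand signatures are unique (PromQL reports an error otherwise).
    """
    def signature(labels: frozenset) -> frozenset:
        return frozenset((k, v) for k, v in labels if k not in ignoring)

    rhs_by_signature = {signature(labels): value for labels, value in rhs.items()}
    result = {}
    for labels, value in lhs.items():
        divisor = rhs_by_signature.get(signature(labels))
        # No match drops the series; PromQL would return Inf/NaN for a zero divisor,
        # which this sketch also skips.
        if divisor:
            result[labels] = value / divisor
    return result


hits = {
    frozenset({("cache_type", "direct")}): 2.0,
    frozenset({("cache_type", "semantic")}): 1.0,
}
requests = {frozenset(): 10.0}  # sum(rate(...)) has no labels left

print(divide(hits, requests))  # {} because the label sets never match
ratios = divide(hits, requests, ignoring=("cache_type",))
print({dict(k)["cache_type"]: v for k, v in ratios.items()})  # {'direct': 0.2, 'semantic': 0.1}
```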

Monitor error patterns:

# Error rate (%) by provider
sum by (provider) (rate(deepintshield_error_requests_total[5m])) /
sum by (provider) (rate(deepintshield_upstream_requests_total[5m])) * 100
# Errors per second by model
sum by (model) (rate(deepintshield_error_requests_total[5m]))

From the Web UI, configure custom Prometheus labels to add dimensions for filtering and analysis:

Prometheus Labels

  1. Navigate to Configuration

    • Open DeepIntShield UI at https://app.deepintshield.com
    • Go to the Config tab
  2. Prometheus Labels

    • Under Custom Labels, add the label names you want to track, for example: team, environment, organization, project

Add custom label values at runtime using x-bf-prom-* headers:

# Add custom labels to specific requests
curl -X POST https://app.deepintshield.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: sk-bf-your-virtual-key" \
  -H "x-bf-prom-team: engineering" \
  -H "x-bf-prom-environment: production" \
  -H "x-bf-prom-organization: my-org" \
  -H "x-bf-prom-project: my-project" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Header Format:

  • Prefix: x-bf-prom-
  • Label name: Any string after the prefix (for example, x-bf-prom-team sets the team label)
  • Value: String value for the label
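If you set these headers from application code, a small helper keeps label names consistent. This is a sketch: the helper names are hypothetical, and rejecting names that aren't valid Prometheus label names ([a-zA-Z_][a-zA-Z0-9_]*) is a conservative assumption rather than documented DeepIntShield behavior. The second function shows one plausible way a server could read the labels back, matching the prefix case-insensitively as HTTP header names require.

```python
import re

HEADER_PREFIX = "x-bf-prom-"
# Prometheus label names must match this pattern; enforcing it client-side is an
# assumption, not documented DeepIntShield behavior.
LABEL_NAME = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def prom_label_headers(labels: dict) -> dict:
    """Turn {"team": "engineering"} into {"x-bf-prom-team": "engineering"}."""
    headers = {}
    for name, value in labels.items():
        if not LABEL_NAME.fullmatch(name):
            raise ValueError(f"not a valid Prometheus label name: {name!r}")
        headers[HEADER_PREFIX + name] = str(value)
    return headers


def labels_from_headers(headers: dict) -> dict:
    """Hypothetical server-side view: collect x-bf-prom-* headers back into labels.

    The prefix check ignores case; lower-casing the label name is also an assumption.
    """
    return {
        key[len(HEADER_PREFIX):].lower(): value
        for key, value in headers.items()
        if key.lower().startswith(HEADER_PREFIX)
    }


headers = prom_label_headers({"team": "engineering", "environment": "production"})
print(headers)
# {'x-bf-prom-team': 'engineering', 'x-bf-prom-environment': 'production'}
print(labels_from_headers({"X-BF-Prom-Team": "engineering", "Content-Type": "application/json"}))
# {'team': 'engineering'}
```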

To explore the metrics locally, run Prometheus and Grafana alongside your Gateway with a minimal Docker Compose stack, then point Prometheus at the Gateway’s /metrics endpoint:

docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
Start the stack and open the endpoints:
docker compose up -d
# Access endpoints
# Prometheus: http://<your-prometheus-host>:9090
# Grafana: http://<your-grafana-host>:3000 (admin/admin)
# DeepIntShield metrics: https://app.deepintshield.com/metrics

You can use the Prometheus scraping endpoint to build your own Grafana dashboards. A few examples are shown below.

Grafana Dashboard

For production environments:

  1. Deploy Prometheus with proper persistence, retention, and security
  2. Configure scraping to target your DeepIntShield instances at /metrics
  3. Set up Grafana with authentication and dashboards
  4. Configure alerts based on your SLA requirements

Prometheus Scrape Configuration:

scrape_configs:
  - job_name: "deepintshield-gateway"
    static_configs:
      - targets: ["deepintshield-instance-1:8080", "deepintshield-instance-2:8080"]
    scrape_interval: 30s
    metrics_path: /metrics
    # If DeepIntShield auth is enabled, add:
    # basic_auth:
    #   username: '<admin_username>'
    #   password: '<admin_password>'
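Before pointing Prometheus at an instance, it can help to fetch /metrics by hand and confirm that the target is reachable and the credentials work. A standard-library sketch, assuming the endpoint accepts HTTP basic auth as the commented basic_auth block suggests; the host, port, and credentials are placeholders:

```python
import base64
import urllib.request
from typing import Optional


def build_metrics_request(url: str, username: Optional[str] = None,
                          password: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for a /metrics endpoint, with optional HTTP basic auth."""
    request = urllib.request.Request(url, method="GET")
    if username is not None and password is not None:
        token = base64.b64encode(f"{username}:{password}".encode()).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_metrics(url: str, username: Optional[str] = None,
                  password: Optional[str] = None, timeout: float = 10.0) -> str:
    """Return the raw text exposition served at url; raises on network or HTTP errors."""
    request = build_metrics_request(url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Placeholder host and credentials; substitute your own.
req = build_metrics_request("http://deepintshield-instance-1:8080/metrics", "admin", "secret")
print(req.get_header("Authorization"))  # Basic YWRtaW46c2VjcmV0
# text = fetch_metrics("http://deepintshield-instance-1:8080/metrics", "admin", "secret")
```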

Configure alerts for critical scenarios using these metrics. Each rule below is an entry in the rules list of a Prometheus rule group:

High Error Rate Alert:

- alert: DeepIntShieldHighErrorRate
  expr: sum by (provider) (rate(deepintshield_error_requests_total[5m])) / sum by (provider) (rate(deepintshield_upstream_requests_total[5m])) > 0.05
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "High error rate detected for provider {{ $labels.provider }} ({{ $value | humanizePercentage }})"
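Prometheus marks an alert as pending the first time its expression returns a result and fires it only after the expression has stayed true for the whole for: duration; a single false evaluation resets it. The replay below sketches that state machine for the error-rate rule, assuming a 60-second evaluation interval and made-up error ratios (features such as keep_firing_for are left out):

```python
def alert_states(evaluations: list, threshold: float, hold_seconds: float) -> list:
    """Replay rule evaluations of expr > threshold with a for: duration of hold_seconds.

    evaluations: (timestamp_seconds, expr_value) pairs in evaluation order.
    """
    states, active_since = [], None
    for timestamp, value in evaluations:
        if value > threshold:
            if active_since is None:
                active_since = timestamp  # condition just became true: alert goes pending
            held = timestamp - active_since
            states.append("firing" if held >= hold_seconds else "pending")
        else:
            active_since = None  # any false evaluation resets the clock
            states.append("inactive")
    return states


# Error ratio evaluated every 60 s against the 0.05 threshold and for: 2m.
evals = [(0, 0.02), (60, 0.08), (120, 0.09), (180, 0.07), (240, 0.03)]
print(alert_states(evals, threshold=0.05, hold_seconds=120))
# ['inactive', 'pending', 'pending', 'firing', 'inactive']
```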

High Cost Alert:

- alert: DeepIntShieldHighCosts
  expr: sum by (provider) (increase(deepintshield_cost_total[1d])) > 100 # $100/day threshold
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Daily cost for provider {{ $labels.provider }} exceeds $100 ({{ $value | printf \"%.2f\" }})"

Cache Performance Alert:

- alert: DeepIntShieldLowCacheHitRate
  expr: sum by (provider) (rate(deepintshield_cache_hits_total[15m])) / sum by (provider) (rate(deepintshield_upstream_requests_total[15m])) < 0.1
  for: 5m
  labels:
    severity: info
  annotations:
    summary: "Cache hit rate for provider {{ $labels.provider }} below 10% ({{ $value | humanizePercentage }})"