
t3.xlarge

AWS t3.xlarge Specifications:

  • vCPUs: 4
  • Memory: 16GB RAM
  • Network Performance: Up to 5 Gigabit

DeepIntShield Configuration:

  • Buffer Size: 20,000
  • Initial Pool Size: 15,000
  • Test Load: 5,000 requests per second (RPS)

| Metric | Value | Notes |
|---|---|---|
| Success Rate | 100.00% | Perfect reliability under high load |
| Average Request Size | 0.13 KB | Lightweight request payload |
| Average Response Size | 10.32 KB | Large response payload testing |
| Average Latency | 1.61s | Total end-to-end response time |
| Peak Memory Usage | 3,340.44 MB | ~21% of available 16 GB RAM |

Note: t3.xlarge tests used significantly larger response payloads (~10 KB vs ~1 KB on t3.medium) to stress-test performance with realistic production data sizes.

As on t3.medium, end-to-end latency is dominated by the upstream provider call. The gateway adds only microseconds, and fewer of them than on the smaller instance.

| Component | Latency | Notes |
|---|---|---|
| Upstream provider call | 1.50s | The actual model API request (unavoidable in any setup) |
| DeepIntShield overhead | 11 µs | 81% lower than t3.medium (59 µs → 11 µs) |

DeepIntShield’s Total Overhead: 11 µs*

*Excludes the provider API call and JSON serialization, which are required in any implementation.
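To put the 11 µs figure in proportion, the breakdown can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers quoted above; the variable and function names are illustrative and not part of DeepIntShield.

```python
# Figures taken from the latency breakdown above.
AVG_LATENCY_S = 1.61        # average end-to-end latency
PROVIDER_CALL_S = 1.50      # upstream provider call
GATEWAY_OVERHEAD_US = 11    # DeepIntShield overhead, in microseconds


def overhead_share(overhead_us: float, total_s: float) -> float:
    """Return the gateway overhead as a fraction of end-to-end latency."""
    return (overhead_us / 1_000_000) / total_s


share = overhead_share(GATEWAY_OVERHEAD_US, AVG_LATENCY_S)

# Time that is neither the provider call nor the gateway overhead.
remainder_s = AVG_LATENCY_S - PROVIDER_CALL_S - GATEWAY_OVERHEAD_US / 1_000_000

print(f"Gateway share of end-to-end latency: {share:.6%}")
print(f"Latency not itemized in the breakdown: {remainder_s:.3f}s")
```

The gateway's share works out to well under a thousandth of a percent of the average request. The remaining ~0.11s is not itemized in the breakdown, so this sketch makes no claim about where it is spent.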


  1. Dramatic Overhead Reduction: 81% lower DeepIntShield overhead (59 µs → 11 µs)
  2. Lower Average Latency: 24% faster end-to-end response time (2.12s → 1.61s)
  3. Handles Larger Responses: Maintains performance with 7.5x larger response payloads
  4. Perfect Reliability: 100% success rate maintained under high load

  • Memory Efficiency: Uses only 21% of available RAM (3,340.44 MB / 16 GB)
  • CPU Performance: Excellent multi-core utilization for 5,000 RPS
  • Headroom: Substantial capacity for traffic spikes and growth

The t3.xlarge configuration demonstrates excellent scaling potential:

Current Utilization:

  • Memory: 21% used (about 13 GB of headroom)
  • Latency: 1.61s end-to-end, dominated by the provider call
  • Gateway overhead: 11 µs per request, leaving ample CPU headroom

Scaling Potential:

  • Traffic Spikes: Can likely handle 15,000+ RPS bursts (an estimate; the test itself ran at 5,000 RPS)
  • Response Size Growth: Efficiently handles 10 KB responses
  • Concurrent Users: Supports thousands of simultaneous users

Based on the test results, this configuration provides excellent performance:

{
  "client": {
    "initial_pool_size": 15000,
    "buffer_size": 20000
  }
}
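Before deploying a config like this, it can help to sanity-check it. The sketch below parses the same JSON and verifies that both settings are positive integers. The key names come from the config above; the `load_client_settings` helper and its validation rule are assumptions for illustration, not DeepIntShield's own loader.

```python
import json

# The tested configuration, embedded as text so the example is self-contained.
CONFIG_TEXT = """
{
  "client": {
    "initial_pool_size": 15000,
    "buffer_size": 20000
  }
}
"""


def load_client_settings(text: str) -> dict:
    """Parse the config and check that both client settings are positive integers."""
    client = json.loads(text)["client"]
    for key in ("initial_pool_size", "buffer_size"):
        value = client.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{key} must be a positive integer, got {value!r}")
    return client


settings = load_client_settings(CONFIG_TEXT)
print(settings)
```

A missing key or a string value such as `"15000"` raises `ValueError` here instead of surfacing later as a runtime problem.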

For Maximum Performance:

  • Increase initial_pool_size to 18,000-20,000
  • Increase buffer_size to 25,000-30,000
  • Trade-off: Higher memory usage (still well within limits)

For Memory Optimization:

  • The current config is already efficient at 21% RAM usage
  • Settings could be reduced if needed, at the cost of the performance gains measured here

For Extreme Workloads:

  • Consider initial_pool_size up to 25,000
  • Increase buffer_size to 35,000+
  • Monitor memory usage approaching 50% of available RAM
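The tuning options above can be written down as named profiles. This is a rough sketch: the profile names, the specific values (picked from inside the ranges listed above and not benchmarked), and the helper functions are all hypothetical. The 50% threshold mirrors the monitoring advice for extreme workloads, and total RAM is assumed to be 16,000 MB.

```python
import json

# Illustrative values chosen from the ranges above; only "tested" was measured.
PROFILES = {
    "tested": {"initial_pool_size": 15000, "buffer_size": 20000},
    "maximum_performance": {"initial_pool_size": 20000, "buffer_size": 30000},
    "extreme_workloads": {"initial_pool_size": 25000, "buffer_size": 35000},
}

TOTAL_RAM_MB = 16_000          # assumption: 16 GB counted in decimal megabytes
MEMORY_WATCH_FRACTION = 0.50   # threshold from the monitoring advice above


def render_config(profile: str) -> str:
    """Return the JSON config text for a named profile."""
    return json.dumps({"client": PROFILES[profile]}, indent=2)


def needs_attention(peak_memory_mb: float) -> bool:
    """Return True once peak memory reaches the watch threshold."""
    return peak_memory_mb / TOTAL_RAM_MB >= MEMORY_WATCH_FRACTION


print(render_config("maximum_performance"))
print(needs_attention(3340.44))  # peak memory measured with the tested config
```

Keeping profiles as data makes it easy to switch between them during load tests and to compare peak memory against the same threshold each time.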

| Metric | t3.medium | t3.xlarge | Change |
|---|---|---|---|
| DeepIntShield Overhead | 59 µs | 11 µs | -81% |
| Average Latency | 2.12s | 1.61s | -24% |
| Response Size Handled | 1.37 KB | 10.32 KB | 7.5x |
| Peak Memory Usage | 1,312.79 MB | 3,340.44 MB | +154% |
| Memory Utilization | 33% | 21% | -36% |
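The comparison percentages can be recomputed from the raw measurements as a quick check. This is a minimal sketch; the dictionary keys and the `pct_change` helper are illustrative names, and the inputs are the figures from the comparison above.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# (t3.medium, t3.xlarge) pairs taken from the comparison above.
measurements = {
    "overhead_us": (59, 11),
    "avg_latency_s": (2.12, 1.61),
    "peak_memory_mb": (1312.79, 3340.44),
    "memory_utilization_pct": (33, 21),
}

for name, (medium, xlarge) in measurements.items():
    print(f"{name}: {pct_change(medium, xlarge):+.1f}%")

# Response size is reported as a ratio rather than a percentage.
response_ratio = 10.32 / 1.37
print(f"response size ratio: {response_ratio:.1f}x")
```

The results should land within about a percentage point of the rounded figures in the comparison.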

Key Insights:

  • 81% overhead reduction while handling 7.5x larger responses
  • Exceptional efficiency with only 21% memory utilization
  • Lower average latency despite much larger payloads
  • Substantial headroom for growth and traffic spikes