t3.xlarge

Instance Configuration

AWS t3.xlarge Specifications:

vCPUs: 4
Memory: 16GB RAM
Network Performance: Up to 5 Gigabit

DeepIntShield Configuration:

Buffer Size: 20,000
Initial Pool Size: 15,000
Test Load: 5,000 requests per second (RPS)

Performance Results

Overall Performance Metrics

Metric	Value	Notes
Success Rate	100.00%	No failed requests at 5,000 RPS
Average Request Size	0.13 KB	Lightweight request payload
Average Response Size	10.32 KB	Deliberately large response payload (see note below)
Average Latency	1.61s	Total end-to-end response time
Peak Memory Usage	3,340.44 MB	~21% of available 16GB RAM

Note: The t3.xlarge tests used significantly larger response payloads (10.32 KB vs 1.37 KB on t3.medium) to stress-test performance with realistic production data sizes.

Where the time goes

As on t3.medium, end-to-end latency is dominated by the upstream provider call. The gateway adds only microseconds, and fewer of them than on the smaller instance.

Component	Latency	Notes
Upstream provider call	1.50s	The actual model API request (unavoidable in any setup)
DeepIntShield overhead	11 µs	81% lower than t3.medium (59 µs → 11 µs)

DeepIntShield’s Total Overhead: 11 µs*

*Excludes the provider API call and JSON serialization, which are required in any implementation.
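To put the two rows above in proportion, the following minimal sketch computes each component's share of the average end-to-end latency. It uses only the figures reported on this page; the variable names are illustrative, and the arithmetic is a reading aid rather than part of the benchmark tooling.

```python
# Figures taken from the tables above.
OVERHEAD_US = 11        # DeepIntShield overhead per request, in microseconds
UPSTREAM_S = 1.50       # upstream provider call, in seconds
END_TO_END_S = 1.61     # average end-to-end latency, in seconds

overhead_s = OVERHEAD_US / 1_000_000

# Share of the average end-to-end latency attributable to each component.
overhead_share = overhead_s / END_TO_END_S
upstream_share = UPSTREAM_S / END_TO_END_S

print(f"gateway overhead: {overhead_share:.5%} of end-to-end latency")
print(f"upstream call:    {upstream_share:.1%} of end-to-end latency")
```

The remainder is time outside both rows, such as the JSON serialization that the footnote excludes from the overhead figure.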

Performance Analysis

Exceptional Performance Improvements

Dramatic Overhead Reduction: 81% lower DeepIntShield overhead (59 µs → 11 µs)
Lower Average Latency: 24% faster end-to-end response time (2.12s → 1.61s)
Handles Larger Responses: Maintains performance with 7.5x larger response payloads
Perfect Reliability: 100% success rate maintained under high load

Resource Utilization

Memory Efficiency: Uses only 21% of available RAM (3,340.44 MB / 16GB)
CPU Performance: Sustains 5,000 RPS on 4 vCPUs with 11 µs of gateway overhead per request
Headroom: Substantial capacity for traffic spikes and growth
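The memory figures above can be reproduced with a few lines of arithmetic. This sketch assumes nothing beyond the reported peak usage and instance size; it shows both the binary (1 GB = 1,024 MB) and decimal (1 GB = 1,000 MB) conventions, since the page does not say which one the 21% figure uses.

```python
# Figures taken from the results table above.
PEAK_MEMORY_MB = 3340.44
TOTAL_RAM_GB = 16

def memory_utilization(peak_mb: float, total_gb: float, mb_per_gb: int = 1024) -> float:
    """Return peak memory as a fraction of total RAM."""
    return peak_mb / (total_gb * mb_per_gb)

binary = memory_utilization(PEAK_MEMORY_MB, TOTAL_RAM_GB)         # 1 GB = 1,024 MB
decimal = memory_utilization(PEAK_MEMORY_MB, TOTAL_RAM_GB, 1000)  # 1 GB = 1,000 MB
headroom_gb = TOTAL_RAM_GB - PEAK_MEMORY_MB / 1024

print(f"utilization: {binary:.1%} (binary) / {decimal:.1%} (decimal)")
print(f"headroom:    {headroom_gb:.1f} GB")
```

Either convention lands at roughly a fifth of available RAM, which matches the ~21% reported above.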

Scalability and Headroom

Exceptional Scaling Characteristics

The t3.xlarge configuration demonstrates excellent scaling potential:

Current Utilization:

Memory: 21% used (about 12.7 GB of headroom)
Latency: 1.61s end-to-end, dominated by the provider call
Gateway overhead: 11 µs per request, leaving ample CPU headroom

Scaling Potential:

Traffic Spikes: Can likely handle 15,000+ RPS bursts (an extrapolation from the headroom above; the test itself ran at 5,000 RPS)
Response Size Growth: Efficiently handles 10 KB responses
Concurrent Users: Supports thousands of simultaneous users
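As a rough sanity check on the CPU headroom claim, the sketch below estimates what fraction of the instance's vCPU time the gateway overhead alone would consume at a given request rate. It rests on two assumptions not established by this page: that the 11 µs overhead is pure CPU time, and that it stays constant as load increases. It also ignores JSON serialization, networking, and everything else the overhead figure excludes, so treat the result as a lower bound, not a capacity prediction.

```python
OVERHEAD_S = 11e-6  # gateway overhead per request, from the results above
VCPUS = 4           # t3.xlarge vCPU count

def gateway_core_share(rps: float, overhead_s: float = OVERHEAD_S, vcpus: int = VCPUS) -> float:
    """Fraction of total vCPU time spent on gateway overhead at a given request rate.

    Assumes the overhead is CPU-bound and load-independent (both are assumptions).
    """
    return rps * overhead_s / vcpus

for rps in (5_000, 15_000):
    print(f"{rps:>6} RPS -> {gateway_core_share(rps):.2%} of {VCPUS} vCPUs")
```

Under these assumptions the overhead stays in the low single-digit percent range even at 15,000 RPS, which is consistent with the headroom described above but does not replace a load test at that rate.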

Advanced Configuration

Optimal Settings for t3.xlarge

Based on the test results, the following settings perform well on t3.xlarge:

{
  "client": {
    "initial_pool_size": 15000,
    "buffer_size": 20000
  }
}

Performance Tuning Opportunities

For Maximum Performance:

Increase initial_pool_size to 18,000-20,000
Increase buffer_size to 25,000-30,000
Trade-off: Higher memory usage (still well within limits)

For Memory Optimization:

The current config is already efficient at 21% RAM usage
Settings can be reduced if needed, at the cost of some of the performance gains

For Extreme Workloads:

Consider initial_pool_size up to 25,000
Increase buffer_size to 35,000+
Monitor memory usage, especially as it approaches 50% of available RAM
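The tuning options above can be captured as named profiles and rendered into the same JSON shape as the tested config. In this sketch the profile names and the `render_config` helper are hypothetical. The keys (`client`, `initial_pool_size`, `buffer_size`) and the values come from this page, with the upper end of each suggested range chosen for illustration.

```python
import json

# Hypothetical profile names; values are taken from the ranges suggested above.
PROFILES = {
    "tested": {"initial_pool_size": 15000, "buffer_size": 20000},
    "max_performance": {"initial_pool_size": 20000, "buffer_size": 30000},
    "extreme": {"initial_pool_size": 25000, "buffer_size": 35000},
}

def render_config(profile: str) -> str:
    """Render a profile as JSON in the same shape as the tested config."""
    return json.dumps({"client": PROFILES[profile]}, indent=2)

print(render_config("max_performance"))
```

Only the "tested" profile was benchmarked here; the other two are starting points to validate against your own memory and latency measurements.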

Performance Comparison

vs. t3.medium Performance

Metric	t3.medium	t3.xlarge	Change
DeepIntShield Overhead	59 µs	11 µs	-81%
Average Latency	2.12s	1.61s	-24%
Response Size Handled	1.37 KB	10.32 KB	7.5x larger
Peak Memory Usage	1,312.79 MB	3,340.44 MB	+154%
Memory Utilization	33%	21%	-36%

Key Insights:

81% overhead reduction while handling 7.5x larger responses
Exceptional efficiency with only 21% memory utilization
Lower average latency despite much larger payloads
Substantial headroom for growth and traffic spikes
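The percentage changes in the comparison can be re-derived from the raw values. The sketch below is plain arithmetic on the figures in the table; the helper name `pct_change` is illustrative.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means a reduction)."""
    return (new - old) / old * 100

# (t3.medium, t3.xlarge) pairs from the comparison table.
overhead = pct_change(59, 11)                # gateway overhead, in µs
latency = pct_change(2.12, 1.61)             # average latency, in seconds
peak_memory = pct_change(1312.79, 3340.44)   # peak memory, in MB
utilization = pct_change(33, 21)             # memory utilization, in percent of RAM
response_ratio = 10.32 / 1.37                # response size ratio

print(f"overhead {overhead:+.1f}%, latency {latency:+.1f}%, "
      f"peak memory {peak_memory:+.1f}%, utilization {utilization:+.1f}%, "
      f"responses {response_ratio:.1f}x larger")
```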

Next Steps

Run Your Own Benchmarks with your specific payload sizes
Compare with t3.medium for cost-optimization analysis
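When running your own benchmarks, it helps to report the same metrics as the tables on this page so the results are directly comparable. The sketch below summarizes a list of per-request samples into success rate, average latency, and average response size. The `Sample` record and `summarize` function are hypothetical and not part of DeepIntShield; how you collect the samples depends on your load-testing tool.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Sample:
    """One request as recorded by a load-testing tool (hypothetical record shape)."""
    ok: bool
    latency_s: float
    response_bytes: int

def summarize(samples: list[Sample]) -> dict[str, float]:
    """Aggregate samples into the metrics used in the tables on this page."""
    if not samples:
        raise ValueError("no samples to summarize")
    return {
        "success_rate_pct": 100 * sum(s.ok for s in samples) / len(samples),
        "avg_latency_s": mean(s.latency_s for s in samples),
        "avg_response_kb": mean(s.response_bytes for s in samples) / 1024,
    }

# Example with two made-up samples, purely to show the output shape.
demo = summarize([Sample(True, 1.5, 10240), Sample(True, 1.7, 10240)])
print(demo)
```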