t3.xlarge
Instance Configuration
AWS t3.xlarge Specifications:
- vCPUs: 4
- Memory: 16GB RAM
- Network Performance: Up to 5 Gigabit
DeepIntShield Configuration:
- Buffer Size: 20,000
- Initial Pool Size: 15,000
- Test Load: 5,000 requests per second (RPS)
Performance Results
Overall Performance Metrics
| Metric | Value | Notes |
|---|---|---|
| Success Rate | 100.00% | Perfect reliability under high load |
| Average Request Size | 0.13 KB | Lightweight request payload |
| Average Response Size | 10.32 KB | Larger payload used to stress-test response handling |
| Average Latency | 1.61s | Total end-to-end response time |
| Peak Memory Usage | 3,340.44 MB | ~21% of available 16GB RAM |
Note: The t3.xlarge tests used significantly larger response payloads than the t3.medium tests (~10 KB vs ~1.4 KB) to stress-test performance with realistic production data sizes.
Where the time goes
As on t3.medium, end-to-end latency is dominated by the upstream provider call. The gateway adds only microseconds, even fewer than on the smaller instance.
| Component | Latency | Notes |
|---|---|---|
| Upstream provider call | 1.50s | The actual model API request (unavoidable in any setup) |
| DeepIntShield overhead | 11 µs | 81% lower than t3.medium (59 µs → 11 µs) |
DeepIntShield’s Total Overhead: 11 µs*
*Excludes the provider API call and JSON serialization, which are required in any implementation.
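These averages can be cross-checked with a few lines of arithmetic. The Python sketch below copies its constants from the tables above (the function name `breakdown` is illustrative). It shows that 11 µs is well under 0.001% of the 1.61 s average, and that the two listed components leave roughly 0.11 s of the average unattributed. This page does not break that remainder down; the footnote only says that JSON serialization is excluded from the overhead figure.

```python
# Latency figures copied from the tables above (seconds).
END_TO_END_S = 1.61          # average end-to-end latency
UPSTREAM_S = 1.50            # average upstream provider call
GATEWAY_OVERHEAD_S = 11e-6   # DeepIntShield overhead (11 µs)


def breakdown(end_to_end: float, upstream: float, overhead: float) -> dict:
    """Split the average end-to-end latency into reported and unattributed parts."""
    return {
        # Share of end-to-end time spent in gateway overhead, in percent
        "overhead_share_pct": overhead / end_to_end * 100,
        # Share of end-to-end time spent waiting on the provider, in percent
        "upstream_share_pct": upstream / end_to_end * 100,
        # Time covered by neither reported component (not broken out on this page)
        "unattributed_s": end_to_end - upstream - overhead,
    }


if __name__ == "__main__":
    for name, value in breakdown(END_TO_END_S, UPSTREAM_S, GATEWAY_OVERHEAD_S).items():
        print(f"{name}: {value:.6f}")
```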
Performance Analysis
Exceptional Performance Improvements
- Dramatic Overhead Reduction: 81% lower DeepIntShield overhead than on t3.medium (59 µs → 11 µs)
- Lower Average Latency: 24% lower end-to-end latency than on t3.medium (2.12s → 1.61s)
- Handles Larger Responses: Maintains performance with 7.5x larger response payloads
- Perfect Reliability: 100% success rate maintained at 5,000 RPS
Resource Utilization
- Memory Efficiency: Uses only ~21% of available RAM (3,340.44 MB of 16GB)
- CPU Performance: Sustains 5,000 RPS on 4 vCPUs
- Headroom: ~79% of RAM unused, leaving capacity for traffic spikes and growth
Scalability and Headroom
Exceptional Scaling Characteristics
The t3.xlarge configuration demonstrates excellent scaling potential:
Current Utilization:
- Memory: ~21% used (~12.7 GB of headroom)
- Latency: 1.61s end-to-end, dominated by the provider call
- Gateway overhead: 11 µs per request, leaving ample CPU headroom
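A rough way to see why 11 µs per request leaves ample CPU headroom is to multiply it by the request rate, which gives core-seconds of gateway work per wall-clock second. The sketch below assumes, for illustration only, that the whole 11 µs is CPU time. It covers only the measured overhead, so the real CPU load, which also includes work such as the JSON serialization excluded from that figure, will be higher. The names are illustrative.

```python
# Rough CPU budget for the gateway's own per-request work at the tested load.
# ASSUMPTION: the 11 µs overhead is entirely CPU time; this is an estimate, not a measurement.
REQUESTS_PER_SECOND = 5_000
OVERHEAD_S = 11e-6
VCPUS = 4


def cpu_fraction(rps: float, overhead_s: float, vcpus: int) -> float:
    """Fraction of total vCPU capacity consumed by per-request gateway overhead."""
    busy_core_seconds_per_second = rps * overhead_s  # core-seconds of work per wall-clock second
    return busy_core_seconds_per_second / vcpus


if __name__ == "__main__":
    frac = cpu_fraction(REQUESTS_PER_SECOND, OVERHEAD_S, VCPUS)
    print(f"{REQUESTS_PER_SECOND * OVERHEAD_S:.3f} core-seconds/s, {frac:.2%} of {VCPUS} vCPUs")
```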
Scaling Potential:
- Traffic Spikes: Likely able to absorb 15,000+ RPS bursts (an extrapolation; only 5,000 RPS was tested)
- Response Size Growth: Handles ~10 KB average responses at 5,000 RPS
- Concurrent Requests: ~8,000 requests in flight on average at the tested load (5,000 RPS × 1.61 s)
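The in-flight figure follows from Little's law: average requests in flight = arrival rate × average latency. The sketch below applies it to the tested load and to the 15,000 RPS burst estimate, holding latency at 1.61 s (an assumption; latency may change under bursts). Comparing the result with `initial_pool_size` further assumes that each in-flight request holds one pooled object, which this page does not state, so treat the coverage ratio as illustrative only.

```python
# Little's law: average requests in flight L = arrival rate λ × average time in system W.
TEST_RPS = 5_000
AVG_LATENCY_S = 1.61
INITIAL_POOL_SIZE = 15_000  # from the tested DeepIntShield config


def in_flight(rps: float, latency_s: float) -> float:
    """Average number of concurrent requests implied by Little's law."""
    return rps * latency_s


if __name__ == "__main__":
    # Tested load, then the estimated burst level at the same (assumed) latency
    for rps in (TEST_RPS, 15_000):
        n = in_flight(rps, AVG_LATENCY_S)
        # ASSUMPTION: one pooled object per in-flight request
        print(f"{rps:>6} RPS -> ~{n:,.0f} in flight, initial pool covers {INITIAL_POOL_SIZE / n:.2f}x")
```

At 5,000 RPS this gives about 8,050 requests in flight. At 15,000 RPS with unchanged latency it gives about 24,150, which, under the one-object-per-request assumption, would exceed the initial pool of 15,000.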
Advanced Configuration
Optimal Settings for t3.xlarge
Based on test results, these configurations provide excellent performance:
```json
{
  "client": {
    "initial_pool_size": 15000,
    "buffer_size": 20000
  }
}
```

Performance Tuning Opportunities
For Maximum Performance:
- Increase `initial_pool_size` to 18,000-20,000
- Increase `buffer_size` to 25,000-30,000
- Trade-off: Higher memory usage (still well within limits)
For Memory Optimization:
- The current config is already efficient at ~21% RAM usage
- Settings can be reduced if needed, at the cost of some of the performance gains
For Extreme Workloads:
- Consider raising `initial_pool_size` up to 25,000
- Increase `buffer_size` to 35,000+
- Monitor memory usage as it approaches 50% of available RAM
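To track the 50% guideline, the sketch below compares peak memory against total RAM and reports how much room remains before the threshold. It uses decimal units (1 GB = 1,000 MB), which reproduces the ~21% figure on this page; the constant and function names are illustrative. If the reported MB are actually MiB measured against t3.xlarge's 16 GiB, utilization works out closer to 20%.

```python
# Memory budget check against the 50%-of-RAM guideline.
# Decimal units (1 GB = 1,000 MB) are used to match the ~21% figure on this page.
PEAK_MEMORY_MB = 3_340.44
TOTAL_RAM_GB = 16
ALERT_FRACTION = 0.50


def memory_report(peak_mb: float, total_gb: float, alert_fraction: float) -> dict:
    """Utilization, total headroom, and remaining room before the alert threshold."""
    total_mb = total_gb * 1_000
    return {
        "utilization": peak_mb / total_mb,
        "headroom_mb": total_mb - peak_mb,
        # Additional memory available before crossing the alert threshold
        "room_before_alert_mb": total_mb * alert_fraction - peak_mb,
    }


if __name__ == "__main__":
    report = memory_report(PEAK_MEMORY_MB, TOTAL_RAM_GB, ALERT_FRACTION)
    print(f"utilization {report['utilization']:.1%}, "
          f"headroom {report['headroom_mb']:,.0f} MB, "
          f"{report['room_before_alert_mb']:,.0f} MB before the {ALERT_FRACTION:.0%} mark")
```

With the tested settings, about 4,660 MB remains before the 50% mark; how much of that larger pool and buffer sizes would consume was not measured here.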
Performance Comparison
vs. t3.medium Performance
Section titled “vs. t3.medium Performance”| Metric | t3.medium | t3.xlarge | Improvement |
|---|---|---|---|
| DeepIntShield Overhead | 59 µs | 11 µs | -81% |
| Average Latency | 2.12s | 1.61s | -24% |
| Response Size Handled | 1.37 KB | 10.32 KB | 7.5x larger |
| Peak Memory Usage | 1,312.79 MB | 3,340.44 MB | +154% |
| Memory Utilization | 33% | 21% | -12 pp (-36% relative) |
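The Improvement column can be recomputed from the raw t3.medium and t3.xlarge values. This Python sketch (names are illustrative) computes the relative changes and the response-size ratio; the exact peak-memory change is about +154.45%.

```python
# Raw values copied from the comparison table.
T3_MEDIUM = {"overhead_us": 59, "latency_s": 2.12, "response_kb": 1.37, "peak_mem_mb": 1_312.79}
T3_XLARGE = {"overhead_us": 11, "latency_s": 1.61, "response_kb": 10.32, "peak_mem_mb": 3_340.44}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means a reduction)."""
    return (new - old) / old * 100


if __name__ == "__main__":
    for key in ("overhead_us", "latency_s", "peak_mem_mb"):
        print(f"{key}: {pct_change(T3_MEDIUM[key], T3_XLARGE[key]):+.2f}%")
    # Payload growth is clearer as a ratio than as a percentage
    ratio = T3_XLARGE["response_kb"] / T3_MEDIUM["response_kb"]
    print(f"response_kb: {ratio:.1f}x")
```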
Key Insights:
- 81% overhead reduction while handling 7.5x larger responses
- Exceptional efficiency with only 21% memory utilization
- Lower average latency despite much larger payloads
- Substantial headroom for growth and traffic spikes
Next Steps
- Run Your Own Benchmarks with your specific payload sizes
- Compare with t3.medium for cost-optimization analysis