t3.medium
Instance Configuration
AWS t3.medium Specifications:
- vCPUs: 2
- Memory: 4GB RAM
- Network Performance: Up to 5 Gigabit
DeepIntShield Configuration:
- Buffer Size: 15,000
- Initial Pool Size: 10,000
- Test Load: 5,000 requests per second (RPS)
Performance Results
Overall Performance Metrics
| Metric | Value | Notes |
|---|---|---|
| Success Rate | 100.00% | Perfect reliability under high load |
| Average Request Size | 0.13 KB | Lightweight request payload |
| Average Response Size | 1.37 KB | Standard response size for testing |
| Average Latency | 2.12s | Total end-to-end response time |
| Peak Memory Usage | 1,312.79 MB | ~33% of available 4GB RAM |
Where the time goes
Most of the end-to-end latency is the upstream provider API call (1.56s of the 2.12s average); the gateway itself adds only microseconds.
| Component | Latency | Notes |
|---|---|---|
| Upstream provider call | 1.56s | The actual model API request (unavoidable in any setup) |
| DeepIntShield overhead | 59 µs | Added latency from the gateway |
DeepIntShield’s Total Overhead: 59 µs*
*Excludes the provider API call and JSON serialization, which are required in any implementation
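The breakdown above can be checked with a little arithmetic. The sketch below takes the three figures from the tables (2.12 s average latency, 1.56 s upstream call, 59 µs gateway overhead) and computes each component's share of the total. The variable and function names are illustrative, and the "other" bucket is simply whatever the tables do not itemize.

```python
# Figures taken from the tables above; names are illustrative only.
AVG_LATENCY_S = 2.12          # average end-to-end latency
UPSTREAM_S = 1.56             # upstream provider call
GATEWAY_OVERHEAD_S = 59e-6    # 59 µs of gateway overhead


def latency_shares(total_s, upstream_s, overhead_s):
    """Return each component's share of total latency, in percent."""
    remainder_s = total_s - upstream_s - overhead_s
    return {
        "upstream_pct": 100 * upstream_s / total_s,
        "gateway_pct": 100 * overhead_s / total_s,
        "other_pct": 100 * remainder_s / total_s,
    }


shares = latency_shares(AVG_LATENCY_S, UPSTREAM_S, GATEWAY_OVERHEAD_S)
for name, pct in shares.items():
    print(f"{name}: {pct:.4f}%")
```

With these inputs the gateway's share comes out well under 0.01% of the average latency.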
Performance Analysis
Strengths on t3.medium
- Perfect Reliability: 100% success rate even at 5,000 RPS
- Memory Efficiency: Uses only 33% of available RAM (1,312.79 MB / 4GB)
- Minimal Overhead: Just 59 µs of added latency per request
- Cost-effective: Sustains a 5,000 RPS workload on a low-cost instance
Resource Utilization
- Memory Usage: Very efficient at 1,312.79 MB peak usage
- CPU Performance: Handles 5,000 RPS workload effectively
- Stability: No failed requests or throughput degradation under sustained load
Configuration Recommendations
Optimal Settings for t3.medium
Based on test results, these configurations work well:
    {
      "client": {
        "initial_pool_size": 10000,
        "buffer_size": 15000
      }
    }
Tuning Opportunities
For Lower Memory Usage:
- Reduce `initial_pool_size` to 7,500-8,000
- Decrease `buffer_size` to 12,000-13,000
- Trade-off: Slightly higher latency
For Better Performance:
- Increase `initial_pool_size` to 12,000-13,000
- Increase `buffer_size` to 17,000-18,000
- Trade-off: Higher memory usage (monitor RAM limits)
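To compare the tuning options side by side, the variants can be generated from the tested baseline. This is a minimal sketch: the profile names are hypothetical, each profile uses the low end of the suggested range, and only the two keys shown in the configuration above (`initial_pool_size`, `buffer_size`) are assumed to exist.

```python
import json

# Baseline values from the tested configuration above.
BASELINE = {"initial_pool_size": 10000, "buffer_size": 15000}

# Hypothetical profile names; values are the low end of each suggested range.
PROFILES = {
    "baseline": {},
    "lower_memory": {"initial_pool_size": 7500, "buffer_size": 12000},
    "better_performance": {"initial_pool_size": 12000, "buffer_size": 17000},
}


def build_config(profile):
    """Return a config dict in the same shape as the JSON shown above."""
    client = {**BASELINE, **PROFILES[profile]}
    return {"client": client}


for name in PROFILES:
    print(name, json.dumps(build_config(name)))
```

Each variant should be benchmarked under the same load before adoption, since the latency and memory trade-offs listed above are qualitative.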
Comparison Context
vs. t3.xlarge Performance
| Metric | t3.medium | t3.xlarge | Difference |
|---|---|---|---|
| DeepIntShield Overhead | 59 µs | 11 µs | +436% slower (about 5.4x) |
| Average Latency | 2.12s | 1.61s | +32% slower |
| Memory Usage | 1,312.79 MB | 3,340.44 MB | -61% usage |
Key Insights:
- t3.medium uses 61% less memory than t3.xlarge
- Performance trade-offs are reasonable for cost savings
- Most operations still complete in microseconds
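Percentage differences depend on which instance is taken as the baseline. A minimal sketch, assuming t3.xlarge is the baseline for every row and using the raw values from the comparison table (the function and key names are illustrative):

```python
def pct_diff(value, baseline):
    """Percentage difference of value relative to baseline."""
    return 100 * (value - baseline) / baseline


# (t3.medium, t3.xlarge) pairs from the comparison table.
METRICS = {
    "overhead_us": (59, 11),
    "avg_latency_s": (2.12, 1.61),
    "peak_memory_mb": (1312.79, 3340.44),
}

for name, (medium, xlarge) in METRICS.items():
    print(f"{name}: {pct_diff(medium, xlarge):+.0f}% vs t3.xlarge")
```

Using one baseline throughout keeps the rows comparable; swapping the baseline to t3.medium would give different percentages for the same measurements.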
Next Steps
When to upgrade to t3.xlarge:
- Sustained load approaches 4,000+ RPS
- Average latency climbs or success rate drops below 100%
- Memory usage approaches 75% of available RAM

Further steps:
- Run Your Own Benchmarks to test with your specific workload
- Compare with t3.xlarge for performance scaling analysis
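The upgrade criteria above can be turned into a simple check against observed metrics. This is a sketch under stated assumptions: the `Observation` fields are hypothetical, 4 GB is treated as 4096 MB, and "latency climbs" is interpreted as exceeding the 2.12 s benchmark average.

```python
from dataclasses import dataclass

# Thresholds from the list above; the RAM figure is the t3.medium spec.
RPS_THRESHOLD = 4000
MEMORY_FRACTION_THRESHOLD = 0.75
TOTAL_RAM_MB = 4096          # assumes 4 GB means 4096 MB
BASELINE_LATENCY_S = 2.12    # benchmark average, used as the reference point


@dataclass
class Observation:
    """Hypothetical snapshot of gateway metrics; field names are illustrative."""
    sustained_rps: float
    success_rate_pct: float
    avg_latency_s: float
    peak_memory_mb: float


def upgrade_reasons(obs):
    """Return the list of upgrade criteria that the observation meets."""
    reasons = []
    if obs.sustained_rps >= RPS_THRESHOLD:
        reasons.append("sustained load at or above 4,000 RPS")
    if obs.success_rate_pct < 100.0:
        reasons.append("success rate below 100%")
    if obs.avg_latency_s > BASELINE_LATENCY_S:
        reasons.append("average latency above the benchmark baseline")
    if obs.peak_memory_mb >= MEMORY_FRACTION_THRESHOLD * TOTAL_RAM_MB:
        reasons.append("memory usage at or above 75% of RAM")
    return reasons


healthy = Observation(2500, 100.0, 1.9, 1312.79)
print(upgrade_reasons(healthy))
```

An empty list means none of the criteria are met; any non-empty result is a prompt to rerun the benchmarks on t3.xlarge rather than an automatic decision.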