When building production systems on AWS, routing is often treated as a solved problem: put an ALB in front, optionally add Global Accelerator, and move on. In practice, once traffic scales and latency becomes visible, routing decisions start to have a measurable impact on both performance and failure behavior.
This post shares practical observations from operating real workloads, focusing on how Application Load Balancer (ALB) and Global Accelerator behave under production conditions.
1. The Problem: Routing Is Not Just About Reachability
A typical architecture looks like:
Client → Edge / CDN → AWS entry point → ALB → Services
At small scale, this works well. At scale, several issues emerge:
- Latency varies across regions
- Failover is not as immediate as expected
- Additional routing layers introduce hidden overhead
What matters is not just whether traffic reaches your service, but how predictably and efficiently it does so.
2. ALB: Reliable, but Region-Bound
Application Load Balancer works well as a regional entry point:
- Native integration with ECS / EKS
- Flexible routing (path, host, header-based)
- Stable under high concurrency
However, two limitations become apparent in production:
- Region-bound latency: users outside the ALB's region pay the full round-trip time (RTT) to that region over the public internet
- Limited global traffic control: not designed for global ingress optimization
In practice, ALB is highly reliable—but it is not sufficient on its own for global performance optimization.
3. Global Accelerator: Improves Path, Not Just Distance
Global Accelerator is often misunderstood as “just another entry point.” Its real value lies in improving network path quality:
- Traffic enters the AWS backbone earlier, at an edge location close to the client
- TCP paths become more stable
- Reduced dependence on public internet routing conditions
In production, this leads to:
- Lower latency variance (not just lower averages)
- More stable tail latency (p95/p99)
- Faster convergence during partial failures
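To make the difference between averages and tail latency concrete, here is a minimal Python sketch that summarizes a batch of latency samples with the nearest-rank percentile method. The function names and the `tail_ratio` metric are illustrative assumptions, not part of any AWS tooling; the samples would come from your own probes or logs.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize_latency(samples_ms):
    """Return median and tail percentiles plus a simple spread indicator."""
    p50 = percentile(samples_ms, 50)
    p95 = percentile(samples_ms, 95)
    p99 = percentile(samples_ms, 99)
    # tail_ratio is a hypothetical metric: how much worse p99 is than the median.
    tail_ratio = p99 / p50 if p50 > 0 else float("inf")
    return {"p50": p50, "p95": p95, "p99": p99, "tail_ratio": tail_ratio}
```

Comparing `tail_ratio` before and after a routing change shows whether consistency improved, even when the median barely moved.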
GA + Private ALB Pattern
One pattern that proved particularly effective is placing Global Accelerator in front of private (internal-scheme) ALBs:
- ALB is not publicly exposed
- GA provides static anycast IPs as the only ingress
- Traffic enters the AWS backbone as early as possible
This setup improves:
- Security posture (no public ALB exposure)
- Network stability (less reliance on internet routing)
- Operational control (clear ingress boundary)
In practice, this combination delivers more consistent latency and cleaner traffic management than exposing the ALB directly.
However, Global Accelerator does not eliminate:
- Backend cold starts
- ALB-level routing delays
- Application-level bottlenecks
It improves the path—not the full request lifecycle.
4. The Hidden Cost of Routing Layers
A common production architecture:
Client → Edge → GA → ALB → Service
Each layer introduces:
- TLS termination or reuse
- Header transformations
- Queueing under load
Individually these costs are small, but together they are measurable.
In one real-world scenario, removing a redundant routing hop resulted in:
- ~8–15 ms reduction in median latency
- Significant improvement in p99 latency stability
Key takeaway:
Routing layers should be explicitly justified—not added by default.
5. Failover: Expectations vs Reality
Failover behavior is often overestimated:
- ALB health checks can detect a failed target within a few check intervals, but they only see targets in their own region
- GA reacts faster than DNS, but still not instantaneous
- Application-level failures do not always trigger infrastructure failover
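A rough way to reason about "not instantaneous" is to estimate how long a threshold-based health check takes to mark a target unhealthy. The sketch below is a simplified model, assuming evenly spaced checks and that a failing check can take up to the full timeout before it counts. The function name and the example inputs are illustrative; use your own target group settings.

```python
def detection_window_seconds(interval_s, unhealthy_threshold, timeout_s):
    """Estimate (best_case, worst_case) seconds between a target failing
    and the health checker marking it unhealthy.

    Best case: the target fails just before a check and every check fails
    immediately, so only the gaps between the consecutive checks elapse.
    Worst case: the target fails just after a passing check, and the final
    failing check runs until its timeout.
    """
    if interval_s <= 0 or unhealthy_threshold < 1 or timeout_s < 0:
        raise ValueError("invalid health check settings")
    best_case = (unhealthy_threshold - 1) * interval_s
    worst_case = unhealthy_threshold * interval_s + timeout_s
    return best_case, worst_case


# Example inputs (assumed values, not a recommendation):
# a 30 s interval, 2 consecutive failures, 5 s timeout.
best, worst = detection_window_seconds(30, 2, 5)
print(f"unhealthy after roughly {best} to {worst} seconds")
```

During that window the target can still receive traffic, which is why detection settings are worth tuning alongside any failover design.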
Observed patterns:
- Partial degradation is more common than full outages
- Traffic may still reach unhealthy targets temporarily
- Recovery paths are often slower than failure detection
This leads to an important shift:
Designing for graceful degradation is often more effective than relying purely on failover.
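As one possible illustration of graceful degradation, the sketch below wraps a backend call in a time budget and serves a fallback (for example a cached or reduced response) when the call is slow or fails. All names here, including `call_with_fallback` and `slow_backend`, are hypothetical, and a thread pool is only one of several ways to enforce the budget.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool; the size is an arbitrary choice for this sketch.
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_fallback(primary, fallback, timeout_s):
    """Run primary() within timeout_s seconds.

    Returns (result, degraded): degraded is True when the fallback was used
    because primary() timed out or raised an exception.
    """
    future = _pool.submit(primary)
    try:
        return future.result(timeout=timeout_s), False
    except FutureTimeout:
        # The worker thread keeps running; we only stop waiting for it.
        future.cancel()
        return fallback(), True
    except Exception:
        return fallback(), True


def slow_backend():
    """Hypothetical backend that responds too slowly."""
    time.sleep(0.3)
    return "fresh"


result, degraded = call_with_fallback(slow_backend, lambda: "cached", 0.05)
print(result, degraded)
```

Returning the `degraded` flag lets the caller record how often the fallback path is taken, which is useful evidence when partial degradation never trips an infrastructure-level failover.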
6. Observability Gaps
One of the hardest problems in multi-layer routing is visibility:
- Edge metrics and AWS metrics are not aligned
- ALB access logs are delivered to S3 in batches on a best-effort basis, so they arrive late and can be incomplete
- End-to-end tracing across layers is difficult
This creates a situation where latency issues are visible to users but difficult to attribute to a specific layer.
In practice, teams need to:
- Correlate metrics across systems manually
- Build synthetic probes for critical paths
- Treat routing as a first-class observable component
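One small aid for manual correlation is a shared request identifier. ALB adds an `X-Amzn-Trace-Id` header to requests it forwards, with `;`-separated `Key=Value` fields such as `Root=...`. The sketch below extracts the `Root` value so application logs can carry the same ID as the load balancer logs. The function names and the plain-dict header representation are assumptions made for illustration.

```python
def parse_trace_header(value):
    """Split an X-Amzn-Trace-Id header value into a dict of its fields."""
    fields = {}
    for part in value.split(";"):
        key, sep, val = part.strip().partition("=")
        if sep and key:
            fields[key] = val
    return fields


def correlation_id(headers):
    """Return the Root trace ID from request headers, or None if absent.

    headers is assumed to be a plain dict; header names are matched
    case-insensitively because proxies may change their casing.
    """
    for name, value in headers.items():
        if name.lower() == "x-amzn-trace-id":
            return parse_trace_header(value).get("Root")
    return None


# Example with a made-up header value in the documented field layout.
example = {"X-Amzn-Trace-Id": "Root=1-67891233-abcdef012345678912345678"}
print(correlation_id(example))
```

Logging this ID at the application and propagating it to downstream calls gives a join key between application logs and ALB access logs, which record the same trace ID.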
7. Key Takeaways
From a production perspective:
- ALB is reliable but region-scoped
- Global Accelerator improves consistency more than raw speed
- Routing layers accumulate measurable cost
- Failover is not a silver bullet
- Observability is often the limiting factor
The main shift is:
Treat routing as part of your application architecture—not just infrastructure plumbing.
Closing Thoughts
At scale, routing is no longer a background concern—it directly impacts user experience, system stability, and operational complexity.
Understanding how AWS components behave under real-world conditions allows builders to design systems that are not just functional, but predictable and resilient.