Designing Low-Latency Routing for Production Workloads on AWS

When building production systems on AWS, routing is often treated as a solved problem: put an ALB in front, optionally add Global Accelerator, and move on. In practice, once traffic scales and latency becomes visible, routing decisions have a measurable impact on both performance and failure behavior.

This post shares practical observations from operating real workloads, focusing on how Application Load Balancer (ALB) and Global Accelerator behave under production conditions.

1. The Problem: Routing Is Not Just About Reachability

A typical architecture looks like:

Client → Edge / CDN → AWS entry point → ALB → Services

At small scale, this works well. At scale, several issues emerge:

Latency varies noticeably across regions
Failover is not as immediate as expected
Additional routing layers introduce hidden overhead

What matters is not just whether traffic reaches your service, but how predictably and efficiently it does so.

2. ALB: Reliable, but Region-Bound

Application Load Balancer works well as a regional entry point:

Native integration with ECS / EKS
Flexible routing (path, host, header-based)
Stable under high concurrency

However, two limitations become apparent in production:

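Region-bound latency: users far from the ALB's region pay the full round-trip time to that region, mostly over the public internet
Limited global traffic control: an ALB does not steer traffic between regions and is not designed for global ingress optimization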

In practice, ALB is highly reliable—but it is not sufficient on its own for global performance optimization.

3. Global Accelerator: Improves Path, Not Just Distance

Global Accelerator is often misunderstood as “just another entry point.” Its real value lies in improving network path quality:

Traffic enters the AWS backbone earlier, at an edge location close to the client
TCP paths become more stable
Reduced dependence on public internet routing conditions

In production, this leads to:

Lower latency variance (not just lower averages)
More stable tail latency (p95/p99)
Faster convergence during partial failures
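
To make the difference between averages and tails concrete, here is a minimal sketch that summarizes latency samples by mean, median, p95, p99, and standard deviation. The function names and the two sample sets are illustrative assumptions, not measurements from the workloads discussed in this post.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank method: rank = ceil(pct * n / 100), clamped to at least 1.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Return the statistics that matter when comparing two network paths."""
    return {
        "mean": statistics.fmean(samples),
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
        "stdev": statistics.pstdev(samples),
    }


# Hypothetical latencies in milliseconds for two paths with the same mean.
steady_path = [50] * 100
spiky_path = [40] * 95 + [240] * 5

for name, samples in (("steady", steady_path), ("spiky", spiky_path)):
    print(name, summarize(samples))
```

Both hypothetical paths average 50 ms, but the spiky one has a p99 of 240 ms. A comparison based on the mean alone would hide exactly the kind of improvement described above.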

GA + Private ALB Pattern

One pattern that proved particularly effective is placing Global Accelerator in front of internal (private) ALBs:

The ALB is internal and not publicly exposed
GA provides static anycast IPs as the only public ingress
Traffic enters the AWS backbone as early as possible

This setup improves:

Security posture (no public ALB exposure)
Network stability (less reliance on internet routing)
Operational control (clear ingress boundary)

In practice, this combination delivers more consistent latency and cleaner traffic management compared to exposing ALB directly.
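
As a rough sketch of how this pattern might be wired up, the helper below builds the request for attaching an internal ALB to a Global Accelerator listener. The parameter names follow the Global Accelerator `CreateEndpointGroup` API as I understand it, so check them against the current API reference. The function name and all ARNs are hypothetical placeholders.

```python
def endpoint_group_request(listener_arn, region, internal_alb_arn, traffic_dial=100.0):
    """Build keyword arguments for a CreateEndpointGroup call (sketch, not validated against AWS)."""
    if not 0.0 <= traffic_dial <= 100.0:
        raise ValueError("traffic_dial must be between 0 and 100")
    return {
        "ListenerArn": listener_arn,
        "EndpointGroupRegion": region,
        "EndpointConfigurations": [
            {
                "EndpointId": internal_alb_arn,
                "Weight": 128,
                # Internal ALB endpoints are used with client IP preservation enabled.
                "ClientIPPreservationEnabled": True,
            }
        ],
        # Percentage of traffic this region's endpoint group should accept.
        "TrafficDialPercentage": traffic_dial,
    }


# Placeholder ARNs for illustration only.
request = endpoint_group_request(
    listener_arn="arn:aws:globalaccelerator::111122223333:accelerator/EXAMPLE/listener/EXAMPLE",
    region="eu-west-1",
    internal_alb_arn="arn:aws:elasticloadbalancing:eu-west-1:111122223333:loadbalancer/app/EXAMPLE/EXAMPLE",
)

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   ga = boto3.client("globalaccelerator", region_name="us-west-2")
#   ga.create_endpoint_group(**request)
```

Keeping the request construction separate from the API call makes the ingress definition easy to review and unit test without touching an AWS account.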

However, Global Accelerator does not eliminate:

Backend cold starts
ALB-level routing delays
Application-level bottlenecks

It improves the path—not the full request lifecycle.

4. The Hidden Cost of Routing Layers

A common production architecture:

Client → Edge → GA → ALB → Service

Each layer introduces:

TLS termination or connection reuse, depending on the layer
Header transformations
Queueing under load

Each cost is small on its own, but together they are measurable.
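
One way to keep these costs explicit is a per-hop latency budget. The sketch below sums the overhead of each hop on a path and reports what removing one hop would save. The hop names and millisecond values are hypothetical placeholders and should be replaced with your own measurements.

```python
# Hypothetical per-hop overheads in milliseconds; these are NOT measured values.
HOP_OVERHEAD_MS = {
    "edge": 3.0,
    "global_accelerator": 1.5,
    "alb": 2.0,
}


def path_overhead(hops, overhead_ms=HOP_OVERHEAD_MS):
    """Sum the routing overhead of every hop on a path."""
    unknown = [hop for hop in hops if hop not in overhead_ms]
    if unknown:
        raise KeyError(f"no overhead recorded for hops: {unknown}")
    return sum(overhead_ms[hop] for hop in hops)


def savings_from_removing(hops, hop, overhead_ms=HOP_OVERHEAD_MS):
    """Overhead saved if `hop` is removed from the path."""
    remaining = [h for h in hops if h != hop]
    return path_overhead(hops, overhead_ms) - path_overhead(remaining, overhead_ms)


path = ["edge", "global_accelerator", "alb"]
print("total overhead (ms):", path_overhead(path))
print("saved by dropping edge (ms):", savings_from_removing(path, "edge"))
```

A table like this, kept next to the architecture diagram, forces each layer to state what it costs and makes it easier to ask what it buys in return.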

In one real-world scenario, removing a redundant routing hop resulted in:

~8–15 ms reduction in median latency
Significant improvement in p99 latency stability

Key takeaway:

Routing layers should be explicitly justified—not added by default.

5. Failover: Expectations vs Reality

Failover behavior is often overestimated:

ALB health checks are fast, but they only cover targets in the ALB's own region
GA reacts faster than DNS-based failover, but it is still not instantaneous
Application-level failures do not always trigger infrastructure failover

Observed patterns:

Partial degradation is more common than full outages
Traffic may still reach unhealthy targets temporarily
Recovery paths are often slower than failure detection
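
The gap between failure and detection can be estimated from the health check settings themselves. The sketch below is a rough upper-bound model, assuming a target fails right after passing a check. The arguments mirror the ALB target group settings `HealthCheckIntervalSeconds`, `UnhealthyThresholdCount`, `HealthyThresholdCount`, and `HealthCheckTimeoutSeconds`. The function names and example values are assumptions, and the model ignores propagation delay and connection draining.

```python
def detection_window_seconds(interval_s, unhealthy_threshold, timeout_s=0.0):
    """Rough upper bound on how long a failed target can keep receiving traffic."""
    if interval_s <= 0 or unhealthy_threshold < 1 or timeout_s < 0:
        raise ValueError("interval must be > 0, threshold >= 1, timeout >= 0")
    # The target must fail `unhealthy_threshold` consecutive checks,
    # and the last check may take up to `timeout_s` to be declared failed.
    return interval_s * unhealthy_threshold + timeout_s


def recovery_window_seconds(interval_s, healthy_threshold):
    """Rough time for a recovered target to be marked healthy again."""
    if interval_s <= 0 or healthy_threshold < 1:
        raise ValueError("interval must be > 0 and threshold >= 1")
    return interval_s * healthy_threshold


# Illustrative settings: 10 s interval, 2 failures to eject, 5 successes to readmit.
print("detection (s):", detection_window_seconds(10, 2, timeout_s=5))
print("recovery  (s):", recovery_window_seconds(10, 5))
```

With these illustrative settings the model gives about 25 seconds to eject a failed target and about 50 seconds to readmit it, which matches the pattern above: recovery tends to be slower than detection whenever the healthy threshold is higher than the unhealthy one.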

This leads to an important shift:

Designing for graceful degradation is often more effective than relying purely on failover.
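
As one possible shape for graceful degradation at the application level, the sketch below calls a primary dependency with a deadline and serves a degraded answer if the call is slow or fails. The function name `call_with_fallback` and the example callables are hypothetical. A production version would likely add metrics and a circuit breaker.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Shared worker pool used to enforce a deadline on the primary call.
_executor = ThreadPoolExecutor(max_workers=4)


def call_with_fallback(primary, fallback, timeout_s):
    """Return (result, source); use `fallback` when `primary` is slow or raises."""
    future = _executor.submit(primary)
    try:
        return future.result(timeout=timeout_s), "primary"
    except Exception:
        # Covers both the deadline being exceeded and errors raised by `primary`.
        future.cancel()  # only has an effect if the call has not started yet
        return fallback(), "degraded"


def slow_recommendations():
    """Stand-in for a dependency that is up but responding slowly."""
    time.sleep(0.5)
    return ["personalized-1", "personalized-2"]


def cached_recommendations():
    """Stand-in for a cheaper, possibly stale answer."""
    return ["popular-1", "popular-2"]


result, source = call_with_fallback(slow_recommendations, cached_recommendations, timeout_s=0.05)
print(source, result)
```

The point of the design is that the user still gets a response while the infrastructure layer has not yet noticed, or will never notice, that a dependency is degraded.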

6. Observability Gaps

One of the hardest problems in multi-layer routing is visibility:

Edge metrics and AWS metrics are not aligned and often differ in timestamps, granularity, and what they count as latency
ALB access logs are delivered in batches on a best-effort basis, so they lag real time and can miss requests
End-to-end tracing across layers is difficult

This creates a situation where:

Latency issues are visible to users but difficult to attribute to a specific layer

In practice, teams need to:

Correlate metrics across systems manually
Build synthetic probes for critical paths
Treat routing as a first-class observable component
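
A synthetic probe does not need to be elaborate to be useful. The sketch below times sequential requests against an entry point and counts failures, so the same probe can be pointed at different layers (for example the GA hostname and a regional endpoint) and the results compared. The function names and the URL are placeholders, and the fetcher is injectable so the probe can be tested without network access.

```python
import time
import urllib.request


def http_get(url, timeout_s=5.0):
    """Default fetcher: issue a GET and read the body; raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        resp.read()
        return resp.status


def probe(url, fetch=http_get, attempts=5):
    """Time `attempts` sequential requests; return latencies in ms and an error count."""
    latencies_ms = []
    errors = 0
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            fetch(url)
        except Exception:
            # Failed requests are counted but excluded from latency samples.
            errors += 1
            continue
        latencies_ms.append((time.perf_counter() - start) * 1000)
    return {"url": url, "latencies_ms": latencies_ms, "errors": errors}


if __name__ == "__main__":
    # Placeholder URL; substitute a health endpoint behind each layer you want to compare.
    print(probe("https://example.com/", attempts=3))
```

Running the same probe from several client regions on a schedule gives a baseline per layer, which makes it much easier to tell which hop changed when users report latency.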

7. Key Takeaways

From a production perspective:

ALB is reliable but region-scoped
Global Accelerator improves consistency more than raw speed
Routing layers accumulate measurable cost
Failover is not a silver bullet
Observability is often the limiting factor

The main shift is:

Treat routing as part of your application architecture—not just infrastructure plumbing.

Closing Thoughts

At scale, routing is no longer a background concern—it directly impacts user experience, system stability, and operational complexity.

Understanding how AWS components behave under real-world conditions allows builders to design systems that are not just functional, but predictable and resilient.