by Russ White, Verisign
The field of network and protocol engineering has three watchwords: faster, bigger, and cheaper. Although we all know the joke about choosing two out of the three, the reality of networking is that we have been doing all three for years—and it doesn’t look like there is any time on the horizon when we will not be doing all three.
In that spirit, IP Fast Reroute addresses all three of these watchwords. Fast—you are probably thinking—is obvious, but what about bigger and cheaper? Fast Reroute provides the network designer with trade-offs: redundancy through additional backup links against redundancy through protocol changes, and network stretch against the size of a failure domain. In theory, then, you can build larger, less-redundant failure domains with Fast Reroute than without.
But to understand these effects, we need to go behind the curtain, understanding Fast Reroute as more than a few configuration options. This article first looks at the motivation behind IP Fast Reroute, and then discusses four different techniques, or stages, in the Fast Reroute story.
What Is Your Motivation?
To really discuss network speed, we need to be able to define how fast “fast” really is. In the 1980s, a network was fast if it could converge in 90 seconds or less (the longest time the Routing Information Protocol [RIP] could take to converge). As we moved into more advanced Distance-Vector and Link State protocols (Enhanced Interior Gateway Routing Protocol [EIGRP], Open Shortest Path First [OSPF], and Intermediate System-to-Intermediate System [IS-IS]), 5-second convergence became the norm. We learned to tweak timers to get to convergence times faster than 1 second.
But what if we need convergence in well under 1 second? What if we need to converge so fast that the only packets lost are those already in flight or sitting in a buffer waiting to be serialized onto the link? And what if we need to handle a large number of prefixes with minimal network disruption due to link or device failures?
IP Fast Reroute techniques come into play in this situation.
Preinstalled Backup Paths
Although it is often sold as a Fast Reroute technique, preinstalled backup paths really are not one; rather, they support other Fast Reroute techniques at the protocol level. If the protocol has calculated a loop-free path that is an alternate to the current best path, this alternate path can be installed in the forwarding table so it is readily available for use in case the best path fails.
This solution does provide immediate failover at the hardware level, but the alternate path must be calculated to be installed. How is this alternate path computed?
The first mechanism available for calculating an alternate path is Loop-Free Alternates. To understand this mechanism, we must make a short detour into graph theory (or geometry, if you prefer). Use the following network as an example:
- A is the destination.
- B’s best path is through D to A.
- G’s best path is through F to A.
What is the key to allowing B to forward traffic through C toward A if the B -> D link fails? B must know the traffic it forwards to C (for A) will not be forwarded back to B itself. How can B know C will forward the traffic to D, rather than to B itself? By examining the metric at C toward A.
In EIGRP, B knows C’s metric toward A because the routing protocol includes this information in the update. In a link state protocol (OSPF or IS-IS), B can calculate C’s cost to A directly by running Shortest Path First from C’s perspective (given that B and C share the same link state database).
Calculating loop-free alternates is simply determining whether any given neighbor will forward traffic for a particular destination back to you, or on toward the destination. If a neighbor would forward the traffic on toward the destination, then it is a loop-free alternate.
Under what conditions would C forward traffic sent from B back to B? If C is using B as its best path (or one of its best paths) toward A.
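This check is easy to sketch in code. The condition, from RFC 5286, is that a neighbor N is a loop-free alternate for source S toward destination D when dist(N, D) < dist(N, S) + dist(S, D). The topology and metrics below are hypothetical stand-ins for the figure:

```python
import heapq

def dijkstra(graph, src):
    """Shortest-path distances from src over an undirected weighted graph."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def is_loop_free_alternate(graph, source, neighbor, dest):
    """True if neighbor will not hand traffic for dest back to source."""
    dn = dijkstra(graph, neighbor)
    ds = dijkstra(graph, source)
    return dn[dest] < dn[source] + ds[dest]

# Hypothetical metrics: B reaches A through D; C closes the loop.
graph = {
    "A": {"D": 1},
    "B": {"D": 1, "C": 1},
    "C": {"B": 1, "D": 1},
    "D": {"A": 1, "B": 1, "C": 1},
}
print(is_loop_free_alternate(graph, "B", "C", "A"))  # True: C forwards via D

# Raise the C-D cost so C's best path to A runs through B itself.
graph["C"]["D"] = graph["D"]["C"] = 5
print(is_loop_free_alternate(graph, "B", "C", "A"))  # False: C would loop
```

With the higher C-D cost, C is exactly the failing case described above: its best path to A runs through B, so it cannot protect B.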
What about G? If it forwards traffic to J toward A, will J return the traffic to G itself? In this four-hop ring, there are two possible configurations:
- J is using H as its best path. In this case, traffic forwarded by G to A through J will be correctly forwarded. Note, however, that in this case H cannot use J as an alternate path toward A, because any traffic H sends to A through J will loop back to H itself.
- J is using G as its best path. In this case, J can use G as a loop-free alternate, but G cannot use J as a loop-free alternate.
No matter how you work the metrics in the four-hop ring case, there will always be at least one device that does not have a loop-free alternate path to A.
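This claim can be checked by brute force. The sketch below models a hypothetical four-hop ring F-G-J-H-F with A attached to F, tries every metric assignment drawn from {1, 2, 3}, and confirms that each assignment leaves at least one ring node without a loop-free alternate toward A:

```python
import heapq
import itertools

def dijkstra(graph, src):
    """Shortest-path distances from src over an undirected weighted graph."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def has_lfa(graph, s, d):
    """Does s have any neighbor, off its best path(s), that is loop-free to d?"""
    ds = dijkstra(graph, s)
    by_nbr = {n: dijkstra(graph, n) for n in graph[s]}
    best = {n for n, w in graph[s].items() if w + by_nbr[n][d] == ds[d]}
    return any(by_nbr[n][d] < by_nbr[n][s] + ds[d]
               for n in graph[s] if n not in best)

unprotected = set()
for wfg, wgj, wjh, whf in itertools.product([1, 2, 3], repeat=4):
    ring = {"A": {"F": 1},
            "F": {"A": 1, "G": wfg, "H": whf},
            "G": {"F": wfg, "J": wgj},
            "J": {"G": wgj, "H": wjh},
            "H": {"J": wjh, "F": whf}}
    lacking = [n for n in ("G", "J", "H") if not has_lfa(ring, n, "A")]
    assert lacking  # every metric assignment leaves someone uncovered
    unprotected.update(lacking)
print(sorted(unprotected))  # ['G', 'H', 'J']: each node is uncovered somewhere
```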
Split Horizon and Loop-Free Alternates
If the concept of loop-free alternates is difficult to understand by considering the problem in this way, another useful way to look at the problem is through the distance-vector idea of split horizon. To review, the split horizon rule states:
Do not advertise a route to a destination toward a neighbor you are using to forward traffic to that same destination.
If C is forwarding traffic toward A to B, then C will not advertise A to B, meaning B will not even know about this alternate path, preventing a loop even if B’s best path to A fails. If you always consider where a distance-vector protocol will split horizon, you will always be able to see where loop-free alternates will fail to provide an alternate path to any given destination.
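As a sketch, split horizon is just a filter on the update a router builds for each neighbor (the RIB contents here are hypothetical):

```python
def routes_to_advertise(rib, neighbor):
    """Split horizon: withhold any route whose traffic we forward
    through the very neighbor this update is being built for."""
    return {dest: metric
            for dest, (next_hop, metric) in rib.items()
            if next_hop != neighbor}

# C's hypothetical RIB: destination -> (next hop, metric).
rib_at_c = {"A": ("B", 2), "E": ("D", 1)}
print(routes_to_advertise(rib_at_c, "B"))  # {'E': 1}: A is withheld from B
print(routes_to_advertise(rib_at_c, "D"))  # {'A': 2}: E is withheld from D
```

Because A is withheld from B, B never even learns of the path through C, which is exactly why no loop-free alternate exists there.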
Getting Around the Loops
If we want to design a system that will find every possible alternate path toward a given destination, rather than just finding those that are not normally taken out by split horizon anyway, what must we do? We need to find a way to route through a neighbor to some distant next hop without that neighbor actually forwarding the traffic back to the originating router.
To put this concept in more concrete terms, examine the following network as an example:
If G wants to use the path through J as an alternate path, it must somehow forward traffic to J without J returning the traffic to G itself. How can this be done? G can tunnel the traffic through J to some device beyond J; every mechanism beyond loop-free alternates must therefore use some form of tunneling to resolve the Fast Reroute problem. Calculating the point to which G needs to tunnel is the topic of the remaining mechanisms.
Not-via
Even though we might be working with a link state protocol, it is easiest to understand Not-via in terms of a distance-vector protocol and split horizon. Not-via essentially begins with the observation that G does not have an alternate path to A through J in this case because J will not advertise such a route. J is, in fact, using G as its best path toward A, so the path from G through J to A cannot be viable.
The solution is not just simply having J advertise the route to A because traffic forwarded by G toward A through J will simply be looped back to G itself. So what is the solution?
In the case of Not-via, F advertises a route to itself through H only (not through G). This route will be advertised through H, then J, and finally to G. When G receives this route, it can determine that this path is an alternate path to A because its best path to A is normally through F. Any path that can reach F not through (or not via) its best path to F must, necessarily, be a loop-free alternate path to F. To reach A through F, however, G must tunnel to F directly, thereby avoiding the problem of J returning traffic destined to A back to G.
The address F advertises through H only is called “F Not-via G,” and that is why this system is called “Not-via.” This mechanism works in every topology (so long as an alternate path exists). The one downside to Not-via is that for each protected link or node, a new advertisement must be built and advertised through the network.
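A minimal sketch of the computation behind the Not-via address: routers reach “F Not-via G” along a shortest path computed as if the protected link did not exist. Here the G-F link is removed and the repair path G uses for its tunnel to F falls out directly (the topology and metrics are hypothetical stand-ins for Figure 2):

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra returning the actual node sequence from src to dst."""
    dist, prev = {src: 0}, {}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    if dst not in dist:
        return None
    path, node = [], dst
    while node != src:
        path.append(node)
        node = prev[node]
    path.append(src)
    return path[::-1]

def not_via_repair(graph, router, nexthop):
    """Repair path from router to 'nexthop Not-via their shared link':
    prune the protected link, then rerun SPF."""
    trimmed = {u: dict(vs) for u, vs in graph.items()}
    trimmed[router].pop(nexthop, None)
    trimmed[nexthop].pop(router, None)
    return shortest_path(trimmed, router, nexthop)

# Hypothetical ring: G reaches A through F, with J and H the long way round.
graph = {"A": {"F": 1},
         "F": {"A": 1, "G": 1, "H": 1},
         "G": {"F": 1, "J": 1},
         "J": {"G": 1, "H": 1},
         "H": {"J": 1, "F": 1}}
print(not_via_repair(graph, "G", "F"))  # ['G', 'J', 'H', 'F']
```

G encapsulates the affected traffic toward this address, so J and H forward it using their entries for “F Not-via G” rather than their (looping) entries for A.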
Disjoint Topologies
The problem of finding a next hop that passes over the split-horizon point can also be solved by building multiple disjoint topologies: topologies that do not share the same links (or, in some cases, nodes) to reach the same set of destinations. If this sounds complex, that is because it is; a lot of hours and thought have gone into various systems for building and using multiple disjoint topologies within a single physical network. But there is a moderately simple way to see the idea, referring back to Figure 2. In this network, G can take the following steps:
1. Remove the G -> F link from its local database temporarily (just for this calculation).
2. Calculate the best path to F.
3. If an alternate path to F exists, mark this alternate path as a second topology.
4. If its path to F fails, place all traffic that would normally pass across G -> F on this alternate topology.
It might not be obvious from this set of actions, but they will actually cause G to discover that it is, in fact, on a ring, and that it can place traffic in the opposite direction around this ring to reach the same destination. Placing the traffic it would normally send to F via G -> F on a separate topology bypasses the forwarding table at J, which would otherwise loop the traffic back to G itself. You could use a tunnel to F instead of a separate topology; tunnels are, in effect, a disjoint topology seen in a different way.
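The four steps reduce to two SPF runs, one on the full topology and one on the pruned topology, with the results held as two forwarding tables (the ring metrics here are again hypothetical):

```python
import heapq

def spf_next_hops(graph, src):
    """One SPF run; return {dest: first-hop neighbor} as seen from src."""
    dist, first = {src: 0}, {src: None}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                first[v] = v if u == src else first[u]
                heapq.heappush(pq, (nd, v))
    return first

# Hypothetical ring from Figure 2: G's primary exit is F, backup via J.
graph = {"A": {"F": 1},
         "F": {"A": 1, "G": 1, "H": 1},
         "G": {"F": 1, "J": 1},
         "J": {"G": 1, "H": 1},
         "H": {"J": 1, "F": 1}}
primary = spf_next_hops(graph, "G")

# Step 1: temporarily remove the G -> F link; steps 2-3: run SPF again
# and keep the result as a second topology.
pruned = {u: dict(vs) for u, vs in graph.items()}
del pruned["G"]["F"], pruned["F"]["G"]
backup = spf_next_hops(pruned, "G")

# Step 4: on G -> F failure, traffic simply switches to the backup table.
for dest in ("A", "F"):
    print(dest, primary[dest], backup[dest])
```

For both A and F, the primary next hop is F and the backup next hop is J, which is exactly the "opposite direction around the ring" behavior described above.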
What advantage does IP Fast Reroute provide the network designer? The ability to reduce the amount of physical redundancy while maintaining the same actual level of redundancy in the network. Moving to Not-via or disjoint topology solutions removes the need to manually manage link costs as well, while adding only moderate complexity at the protocol level.
IP Fast Reroute is an interesting technology just on the edge of adoption that will be useful in campus, data center (through Layer 2 routing), and standard Layer 3 network designs.
For Further Reading
Work is currently active on the disjoint topology mechanism within the research community and the IETF; in particular, the following drafts will be of interest to anyone who wants to learn more:
- Alia Atlas, Robert Kebler, Maciek Konstantynowicz, Andras Csaszar, Russ White, and Mike Shand, “An Architecture for IP/LDP Fast-Reroute Using Maximally Redundant Trees,” Internet Draft, work in progress, October 2011, draft-atlas-rtgwg-mrt-frr-architecture-01
- Alia Atlas, Gabor Enyedi, and Andras Csaszar, “Algorithms for Computing Maximally Redundant Trees for IP/LDP Fast-Reroute,” Internet Draft, work in progress, March 2012, draft-enyedi-rtgwg-mrt-frr-algorithm-01
- Stefano Previdi, Mike Shand, and Stewart Bryant, “IP Fast Reroute Using Not-via Addresses,” Internet Draft, work in progress, December 2011, draft-ietf-rtgwg-ipfrr-notvia-addresses-08
- Clarence Filsfils and Pierre Francois, “LFA applicability in SP networks,” Internet Draft, work in progress, January 2012, draft-ietf-rtgwg-lfa-applicability-06
RUSS WHITE is a Principal Research Engineer at Verisign. He has co-authored numerous technical books, RFCs, and software patents. He focuses primarily on network complexity, network design, the space where routing and naming intersect, control-plane security, protocol design, protocol operation, and software-defined networks.