By Yann Rapaport, 6WIND Customer Support and Service Manager
This post is the first in a series of seven that will discuss High Availability (HA) capabilities for packet processing software. The first five posts will describe architecture concepts and the last two will illustrate these concepts using a real-world example.
Let’s start with the analysis of the system requirements.
A multicore-based system provides huge processing capabilities. A single failure in this kind of equipment can affect a very large number of users and cause long service interruptions, which is unacceptable. Multicore packet processing software therefore needs to implement specific mechanisms to provide HA-ready solutions.
A High Availability architecture relies on redundant inactive elements that are not in operational use. The goal of the system is to replace a failing active element with an inactive one and to restore the expected level of service in the shortest possible time. Several strategies can be implemented, depending on how much service interruption is acceptable.
In the simplest strategy, an inactive element is configured to replace the failing one only after a failure has been detected. This means that the whole configuration has to be restored, and the system has to provide the new element with all the information it needs to restart the service. If we take routing as an example, the routing protocols have to be configured on the new element and then have to complete the route learning process before the service is available again. This can take a long time (several minutes), which is not compatible with high availability requirements.
To avoid such a long interruption of service, a more sophisticated approach can be implemented based on a “1+1” architecture. As described in the following figure, a pair of elements (one active, one inactive) is used. A dedicated process maintains a coherent view of the system in both elements by synchronizing the required information between them. If the active element fails, the inactive one already has all the information it needs to provide the expected level of service within a very short period of time. If we take synchronization between routing protocols as an example, the inactive element receives all the routing table updates from the active element, ensuring that it has exactly the same level of information. It should be noted that each Control Plane protocol (ARP/NDP, IPsec, NAT, firewall…) manages its own specific information, so a dedicated synchronization mechanism is required for each of them.
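The routing example above can be sketched in a few lines of Python. This is only an illustration of the idea, not 6WINDGate code: the `Element` and `RouteSync` names, the dictionary used as a routing table and the addresses are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One half of a hypothetical 1+1 pair."""
    name: str
    active: bool = False
    routes: dict = field(default_factory=dict)  # prefix -> next hop


class RouteSync:
    """Hypothetical synchronizer dedicated to routing information."""

    def __init__(self, active: Element, standby: Element):
        self.active = active
        self.standby = standby

    def update(self, prefix: str, next_hop):
        """Apply a routing update on the active element and mirror it
        to the inactive one. A next hop of None withdraws the route."""
        for element in (self.active, self.standby):
            if next_hop is None:
                element.routes.pop(prefix, None)
            else:
                element.routes[prefix] = next_hop

    def failover(self) -> Element:
        """Swap roles; the former standby already holds every route."""
        self.active.active = False
        self.standby.active = True
        self.active, self.standby = self.standby, self.active
        return self.active


a = Element("element-a", active=True)
b = Element("element-b")
sync = RouteSync(a, b)
sync.update("10.0.0.0/8", "192.0.2.1")
sync.update("172.16.0.0/12", "192.0.2.2")
sync.update("172.16.0.0/12", None)  # route withdrawn on both elements
new_active = sync.failover()
print(new_active.name, new_active.routes)
```

Because every update is mirrored as it happens, the failover step only has to swap roles; no route learning is needed. A real system would need one such synchronizer per protocol, as noted above.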
Besides synchronization mechanisms, the packet processing architecture also has to provide monitoring services and graceful restart capabilities.
Monitoring services periodically check the health of the packet processing components to detect possible issues, in order to prevent a complete shutdown or to anticipate switching from the active to the inactive element. These services alert the HA framework that supervises the whole system. The HA framework then decides whether to relaunch a software component or to partially or totally reboot the system.
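As a rough sketch of this interaction, the Python snippet below polls a set of health checks and lets a supervisor choose an action. The class names, the component names and the escalation policy (relaunch a component a limited number of times, then reboot) are assumptions made for illustration; the post does not specify how the HA framework reaches its decision.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    RELAUNCH_COMPONENT = "relaunch component"
    REBOOT_SYSTEM = "reboot system"


class HAFramework:
    """Hypothetical supervisor that receives alerts from monitoring."""

    def __init__(self, max_relaunches: int = 2):
        self.max_relaunches = max_relaunches
        self.relaunches = {}  # component name -> relaunch count

    def on_alert(self, component: str) -> Action:
        # Assumed policy: relaunch first, escalate to a reboot
        # once the relaunch budget for this component is used up.
        count = self.relaunches.get(component, 0)
        if count < self.max_relaunches:
            self.relaunches[component] = count + 1
            return Action.RELAUNCH_COMPONENT
        return Action.REBOOT_SYSTEM


def poll(checks: dict, framework: HAFramework) -> dict:
    """Run every health check once and alert the framework on failure."""
    decisions = {}
    for name, check in checks.items():
        decisions[name] = Action.NONE if check() else framework.on_alert(name)
    return decisions


framework = HAFramework(max_relaunches=1)
checks = {
    "fast_path": lambda: True,        # healthy component
    "routing_daemon": lambda: False,  # component that keeps failing
}
first = poll(checks, framework)
second = poll(checks, framework)
print(first["routing_daemon"], second["routing_daemon"])
```

In a real deployment `poll` would run on a timer and the checks would probe actual processes; here they are fixed lambdas so the escalation is easy to follow.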
Graceful restart provides the capability to restart packet processing software components without interrupting the traffic. Each key software component must support this capability. If a software component does not maintain any internal state, it only has to be stopped and restarted. More complex protocols require specific mechanisms; some of these mechanisms, such as the one for the OSPF routing protocol, have been standardized.
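The difference between a plain restart and a graceful one can be illustrated with a toy model. In the sketch below, the forwarding table outlives the routing component, so lookups keep working while the component is down. The class names and the way state is rebuilt on start are assumptions for the example and do not describe how any particular protocol implements graceful restart.

```python
class ForwardingTable:
    """Toy forwarding state that lives outside the restarting component."""

    def __init__(self):
        self.entries = {}  # prefix -> next hop

    def lookup(self, prefix):
        return self.entries.get(prefix)


class RoutingComponent:
    """Hypothetical stateful control component."""

    def __init__(self, table: ForwardingTable):
        self.table = table
        self.routes = {}
        self.running = True

    def learn(self, prefix, next_hop):
        self.routes[prefix] = next_hop
        self.table.entries[prefix] = next_hop

    def stop(self):
        # The component's in-memory state is lost, but the
        # forwarding table is deliberately left untouched.
        self.running = False
        self.routes = {}

    def start(self):
        # Assumed recovery step: rebuild state from the preserved table.
        self.routes = dict(self.table.entries)
        self.running = True


table = ForwardingTable()
routing = RoutingComponent(table)
routing.learn("10.0.0.0/8", "192.0.2.1")

routing.stop()
during_restart = table.lookup("10.0.0.0/8")  # traffic can still be forwarded
routing.start()
print(during_restart, routing.routes)
```

A component without internal state would need only the `stop` and `start` calls, with nothing to rebuild.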
Finally, it is very valuable to use the system’s redundancy to enhance the availability of the equipment. A “1+1” architecture provides redundant interfaces, so the equipment can be connected to the network over several physical paths. Referring to the figure above, this means that the inactive Fast Path is now active.
Each part of the architecture will be detailed in upcoming posts.
More information about 6WINDGate architecture can be found here.
6WINDGate High Availability Architecture Overview is available here.
You can check 6WINDGate FAQ here.