The previous post introduced the scalability requirements of VRF on packet processing systems which are designed using a multicore packet processor. I’ll use this post to review 6WIND’s implementation of VRF.
The fast path / Data Plane modules (IPsec, Forwarding, Firewall, NAT) use tables (tries, hash tables, etc.) to process packets based on their contents. For every table, we have added a VRF index that is used in all lookups, and for every packet descriptor, we have added a VRF field. A lookup therefore becomes straightforward: it is keyed by the VRF index carried in the packet’s descriptor. Since resources are shared between the VRFs and are neither pre- nor over-provisioned, these modules can support any number of VRs, up to the limit allowed by the available system memory. Neither hardware threads nor virtualization techniques constrain this implementation.
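As a rough illustration of this design, here is a minimal sketch in C of a shared hash table whose lookup key includes the VRF index carried by the packet descriptor. All names here (fib_entry, pkt_desc, fib_lookup) are hypothetical and do not reflect 6WINDGate's actual API:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: names and layouts are illustrative only,
 * not 6WINDGate's real data structures. */

#define FIB_BUCKETS 256

struct fib_entry {
    uint32_t vrf;      /* VRF index added to every table entry */
    uint32_t dst;      /* destination (simplified to a /32 host route) */
    uint32_t nexthop;
    struct fib_entry *next;
};

struct pkt_desc {
    uint32_t vrf;      /* VRF field added to every packet descriptor */
    uint32_t dst;
};

static struct fib_entry *buckets[FIB_BUCKETS];

/* The VRF ID is folded into the hash, so entries from different VRs
 * are logically separate while sharing the same memory pool. */
static unsigned fib_hash(uint32_t vrf, uint32_t dst)
{
    return (vrf * 2654435761u ^ dst) % FIB_BUCKETS;
}

static void fib_add(struct fib_entry *e)
{
    unsigned h = fib_hash(e->vrf, e->dst);
    e->next = buckets[h];
    buckets[h] = e;
}

/* Lookup keyed by the VRF stored in the packet's descriptor. */
static struct fib_entry *fib_lookup(const struct pkt_desc *d)
{
    struct fib_entry *e;

    for (e = buckets[fib_hash(d->vrf, d->dst)]; e; e = e->next)
        if (e->vrf == d->vrf && e->dst == d->dst)
            return e;
    return NULL;
}
```

Because the VRF ID is part of the lookup key, two VRs can hold the same destination with different next hops while sharing one memory pool, which is why the number of VRs is bounded only by the available memory.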
The 6WINDGate Networking Stack for Linux comprises a standard Linux kernel stack plus additional features implemented by 6WIND. In recent Linux kernels, a per-data-structure field, à la network namespaces, was added in order to expose multiple instances of kernel services to userland applications. Since our implementation of VRF is able to run a lookup based on an ID, our engineers have mapped VRF support onto the same hooks as those of namespaces. However, the main differences from the usual Linux namespaces are:
- (a) support of multiple daemons able to configure tables (IPsec SADs, SPDs, forwarding tables, etc.) for any namespace (VRF). Linux userland applications update the kernel tables using Netlink sockets, so 6WINDGate integrates extensions to the Netlink messages in order to associate each message with its VRF.
- (b) support of multiple sockets within a single daemon able to send and receive packets from any VRF. This means that sendmsg(), recvmsg() and, more generally, the kernel’s BSD socket API have been extended to support binding a socket to a VR (in our case, we use setsockopt() options, an approach first introduced with the FreeBSD 4 patches; for datagram sockets, we allow the VRF ID to be carried in the ancillary data of a cmsg(3)).
- (c) support of multiple VRFs per namespace.
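To illustrate point (b), here is a minimal sketch of how a VRF ID could be packed into, and parsed back out of, the ancillary data of a datagram socket message. The SOL_VRF and VRF_ID_CMSG constants are invented for this example; a real implementation would register its own cmsg level and type:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical cmsg level/type for illustration only:
 * these are NOT real kernel constants. */
#define SOL_VRF     0x100
#define VRF_ID_CMSG 1

/* Pack a VRF ID into ancillary data, as a VRF-aware daemon would do
 * before calling sendmsg() on a datagram socket.  'buf' must be a
 * suitably aligned control buffer of at least CMSG_SPACE(4) bytes. */
static size_t vrf_cmsg_pack(char *buf, size_t len, struct msghdr *msg,
                            uint32_t vrf_id)
{
    struct cmsghdr *cm;

    memset(buf, 0, len);
    msg->msg_control = buf;
    msg->msg_controllen = len;

    cm = CMSG_FIRSTHDR(msg);
    cm->cmsg_level = SOL_VRF;
    cm->cmsg_type  = VRF_ID_CMSG;
    cm->cmsg_len   = CMSG_LEN(sizeof(vrf_id));
    memcpy(CMSG_DATA(cm), &vrf_id, sizeof(vrf_id));

    msg->msg_controllen = cm->cmsg_len;
    return cm->cmsg_len;
}

/* Extract the VRF ID from received ancillary data (the recvmsg() side). */
static int vrf_cmsg_parse(struct msghdr *msg, uint32_t *vrf_id)
{
    struct cmsghdr *cm;

    for (cm = CMSG_FIRSTHDR(msg); cm; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level == SOL_VRF && cm->cmsg_type == VRF_ID_CMSG) {
            memcpy(vrf_id, CMSG_DATA(cm), sizeof(*vrf_id));
            return 0;
        }
    }
    return -1;
}
```

The same CMSG_FIRSTHDR/CMSG_DATA walk is what any cmsg consumer performs, so associating a VRF ID with a datagram fits naturally into the existing ancillary-data machinery.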
Quick note: for both the fast path and the Networking Stack, supporting the “cross VR” feature becomes straightforward, since the software just has to update the VRF field of the packet descriptor (i.e., change the VRF ID of the Linux skbuff).
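Under the same hypothetical descriptor layout as above, the cross-VR operation reduces to a single field update before the next lookup runs in the destination VR:

```c
#include <assert.h>
#include <stdint.h>

/* Minimal, illustrative packet descriptor (field names are hypothetical,
 * but the 'vrf' field plays the role of the VRF ID in a Linux skbuff). */
struct pkt_desc {
    uint32_t vrf;
};

/* Cross-VR forwarding: hand the packet over to another VR by rewriting
 * the descriptor's VRF field; every subsequent table lookup is then
 * automatically performed in the destination VR. */
static void cross_vr(struct pkt_desc *d, uint32_t dst_vrf)
{
    d->vrf = dst_vrf;
}
```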
The 6WINDGate userland applications that are VRF-aware have been designed as a single daemon that binds to multiple VRs. For instance, the Quagga/zebra daemon can configure routes on any VR using the Netlink extension (a), and the ospfd daemon can listen for and send OSPF packets on any VR (b). When it becomes necessary to support thousands of VRFs, such a design clearly scales better, because it avoids the OS overhead of running thousands of instances of the same daemon on SMP Linux, where that same Linux runs on only a few tens of CPU cores.
As a follow-up, the icing on the cake comes when you combine VRF with HA capabilities: if a control plane is under too much load, 6WINDGate VRF and HA can be used to distribute independent VRs across multiple Control Plane CPUs. Please read Yann’s post for more information about HA.
In conclusion, even on a multicore CPU, the cores (or hardware threads) alone do not increase VRF scalability. This feature must be implemented in every software module at every layer: control plane, networking stack and fast path.