Why Standard OS Networking Stacks like Linux’s Are Not Well Adapted to Multicore Packet Processing

By Vincent Jardin – 6WIND CTO

It is quite common to move userland applications into the OS kernel to get higher performance, since the applications then sit closer to the networking stack and the kernel drivers.

With the introduction of multicore CPUs, there is an additional trend to add threads or tasklets, in userland or in the kernel, to try to reach higher performance.
Most of the time this design approach is disappointing, and benchmarks show that it does not scale: an N-core processor does not deliver the performance of N processors.

Why?

An OS networking stack relies on OS services and inherits their limitations: preemption, threads, timers, locking, and so on. This leads to performance bottlenecks such as L1/L2 CPU cache misses, pipeline mispredictions, and a lack of scalability due to locks. It also adds development complexity, such as the use of kernel locks or of RCU (Read-Copy-Update) in recent Linux 2.6 kernels.
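To make the locking point concrete, here is a minimal sketch of the two data layouts involved: one counter shared by every worker behind a lock, and one private counter per worker that is merged only at the end. The function names are hypothetical and are not taken from any real stack. Python is used only for brevity; its global interpreter lock hides the real speed difference, so treat this as an illustration of the structure, not as a benchmark.

```python
import threading


def count_with_shared_lock(workers: int, packets_per_worker: int) -> int:
    """Every worker updates one shared counter, so every update takes the lock."""
    total = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal total
        for _ in range(packets_per_worker):
            # All workers serialize here: this is the contention point.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_with_per_worker_state(workers: int, packets_per_worker: int) -> int:
    """Each worker owns its counter; no lock is taken on the per-packet path."""
    counters = [0] * workers

    def worker(index: int) -> None:
        local = 0
        for _ in range(packets_per_worker):
            local += 1  # private state, nothing shared with other workers
        counters[index] = local  # one write per worker, into its own slot

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Results are merged once, after all workers have finished.
    return sum(counters)


if __name__ == "__main__":
    print(count_with_shared_lock(4, 1000))
    print(count_with_per_worker_state(4, 1000))
```

Both functions return the same total. The difference is where the synchronization happens: on every packet in the first case, once per worker in the second.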

Because of this overhead, an OS stack will never scale strictly linearly with the number of CPU cores, and the growing development complexity will slow the migration of networking applications to multicore CPUs.
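One common way to put numbers on this limit is Amdahl's law. It is not cited in this post, and it is only a simplified model, but it captures the same effect: if a fraction of the work is serialized (for example, time spent holding a lock), the speedup on N cores is bounded by 1 / (s + (1 - s) / N), where s is that serial fraction. The sketch below assumes a made-up serial fraction of 5%; the function name and the figure are illustrative, not measurements of any real stack.

```python
def amdahl_speedup(cores: int, serial_fraction: float) -> float:
    """Upper bound on speedup when `serial_fraction` of the work cannot run in parallel."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


if __name__ == "__main__":
    # Assumed value for illustration only: 5% of packet processing is serialized.
    serial_fraction = 0.05
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} cores: at most {amdahl_speedup(n, serial_fraction):.2f}x")
```

Under that assumption, 16 cores give at most roughly 9.1x, well short of 16x, and the gap widens as cores are added.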

So the following question arises: should we migrate the OS stack into a Fast Path (BSD or Linux running on bare metal), or should we rewrite it from the ground up? Migrating an existing stack keeps the OS limitations, since only the kernel's scheduling overhead is removed.

Any attempt to migrate the OS stack into a bare-metal environment will fail to scale once all the use cases are enabled in the stack.

You need to think and design differently to get high-performance networking on multicore CPUs. The OS stack MUST be part of the solution, since it provides the APIs, but alternative designs are required to deliver the performance.

More information about 6WINDGate architecture can be found here.

You can download more detailed documents here.

You can check 6WINDGate FAQ here.
