Improving TCP Termination Performance

By David Le Goff – 6WIND Product Marketing Manager

The growth in Web 2.0 traffic, as well as the expectation of increasingly complex Web 3.0 applications, poses major challenges for network appliances. The standard TCP transport protocol used for these communications was not initially designed for high-performance network architectures where millions of short-lived sessions occur. As a result, we see significant growth in the adoption of WAN Optimization Controllers, SSL Offload Engines and Application Delivery Controllers. These products are required because of the performance limitations of standard TCP stacks when handling a large number of bidirectional connections and/or when supporting typical WAN connections.

Benefits

In line with customer requests and market trends, 6WIND has leveraged its unique fast path technology to address the TCP challenges listed above by providing Network Equipment Manufacturers with a TCP stack that maximizes overall performance. Architecturally, the 6WINDGate TCP stack implements:

  • A completely lockless TCP architecture;
  • Zero-copy between the TCP stack and the application (buffer pointer management);
  • Scheduling mechanisms that leverage hardware-specific features;
  • BSD Socket API support (fp_so_bind(), fp_so_connect(), fp_so_listen(), fp_so_accept());
  • LRO (Large Receive Offload) and TSO (TCP Segmentation Offload) splicing mechanisms;
  • Full interoperability (BSD source) with third-party TCP network elements;
  • Ultra-low latency compared to standard Linux networking stacks;
  • Carrier-grade class of service with a large number of TCP sessions (millions) supported.
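
The zero-copy item above (buffer pointer management) can be illustrated in a few lines. This is a minimal sketch in plain Python, not 6WINDGate code: the function names `copy_payload` and `view_payload` and the 4-byte header are assumptions made for the example. It only shows the general idea of handing the application a reference into the packet buffer instead of a duplicate of the payload.

```python
# Illustration only: plain Python, not the 6WINDGate API.
# copy_payload, view_payload and the 4-byte header are hypothetical.

def copy_payload(packet: bytearray, header_len: int) -> bytes:
    """Copying approach: the payload bytes are duplicated for the application."""
    return bytes(packet[header_len:])


def view_payload(packet: bytearray, header_len: int) -> memoryview:
    """Zero-copy approach: the application gets a view into the same buffer."""
    return memoryview(packet)[header_len:]


packet = bytearray(b"HDR!" + b"hello world")
copied = copy_payload(packet, 4)
viewed = view_payload(packet, 4)

# A same-length write to the underlying buffer is visible through the view,
# which shows that no payload bytes were duplicated. The copy is unaffected.
packet[4:9] = b"HELLO"
print(bytes(viewed))
print(copied)
```

The trade-off is that a shared buffer must stay valid, and must not be reused, for as long as the application holds the reference.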

Along with ingress and egress processing, applications using 6WINDGate benefit from its high-performance TCP stack, which supports millions of connections with several tens of Gbps of bandwidth, as shown in the figure. More detailed benchmarks running complex TCP scenarios that include drops, congestion, retransmissions and hard resets are available on demand.

Other performance tests using HAProxy (http://haproxy.1wt.eu/) demonstrate that the 6WIND TCP stack typically improves performance by up to 6x, along with providing a major improvement in session concurrency (from thousands to millions).

The figure below illustrates how a major networking equipment manufacturer uses 6WINDGate in an Application Delivery Controller (ADC), processing millions of NAT/LSN operations per second with a TCP proxy mechanism that enables multiplexing in the wireless core network infrastructure. The vendor has achieved more than 100K NAT sessions per second and has scaled up to 18M LSN sessions using the TCP stack in parallel to deliver tens of Gbps of network bandwidth.

Using TCP Termination along with other 6WINDGate modules, Networking Equipment Manufacturers can develop advanced network software nodes such as application delivery controllers, next generation firewalls and generic Layer 4 through 7 gateways while minimizing overall system costs through the use of commodity multicore platforms.

    4 Comments

  1. While running an open source application such as the Apache web server over 6WINDGate, do we need to replace the socket API layer with 6WINDGate-specific APIs in order to communicate with the TCP stack?
    I presume that zero copy between 6WINDGate and the application software requires the frame buffer memory to be shared between the applications and the 6WINDGate TCP stack.

    Typically, what kind of changes are required in software when it is migrated on top of 6WINDGate?

  2. Hi Vakul,

    You raise an important question. As of today, a minor update to your TCP-based application is required. In the Apache web server example, you would need to replace the TCP socket calls with our API calls, which keep the same shape with an added prefix. We also use event-style mechanisms (callbacks) for receiving and sending packets, rather than blocking or non-blocking sockets.

    Thanks to this minor and simple update you benefit from a major performance improvement.

    And yes, the zero-copy mechanism relies on shared memory.

    Best,

    David.
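
The callback style described in the reply above can be contrasted with a blocking `recv()` loop using a short sketch. It uses Python's standard `selectors` and `socket` modules as a stand-in and does not show the 6WINDGate API, whose exact signatures are not given here. The handler name `on_readable` and the request bytes are assumptions made for the example.

```python
# Illustration only: standard-library Python, not the 6WINDGate API.
# on_readable and the request payload are hypothetical.
import selectors
import socket

received = []


def on_readable(conn: socket.socket) -> None:
    """Callback invoked when data is ready, instead of blocking in recv()."""
    data = conn.recv(4096)
    if data:
        received.append(data)


sel = selectors.DefaultSelector()
client, server = socket.socketpair()   # local pair, no network needed
server.setblocking(False)

# Register the callback once; the event loop decides when to call it.
sel.register(server, selectors.EVENT_READ, on_readable)

client.sendall(b"GET / HTTP/1.0\r\n\r\n")

# One turn of the event loop: dispatch every ready socket to its callback.
for key, _events in sel.select(timeout=1.0):
    key.data(key.fileobj)

sel.unregister(server)
client.close()
server.close()
sel.close()

print(b"".join(received))
```

In this style the application never waits inside a socket call. It supplies handlers, and the stack (here, the selector loop) invokes them when data arrives.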

  3. Hi David,

    Did you make any changes to the HAProxy code to achieve the performance described above?

    HAProxy gives better req/s when used in single-instance mode. How did you configure the HAProxy instances, for example by binding them to each core? Please tell us about the HAProxy configuration used here.

  4. Hi Balaji,

    Actually, we did not modify HAProxy; we benchmarked our own TCP proxy against HAProxy.
    That comparison is how we demonstrated the performance gain. Depending on the test case configuration, we recently announced over 100Gbps TCP performance on one platform.
