Distributed Shared Memory and Message Passing with Gigabit Networks

Matthias Ackermann

Semester Thesis Winter 1999
Supervisors: Chr. Kurmann, Prof. T. Stricker
Institute for Computer Systems, ETH Zürich


What is the theme? What is the motivation? What are the goals? Which problems have to be solved?

Network technologies such as Gigabit Ethernet offer theoretical bandwidths of 120 MB/s and more, but common computers and operating systems cannot handle such data streams. Copy operations over the memory bus limit the bandwidth to 45 MB/s or less. The goal of this thesis was to find ways to eliminate the need for copy operations over the memory bus, an approach commonly called zero copy.

In his diploma thesis, Roman Roth presents a solution to this problem for Windows NT 4. His final implementation consists of a protocol layer that provides zero-copy functionality to applications (the ZeroCopy Layer) and is built on a network abstraction layer (the Network Layer). His thesis also contains an MPI prototype based on the ZeroCopy Layer. In the present thesis, the ZeroCopy Layer and the MPI prototype were ported from Windows NT to Linux (kernel version 2.2).

The main and most time-consuming problem was creating a network abstraction layer on top of the Linux socket interface. The Windows NT implementation of the network abstraction layer used the asynchronous I/O operations of Windows (ReadFileEx, WriteFileEx and APCs). This feature of the Win32 API is missing under Linux. Several possible replacements were evaluated, e.g. multithreading, signals and non-blocking I/O operations.
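As an illustration of the non-blocking I/O option mentioned above, the following is a minimal sketch in Python rather than the C code of the thesis. The function name recv_exact and its timeout parameter are hypothetical and do not come from the thesis. It shows how a caller can poll a non-blocking socket with select() until a complete message has arrived, instead of relying on a completion callback as with ReadFileEx.

```python
import select
import socket


def recv_exact(sock, nbytes, timeout=1.0):
    """Read exactly nbytes from a non-blocking socket by polling with select().

    Hypothetical helper for illustration only; not part of the thesis code.
    """
    sock.setblocking(False)
    buf = bytearray(nbytes)
    view = memoryview(buf)
    received = 0
    while received < nbytes:
        # Wait until the socket is readable or the timeout expires.
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            raise TimeoutError("no data within timeout")
        try:
            # Receive directly into the preallocated buffer.
            n = sock.recv_into(view[received:])
        except BlockingIOError:
            # Spurious wakeup: nothing to read after all, poll again.
            continue
        if n == 0:
            raise ConnectionError("peer closed the connection")
        received += n
    return bytes(buf)
```

Receiving into a preallocated buffer avoids one intermediate user-space copy, but this is not zero copy in the sense of the thesis, since the kernel still copies the data from its own buffers.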


What was accomplished? What are the solutions to the posed problems? What are the remaining problems?

The ZeroCopy Layer and the MPI prototype were successfully ported from Windows NT to Linux. All major problems could be solved. The thesis includes two different network abstraction layers. Both use multithreading and polling to communicate with the socket interface. One abstraction layer has a sender thread and a receiver thread; the other does without a sender thread.
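The two variants described above can be sketched roughly as follows. This is an assumption-laden illustration in Python, not the thesis implementation: the class PollingEndpoint, its queues and the use_sender_thread flag are hypothetical names. Both variants run a receiver thread that polls the socket; with use_sender_thread=False the caller writes to the socket directly instead of handing the data to a sender thread.

```python
import queue
import select
import socket
import threading


class PollingEndpoint:
    """Hypothetical sketch of a threaded, polling network abstraction layer."""

    def __init__(self, sock, use_sender_thread=True):
        self.sock = sock
        self.inbox = queue.Queue()    # data delivered by the receiver thread
        self.outbox = queue.Queue()   # data waiting for the sender thread
        self.stop = threading.Event()
        self.use_sender_thread = use_sender_thread
        self.threads = [threading.Thread(target=self._receive_loop, daemon=True)]
        if use_sender_thread:
            self.threads.append(threading.Thread(target=self._send_loop, daemon=True))
        for t in self.threads:
            t.start()

    def send(self, data):
        if self.use_sender_thread:
            self.outbox.put(data)     # variant 1: hand over to the sender thread
        else:
            self.sock.sendall(data)   # variant 2: send in the caller's thread

    def _send_loop(self):
        while not self.stop.is_set():
            try:
                data = self.outbox.get(timeout=0.05)
            except queue.Empty:
                continue              # nothing to send yet, poll again
            self.sock.sendall(data)

    def _receive_loop(self):
        while not self.stop.is_set():
            # Poll the socket with a short timeout so the stop flag is noticed.
            readable, _, _ = select.select([self.sock], [], [], 0.05)
            if readable:
                data = self.sock.recv(65536)
                if not data:
                    break             # peer closed the connection
                self.inbox.put(data)

    def close(self):
        self.stop.set()
        for t in self.threads:
            t.join()
        self.sock.close()
```

In this sketch every message crosses at least one thread boundary and waits for a polling interval, which hints at why a threaded design can add noticeable delay to small packets.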

This implementation was meant to be tested with the ZeroCopy TCP/IP stack from the master's thesis of Michel Müller. However, the driver proposed in that thesis was very unstable, so the tests could only be run with the standard TCP/IP stack of Linux, which has a maximum bandwidth of 47 MB/s.

The measurements for the ZeroCopy Layer showed bandwidths of 41 MB/s for Gigabit Ethernet and 10.5 MB/s for Fast Ethernet. The MPI implementation reaches 40 MB/s with Gigabit Ethernet and 10 MB/s with Fast Ethernet. This is equivalent to 90% - 95% of the possible bandwidth. The Windows NT implementation had a maximum bandwidth of 10 MB/s, even with Gigabit Ethernet.

High bandwidths could only be reached with packet sizes over 500 KB (for Gigabit Ethernet). Packets under 500 KB were delayed by several effects introduced by the use of multithreading. To achieve better performance with small packets, signals or a kernel module could be used instead of multithreading.


ETH Zürich: Department of Computer Science
March 16th, 2000