The most important problem is that data is copied within the portable messaging libraries and within the standard communication protocol stacks. One solution is to implement "zero-copy" techniques in the communication system software.
The implementation of a communication system with
a zero-copy layer relies on low-level drivers for access to the hardware
and on some higher-level software services (e.g. collective communication).
Traditional UNIX I/O interfaces are based on copy semantics, where read and write calls transfer data between kernel and user-defined buffers.
Data-touching overheads include those operations that require processing of the data within a given buffer, such as checksumming or copying from one buffer to another. There have been several efforts to reduce data-touching overheads: integrated layer processing in protocols, carefully designed high-performance network adapters that eliminate copying between devices and the OS kernel, and restructured OS software that minimizes data movement.
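To make the copy semantics concrete, the following sketch passes a payload through a pipe with plain read()/write() calls; each call crosses the user/kernel boundary with a data copy. (Python is used here only for brevity; the actual implementation work targets C and the kernel.)

```python
import os

def copy_semantics_roundtrip(payload: bytes) -> bytes:
    """Send a payload through a pipe using plain read()/write().

    Each call crosses the user/kernel boundary with a copy: write()
    copies the user buffer into a kernel buffer, and read() copies
    the kernel buffer back into a freshly allocated user buffer.
    """
    r, w = os.pipe()
    try:
        os.write(w, payload)                  # copy 1: user -> kernel
        received = os.read(r, len(payload))   # copy 2: kernel -> user
    finally:
        os.close(r)
        os.close(w)
    # The received buffer is a distinct object: the data was copied,
    # not shared -- exactly the overhead zero-copy aims to avoid.
    return received
```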
The main purpose is to improve MPICH in order
to achieve real Gigabit/s speeds on 1000BaseSX Ethernet communications.
The MPICH implementation includes two MPI programs,
mpptest and goptest, that provide reliable tests of the performance
of an MPI implementation. The program mpptest tests
both point-to-point and collective operations on a specified number of
processors; the program goptest can be used to study the scalability
of collective routines as a function of the number of processors.
There also exists a script basetest, provided with the MPICH implementation, that can be used to get a more complete picture of the behaviour of a particular system.
The basic data are short- and long-message performance figures.
By using these programs we can get a picture of the best achievable bandwidth performance.
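The core of such a point-to-point test is a timed ping-pong exchange. The sketch below illustrates the idea over a local socket pair rather than MPI (names and the transport are illustrative only; mpptest itself runs over MPI):

```python
import socket
import threading
import time

def pingpong_bandwidth(msg_size: int, iterations: int = 20) -> float:
    """Measure round-trip bandwidth (MB/s) for messages of msg_size
    bytes, in the spirit of mpptest's point-to-point test, but over a
    local socket pair instead of MPI (illustration only)."""
    a, b = socket.socketpair()
    payload = b"x" * msg_size

    def echo():
        # Peer side: receive a full message, then echo it back.
        for _ in range(iterations):
            remaining = msg_size
            while remaining:
                remaining -= len(b.recv(remaining))
            b.sendall(payload)

    t = threading.Thread(target=echo)
    t.start()
    start = time.perf_counter()
    for _ in range(iterations):
        a.sendall(payload)
        remaining = msg_size
        while remaining:
            remaining -= len(a.recv(remaining))
    elapsed = time.perf_counter() - start
    t.join()
    a.close()
    b.close()
    # Two transfers (ping + pong) of msg_size bytes per iteration.
    return (2 * msg_size * iterations) / elapsed / 1e6
```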
2. Identify and solve all problems regarding integration of the zero-copy layer into the Linux OS
The second stage of this diploma work consists of integrating the existing zero-copy layer into the Linux operating system.
3. Improve the current MPICH version
Another step is to investigate and identify the weaknesses
of the resulting MPICH implementation. Based on these findings, the current
MPICH product will be improved and its performance established.
4. Implement an efficient zero-copy layer based on the fast-buffer concept
There exists a very interesting proposal, "An Efficient Zero-Copy I/O Framework for UNIX" by Sun Microsystems Laboratories, Inc., regarding buffer management and exchange between application programs and the UNIX kernel.
My proposal is to make use of their solution and adapt it to the Linux operating system. Even though both Solaris and Linux are implementations of the UNIX OS, there are some differences between them.
A high-bandwidth cross-domain transfer facility,
called fast buffers (fbufs), combines virtual page remapping with
shared virtual memory, and exploits locality in I/O traffic to achieve
high throughput without compromising protection and security.
The UNIX interface has copy semantics, and it allows the application to specify an unaligned buffer address anywhere in its address space. Therefore, it is necessary to add an interface, based on explicit buffer exchange, for high-bandwidth I/O.
This solution has the following main elements:
The extensions to the API provide for the explicit
exchange of buffers (containing data) between the application and the OS, which
eliminates copying (fig. 1).
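A minimal sketch of what explicit buffer exchange means, as opposed to copy semantics: the application and the kernel hand whole buffers back and forth, transferring ownership instead of bytes. The class and method names here are illustrative, not the actual fbufs API.

```python
class Fbuf:
    """A buffer owned either by the application or by the kernel."""
    def __init__(self, buf_id: int, size: int):
        self.buf_id = buf_id
        self.data = bytearray(size)
        self.owner = "application"

class FbufChannel:
    """Illustrative model of explicit buffer exchange: instead of
    copying data on every call, the application and the (simulated)
    kernel pass ownership of whole buffers back and forth."""
    def __init__(self):
        self._next_id = 0
        self._kernel_queue = []

    def alloc(self, size: int) -> Fbuf:
        fb = Fbuf(self._next_id, size)
        self._next_id += 1
        return fb

    def write(self, fb: Fbuf) -> None:
        # Ownership moves to the kernel; no byte copy happens.
        assert fb.owner == "application"
        fb.owner = "kernel"
        self._kernel_queue.append(fb)

    def read(self) -> Fbuf:
        # The kernel hands a filled buffer back to the application.
        fb = self._kernel_queue.pop(0)
        fb.owner = "application"
        return fb
```

Note that read() returns the very same buffer object that was written: the data never moves, only the right to touch it does.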
There are several components to be implemented.
First, a library is required in order to provide the fbufs interface to Linux applications.
Next, a buffer pool manager will be implemented, responsible for allocating memory, tracking the allocation and deallocation of individual fbufs, managing mappings between user and kernel addresses, and converting between fbufs and Streams mblks.
A new system-call implementation will provide the functionality of the new read, write, and get interfaces.
The library invokes the new system call component, via a trap, to transfer fbufs between kernel and application.
Device driver interface extensions allow the I/O subsystem to allocate fbufs in the kernel.
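The bookkeeping done by the buffer pool manager can be sketched as follows. This is a toy model with made-up base addresses; the real manager operates on kernel memory and page mappings in C.

```python
class FbufPoolManager:
    """Toy model of the buffer pool manager's bookkeeping: it allocates
    fbufs from a fixed pool, tracks which are in use, and records the
    mapping between (simulated) user and kernel addresses."""
    def __init__(self, pool_size: int, fbuf_size: int,
                 user_base: int = 0x40000000,
                 kernel_base: int = 0xC0000000):
        self.fbuf_size = fbuf_size
        self.free = list(range(pool_size))  # indices of free fbufs
        self.in_use = set()
        self.user_base = user_base
        self.kernel_base = kernel_base

    def alloc(self) -> int:
        """Return the user-space address of a newly allocated fbuf."""
        idx = self.free.pop()
        self.in_use.add(idx)
        return self.user_base + idx * self.fbuf_size

    def free_fbuf(self, user_addr: int) -> None:
        """Return an fbuf to the pool, given its user-space address."""
        idx = (user_addr - self.user_base) // self.fbuf_size
        self.in_use.remove(idx)
        self.free.append(idx)

    def user_to_kernel(self, user_addr: int) -> int:
        """Translate a user address inside the pool to the kernel
        address of the same fbuf (no data copy involved)."""
        return self.kernel_base + (user_addr - self.user_base)
```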
The device driver is modified with respect to the allocation
and management of fbufs; this requires only a small amount of housekeeping
in the driver.
The PC system has separate address spaces for the OS and for I/O. Device support routines are provided by the OS for device drivers to translate between the two domains. In a traditional UNIX system, the addresses of buffers used in I/O operations are fairly arbitrary, but in our implementation the same buffers are frequently reused for I/O to the same device. The device driver will be optimized to take advantage of this referential locality by caching translations between kernel and I/O addresses of fbufs, thus avoiding the expensive translation routines.
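The translation cache just described amounts to memoizing the expensive kernel-to-I/O address translation, keyed by the (frequently reused) fbuf address. A minimal sketch, with the translation routine passed in as a placeholder:

```python
class TranslationCache:
    """Caches kernel-to-I/O address translations for fbufs, so that
    repeated I/O on the same reused buffer skips the expensive
    translation step."""
    def __init__(self, translate):
        self._translate = translate  # the "expensive" OS routine
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, kernel_addr: int) -> int:
        """Return the I/O address for kernel_addr, translating only
        on the first access to each address."""
        if kernel_addr in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[kernel_addr] = self._translate(kernel_addr)
        return self._cache[kernel_addr]
```

Because fbufs are reused for I/O to the same device, nearly all lookups after warm-up are cache hits.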
Finally, the performance will be established: the cost of fbufs operations will be compared with the cost of the corresponding operations using standard Streams or sockets.
Based on the performance evaluation performed for
each product obtained during these three stages, conclusions will be drawn
regarding the performance improvements on 1000BaseSX Ethernet communication.
ETH Zürich: Department of Computer Science