In this semester project, a library was implemented that permits easy access to these performance counting registers from programs. The library is easy to use and offers a uniform interface under Linux and Windows NT.
The events that can be counted refer to the logical units of the processor. In the floating-point unit (FPU), for example, it is possible to count how many multiplications, how many divisions, or how many FPU operations in total have been performed.
These functions rely on special machine instructions, some of which can only be executed at privilege level 0. For this reason, a device driver had to be written for each of the two operating systems.
Most of the functions were implemented directly in the library, and the drivers are called only where this is unavoidable, so that redundant context switches do not degrade efficiency. Calling a driver is a relatively expensive operation, as the corresponding overhead measurements confirm.
To demonstrate the usefulness of the library, the SpecMark 129 (Compress) benchmark was instrumented and measured.
What the collected data do not show is what triggered an event. They reveal only whether and how often an event occurred. For a program that causes many cache misses, for example, the cause cannot be determined directly.
Although this library makes access to such data much easier, optimizing an application still demands a high level of skill from the programmer.