Memory Access Patterns of Scientific Codes

Raul Adorean Silaghi

Diploma Thesis Summer 2000
Supervisors: Irina Chihaia, Michela Taufer, Christian Kurmann, Prof. T. Stricker
Institute for Computer Systems, ETH Zürich


Objectives

Memory system performance can be a critical factor in some high performance computing applications. The way an application uses the memory system can determine whether it executes faster in parallel using a second processor in a PC or using an entire cluster of PCs. In previous work our group found a simple way to characterize memory system performance by benchmarking access streams with different working sets and different strides (MemPerf).
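The following C fragment is a minimal, illustrative sketch of the kind of measurement such a benchmark performs, not the MemPerf code itself: it walks a buffer of a given working-set size with a fixed stride and reports the average time per access. The buffer sizes, strides, repetition count, and the use of clock() are assumptions made here for illustration.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Average time per access for one (working set, stride) combination. */
static double time_stream(volatile char *buf, size_t size, size_t stride, int reps)
{
    clock_t t0 = clock();
    volatile char sink = 0;
    for (int r = 0; r < reps; r++)
        for (size_t i = 0; i < size; i += stride)
            sink ^= buf[i];                       /* one read per stride step */
    (void)sink;
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    return secs / (reps * (double)(size / stride));
}

int main(void)
{
    /* Working sets chosen (as an assumption) to fall into L1, L2 and RAM. */
    size_t sizes[]   = { 16 * 1024, 256 * 1024, 8 * 1024 * 1024 };
    size_t strides[] = { 1, 8, 64, 512 };
    char *buf = malloc(8 * 1024 * 1024);
    for (size_t i = 0; i < 8 * 1024 * 1024; i++) buf[i] = (char)i;

    for (int s = 0; s < 3; s++)
        for (int k = 0; k < 4; k++)
            printf("size=%zu stride=%zu  %.1f ns/access\n",
                   sizes[s], strides[k],
                   1e9 * time_stream(buf, sizes[s], strides[k], 16));
    free(buf);
    return 0;
}

Sweeping the working set across the cache sizes and varying the stride is what exposes the characteristic access times of the different levels of the memory hierarchy.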

In this Diploma thesis we looked for an experimental way to classify the Opal application according to its memory system usage. Some applications depend primarily on faster microprocessor clock speeds, some depend primarily on contiguous memory accesses, some on strided memory accesses, and some are "out of core" and depend on other factors such as disk or network speed. We then attempted to model the performance (execution time) based on a suspected memory access pattern and on the memory system performance characterization of the machine.
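As a hedged sketch of the kind of model this implies (not the exact thesis model): the estimated run time of a procedure is the sum, over the observed strides, of the number of accesses with that stride multiplied by the per-access time MemPerf measured for that stride on the target machine. The struct layout and names below are illustrative assumptions.

#include <stddef.h>

struct stride_bin {
    long   stride;         /* access stride in bytes                  */
    double count;          /* how many accesses used this stride      */
    double ns_per_access;  /* MemPerf time for this stride (ns)       */
};

/* Estimated execution time in seconds for one procedure. */
double estimate_time(const struct stride_bin *bins, size_t nbins)
{
    double total_ns = 0.0;
    for (size_t i = 0; i < nbins; i++)
        total_ns += bins[i].count * bins[i].ns_per_access;
    return total_ns * 1e-9;
}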

Results

Before we could speak of a memory access pattern, we needed a substantial amount of data characterizing the Opal application from the memory point of view. Which data would give the most information, and how to obtain that data, were the two big questions we tried to answer.

The first step of our work was to investigate in which procedures the Opal application spends most of its computing time. Then, in order to collect run-time information about these Opal procedures, we used performance counters: special on-chip logic that counts specific run-time events. The events we monitored were Level I/II cache hits and misses and RAM accesses. Next, the memory addresses that the Opal application accesses were gathered with a self-implemented assembly instrumentation tool. These addresses, called the program trace, were then fed into a self-implemented cache simulator, which identified the data strides and their frequencies in the Level I/II caches and in memory. Using the MemPerf results and the computed distribution of data strides, we were able to estimate the execution time of the procedures of interest.
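The sketch below shows, in a few lines of C, how a trace-driven simulator of this kind can turn a program trace into a stride distribution; it is not the thesis tool. It replays each address against a small direct-mapped cache model and records the stride between consecutive misses, i.e. the accesses that go on to the next memory level. The cache parameters and the trace format (one hexadecimal data address per input line) are assumptions for illustration.

#include <stdio.h>
#include <stdlib.h>

#define CACHE_LINES 512      /* e.g. 16 KB direct-mapped cache, 32-byte lines */
#define LINE_BYTES  32
#define MAX_BINS    64

static unsigned long tags[CACHE_LINES];   /* 0 means "empty"                   */
static long bin_stride[MAX_BINS];         /* distinct miss strides seen so far */
static long bin_count[MAX_BINS];          /* how often each stride occurred    */
static int  nbins;

/* Record one miss stride in the histogram. */
static void count_stride(long s)
{
    for (int i = 0; i < nbins; i++)
        if (bin_stride[i] == s) { bin_count[i]++; return; }
    if (nbins < MAX_BINS) { bin_stride[nbins] = s; bin_count[nbins++] = 1; }
}

int main(void)
{
    unsigned long addr, prev_miss = 0;
    long hits = 0, misses = 0;

    while (scanf("%lx", &addr) == 1) {
        unsigned long line = addr / LINE_BYTES;
        unsigned long set  = line % CACHE_LINES;
        if (tags[set] == line + 1) {        /* +1 so an empty slot never matches */
            hits++;                         /* served by this cache level        */
        } else {
            tags[set] = line + 1;           /* fill the line                     */
            if (misses > 0)
                count_stride((long)addr - (long)prev_miss);
            prev_miss = addr;
            misses++;                       /* goes to the next memory level     */
        }
    }

    printf("hits %ld  misses %ld\n", hits, misses);
    for (int i = 0; i < nbins; i++)
        printf("stride %6ld bytes : %ld misses\n", bin_stride[i], bin_count[i]);
    return 0;
}

A stride histogram of this kind, produced per memory level, is what gets combined with the MemPerf per-stride access times to yield the execution-time estimate for each procedure.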


