Department of Computer Science
Course: CS 3725
Horizontal Bar

next up gif
Next: Other methods for fast Up: Cache memory Previous: Examples - the VAX

Simulating cache memory performance

Since much of the effectiveness of the system depends on the cache miss rate, it is important to be able to measure, or at least accurately estimate, the performance of a cache system early in the system design cycle.

Clearly, the type of jobs (the ``job mix'') will be important to the cache simulation, since the cache performance can be highly data and code dependent. The best simulation results come from actual job mixes.

Since many common programs can generate a large number of memory references, (document preparation systems like LaTeX, for example), the data sets for cache traces for ``typical'' jobs can be very large. In fact, large cache traces are required for effective simulation of even moderate sized caches.

For example, given a cache size of 8K lines with an anticipated miss rate of, say. 10%, (1 miss in 10 memory references) we would require about 80K lines to be fetched from memory before it could reasonably be expected that each line in the cache was replaced. To determine reasonable estimates of actual cache miss rates, each cache line should be replaced a number of times (the ``accuracy'' of the determination depends on the number of such replacements.) This net effect is to require a memory trace of some factor larger, say another factor of 10, or about 800K lines. That is, the trace length would be at least 100 times the size of the cache. Lower expected cache miss rates and larger cache sizes exacerbate this problem. (e.g., for a cache miss rate of 1%, a trace of 100 times the cache size would be required to, on average, replace each line in the cache once. A further, larger, factor would be required to determine the miss rate to the required accuracy.) The following two results (see High Performance Computer Architecture by H.S. Stone, Addison Wesley, Chapter 2, Section 2.2.2, pp. 57-70) derived by Puzak, in his Ph.D. thesis (T.R. Puzak, Cache Memory Design, University of Massachusetts, 1985) can be used to reduce the size of the traces and still result in realistic simulations.

The first trace reduction, or trace stripping, technique assumes that a series of caches of related sizes starting with a cache of size N, all with the same line size, are to be simulated with some cache trace. The cache trace is reduced by retaining only those memory references which result in a cache miss for a direct mapped cache.

Quoting Stone, and Puzak,

Assume that a set of analyses is to be done for caches with a fixed line size L and at least N sets. Them prepare a reduced address trace by simulating a one-way associative cache with N sets and line size L operating on the full trace. Output a reduced trace that contains only the addresses that produced misses in the N-set, one-way set associative cache.

Note that, for a miss rate of 10%, 90% of the memory trace would be discarded. Lower miss rates result in higher reductions.

The reduced trace will produce the same number of cache misses as the original trace for:

In other words, for caches with size some power of 2, it is possible to investigate caches of with sizes a multiple of the initial cache size, and with arbitrary set associativity using the same reduced trace.

The second trace reduction technique is not exact; it relies on the observation that generally each of the N sets behaves statistically like any other set; consequently observing the behaviour of a small subset of the cache sets is sufficient to characterize the behaviour of the cache. (The accuracy of the simulation depends somewhat on the number of sets chosen, because some sets may actually have behaviours quite different from the ``average.'') Puzak suggests that choosing about 10% of the sets in the initial simulation is sufficient.

Combining the two trace reduction techniques typically reduces the number of memory references required for the simulation of successive caches by a factor of 100 or more. This gives a concomitant speedup of the simulation, with little loss in accuracy.

next up gif
Next: Other methods for fast Up: Cache memory Previous: Examples - the VAX

Paul Gillard
Mon Nov 24 20:44:06 NST 1997