|Department of Computer Science |
Course: CS 3725
The cache is a small amount of high-speed memory, usually with a memory cycle time comparable to the time required by the CPU to fetch one instruction. The cache is usually filled from main memory when instructions or data are fetched into the CPU. Often the main memory will supply a wider data word to the cache than the CPU requires, to fill the cache more rapidly. The amount of information which is replaces at one time in the cache is called the line size for the cache. This is normally the width of the data bus between the cache memory and the main memory. A wide line size for the cache means that several instruction or data words are loaded into the cache at one time, providing a kind of prefetching for instructions or data. Since the cache is small, the effectiveness of the cache relies on the following properties of most programs:
Data is usually structured, and data in these structures normally are stored in contiguous memory locations.
Generally, several operations are performed on the same data values, or variables.
When a cache is used, there must be some way in which the memory controller determines whether the value currently being addressed in memory is available from the cache. There are several ways that this can be accomplished. One possibility is to store both the address and the value from main memory in the cache, with the address stored in a type of memory called associative memory or, more descriptively, content addressable memory.
An associative memory, or content addressable memory, has the property that when a value is presented to the memory, the address of the value is returned if the value is stored in the memory, otherwise an indication that the value is not in the associative memory is returned. All of the comparisons are done simultaneously, so the search is performed very quickly. This type of memory is very expensive, because each memory location must have both a comparator and a storage element. A cache memory can be implemented with a block of associative memory, together with a block of ``ordinary'' memory. The associative memory would hold the address of the data stored in the cache, and the ordinary memory would contain the data at that address. Such a cache memory might be configured as shown in Figure .
Figure: A cache implemented with associative memory
If the address is not found in the associative memory, then the value is obtained from main memory.
Associative memory is very expensive, because a comparator is required for every word in the memory, to perform all the comparisons in parallel. A cheaper way to implement a cache memory, without using expensive associative memory, is to use direct mapping. Here, part of the memory address (usually the low order digits of the address) is used to address a word in the cache. This part of the address is called the index. The remaining high-order bits in the address, called the tag, are stored in the cache memory along with the data.
For example, if a processor has an 18 bit address for memory, and a cache of 1 K words of 2 bytes (16 bits) length, and the processor can address single bytes or 2 byte words, we might have the memory address field and cache organized as in Figure .
Figure: A direct mapped cache configuration
This was, in fact, the way the cache is organized in the PDP-11/60. In the 11/60, however, there are 4 other bits used to ensure that the data in the cache is valid. 3 of these are parity bits; one for each byte and one for the tag. The parity bits are used to check that a single bit error has not occurred to the data while in the cache. A fourth bit, called the valid bit is used to indicate whether or not a given location in cache is valid. In the PDP-11/60 and in many other processors, the cache is not updated if memory is altered by a device other than the CPU (for example when a disk stores new data in memory). When such a memory operation occurs to a location which has its value stored in cache, the valid bit is reset to show that the data is ``stale'' and does not correspond to the data in main memory. As well, the valid bit is reset when power is first applied to the processor or when the processor recovers from a power failure, because the data found in the cache at that time will be invalid.
In the PDP-11/60, the data path from memory to cache was the same size (16 bits) as from cache to the CPU. (In the PDP-11/70, a faster machine, the data path from the CPU to cache was 16 bits, while from memory to cache was 32 bits which means that the cache had effectively prefetched the next instruction, approximately half of the time). The amount of information (instructions or data) stored with each tag in the cache is called the line size of the cache. (It is usually the same size as the data path from main memory to the cache.) A large line size allows the prefetching of a number of instructions or data words. All items in a line of the cache are replaced in the cache simultaneously, however, resulting in a larger block of data being replaced for each cache miss.
The MIPS R2000/R3000 had a built-in cache controller which could control a cache up to 64K bytes. For a similar 2K word (or 8K byte) cache, the MIPS processor would typically have a cache configuration as shown in Figure . Generally, the MIPS cache would be larger (64Kbytes would be typical, and line sizes of 1, 2 or 4 words would be typical).
Figure: One possible MIPS cache organization
A characteristic of the direct mapped cache is that a particular memory address can be mapped into only one cache location. Many memory addresses are mapped to the same cache location (in fact, all addresses with the same index field are mapped to the same cache location.) Whenever a ``cache miss'' occurs, the cache line will be replaced by a new line of information from main memory at an address with the same index but with a different tag.
Note that if the program ``jumps around'' in memory, this cache organization will likely not be effective because the index range is limited. Also, if both instructions and data are stored in cache, it may well happen that both map into the same area of cache, and may cause each other to be replaced very often. This could happen, for example, if the code for a matrix operation and the matrix data itself happened to have the same index values.
A more interesting configuration for a cache is the set associative cache, which uses a set associative mapping. In this cache organization, a given memory location can be mapped to more than one cache location. Here, each index corresponds to two or more data words, each with a corresponding tag. A set associative cache with n tag and data fields is called an ``n-way set associative cache''. Usually , for k = 1, 2, 3 are chosen for a set associative cache (k = 0 corresponds to direct mapping). Such n-way set associative caches allow interesting tradeoff possibilities; cache performance can be improved by increasing the number of ``ways'', or by increasing the line size, for a given total amount of memory. An example of a 2-way set associative cache is shown in Figure , which shows a cache containing a total of 2K lines, or 1 K sets, each set being 2-way associative. (The sets correspond to the rows in the figure.)
Figure: A set-associative cache organization
In a 2-way set associative cache, if one data word is empty for a read operation corresponding to a particular index, then it is filled. If both data words are filled, then one must be overwritten by the new data. Similarly, in an n-way set associative cache, if all n data and tag fields in a set are filled, then one value in the set must be overwritten, or replaced, in the cache by the new tag and data values. Note that an entire line must be replaced each time. The most common replacement algorithms are:
Cache memories normally allow one of two things to happen when data is written into a memory location for which there is a value stored in cache:
The MIPS R2000/3000 processors used the write-through approach, with a buffer for the memory writes. (This was also the approach taken by the The VAX-11/780 processor ) In practice, memory writes are less frequent than memory reads; typically for each memory write, an instruction must be fetched from main memory, and usually two operands fetched as well. Therefore we might expect about three times as many read operations as write operations. In fact, there are often many more memory read operations than memory write operations.
Figure shows the behaviour (actually the miss ratio, which is equal to 1 - the hit ratio) for cache memories with various combinations of total cache memory capacity and line size. The results are from simulations of the behaviour of several ``typical'' program mixes. Several interesting things can be seen from these figures; Figure shows that the miss ratio drops consistently with cache size. Note, also, that increasing the line size is not always effective in increasing the throughput of the processor, although it does decrease the hit ratio, because of the additional time required to transfer large lines of data from the main memory to the cache.
Figure: Cache memory performance for various line sizes
It is interesting to plot the same data using log-log coordinates. Note that in this case. the graph is (very) roughly linear. Figure shows this plot.
Figure: Log-log plot of cache performance for various line sizes
The way size, or degree of associativity, of a cache also has an effect on the performance of a cache; the same reference determined that, for a fixed cache size, there was a roughly constant ratio between the performance of caches with a given set associativity and direct-mapped caches, independent of cache size. This relation is shown in Figure . (Of course, the performance of the set associative caches improved with associativity.)
Figure: Cache adjustments for associativity (relative to direct mapping)