Measuring Cache Performance
There are two different techniques for improving cache performance:
i. The first reduces the miss rate by reducing the probability that two different memory blocks will contend for the same cache location.
ii. The second reduces the miss penalty by adding an additional level to the memory hierarchy.
CPU time can be divided into the clock cycles that the CPU spends executing the program and the clock cycles that the CPU spends waiting for the memory system. We assume that the costs of cache accesses that are hits are part of the CPU execution cycles. So,
CPU time = (CPU execution clock cycles + Memory stall clock cycles) × Clock cycle time
The memory stall clock cycles come mainly from cache misses. In real processors, the stalls generated by reads and writes can be quite complex, and accurate performance prediction usually requires detailed simulations of the processor and the memory system.
Memory-stall clock cycles can be defined as the sum of the stall cycles coming from reads plus those coming from writes:
Memory stall clock cycles = (Memory accesses / Program) × Miss rate × Miss penalty
We can also factor this as:
Memory stall clock cycles = (Instructions / Program) × (Misses / Instruction) × Miss penalty
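The two equivalent factorings above can be sanity-checked with a short calculation. This is a minimal sketch; the function names and the sample numbers (one million accesses, a 2% miss rate, a 100-cycle penalty) are illustrative assumptions, not values from the text:

```python
# Memory stall cycles = Memory accesses x Miss rate x Miss penalty
def stalls_from_accesses(memory_accesses, miss_rate, miss_penalty):
    return memory_accesses * miss_rate * miss_penalty

# Equivalent factoring:
# Memory stall cycles = Instructions x (Misses / Instruction) x Miss penalty
def stalls_from_instructions(instructions, misses_per_instruction, miss_penalty):
    return instructions * misses_per_instruction * miss_penalty

# Illustrative numbers: 1,000,000 accesses, 2% miss rate, 100-cycle penalty.
print(stalls_from_accesses(1_000_000, 0.02, 100))      # 2000000.0
# Same total, factored as 0.02 misses per instruction.
print(stalls_from_instructions(1_000_000, 0.02, 100))  # 2000000.0
```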
Cache Performance Example
Assume the instruction cache miss rate for a program is 2% and the data cache miss rate is 4%, with 36% of instructions being loads or stores. If a processor has a CPI of 2 without any memory stalls and the miss penalty is 100 cycles for all misses, determine how much faster the processor would run with a perfect cache that never missed.
Solution:
Let I = instruction count
Instruction miss cycles = I × 2% × 100 = 2.00 × I
Data miss cycles = I × 36% × 4% × 100 = 1.44 × I
The total number of memory stall cycles is 2.00 I + 1.44 I = 3.44 I. This is more than 3 cycles of memory stall per instruction. Accordingly, the CPI with memory stalls is 2 + 3.44 = 5.44. Since there is no change in instruction count or clock rate, the ratio of the CPU execution times is
CPU time with stalls / CPU time with perfect cache
= (I × CPIstall × Clock cycle) / (I × CPIperfect × Clock cycle)
= CPIstall / CPIperfect
= 5.44 / 2
= 2.72
The processor with the perfect cache is therefore faster by a factor of 2.72.
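The arithmetic of this example can be reproduced directly. A minimal sketch, with variable names chosen here for illustration:

```python
# Worked example: I-cache miss rate 2%, D-cache miss rate 4%,
# 36% of instructions are loads/stores, 100-cycle penalty, base CPI 2.
miss_penalty = 100
instr_miss_cycles_per_instr = 0.02 * miss_penalty         # 2.00 per instruction
data_miss_cycles_per_instr = 0.36 * 0.04 * miss_penalty   # 1.44 per instruction

cpi_perfect = 2
cpi_stall = cpi_perfect + instr_miss_cycles_per_instr + data_miss_cycles_per_instr

# Instruction count and clock cycle time cancel in the ratio,
# so the speedup is just the ratio of the CPIs.
print(round(cpi_stall, 2))                # 5.44
print(round(cpi_stall / cpi_perfect, 2))  # 2.72
```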
Performance summary
i. As CPU performance increases, the miss penalty becomes more significant.
ii. Decreasing the base CPI increases the proportion of time spent on memory stalls.
iii. Increasing the clock rate makes memory stalls account for more CPU cycles.
iv. Cache behavior cannot be neglected when evaluating system performance.
Associative Caches
Fully associative
1. A block in memory may be associated with any entry in the cache.
2. To find a given block, all the entries in the cache must be searched, because the block can be placed in any one of them.
3. To make the search practical, it is done in parallel with a comparator associated with each cache entry. These comparators significantly increase the hardware cost, effectively making fully associative placement practical only for caches with small numbers of blocks.
n-way set associative
1. Consists of a number of sets, each of which consists of n blocks.
2. Each block in memory maps to a unique set in the cache, given by the index field, and the block can be placed in any element of that set.
3. Remember that in a direct-mapped cache, the position of a memory block is given by:
(Block number) modulo (Number of cache blocks)
4. In a set-associative cache, the set containing a memory block is given by:
(Block number) modulo (Number of sets in the cache)
5. Searching one set requires only n comparators, which is less expensive than a fully associative search.
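Both placement rules are simple modulo operations. A minimal sketch, where block number 12 and the cache sizes below are illustrative assumptions:

```python
# Direct mapped: (Block number) modulo (Number of cache blocks)
def direct_mapped_index(block_number, num_cache_blocks):
    return block_number % num_cache_blocks

# Set associative: (Block number) modulo (Number of sets in the cache)
def set_index(block_number, num_sets):
    return block_number % num_sets

# An 8-block cache, organized three ways: direct mapped (8 "sets" of 1),
# 2-way set associative (4 sets), and 4-way set associative (2 sets).
block = 12
print(direct_mapped_index(block, 8))  # 4: only cache block 4 can hold it
print(set_index(block, 4))            # 0: either of the 2 blocks in set 0
print(set_index(block, 2))            # 0: any of the 4 blocks in set 0
```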
Associative Cache Example
Compare three 4-block caches:
i. Direct mapped
ii. 2-way set associative
iii. Fully associative
Block access sequence: 0, 8, 0, 6, 8
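The behavior of the three organizations on this access sequence can be replayed with a small simulator. This is a sketch under an assumed least-recently-used (LRU) replacement policy:

```python
def count_misses(sequence, num_sets, ways):
    # One LRU list per set; set index = block number % number of sets.
    sets = {i: [] for i in range(num_sets)}
    misses = 0
    for block in sequence:
        s = sets[block % num_sets]
        if block in s:
            s.remove(block)  # hit: refresh this block's LRU position
        else:
            misses += 1
            if len(s) == ways:
                s.pop(0)     # set is full: evict the least recently used
        s.append(block)      # most recently used goes at the end
    return misses

seq = [0, 8, 0, 6, 8]
# All three are 4-block caches, organized differently.
print(count_misses(seq, num_sets=4, ways=1))  # direct mapped: 5 misses
print(count_misses(seq, num_sets=2, ways=2))  # 2-way set associative: 4 misses
print(count_misses(seq, num_sets=1, ways=4))  # fully associative: 3 misses
```

Blocks 0 and 8 map to the same location in the direct-mapped cache, so every access misses; associativity lets them coexist and cuts the miss count.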