Lecture 2 - Quantitative Computer Architecture

Idea

How to measure, analyze, and specify computer system performance

Performance Measurement and Analysis in Computer Architecture

State diagram of the iterative development lifecycle in computer architecture

Performance

What is Performance?

Metrics:

Execution time (main metric)
Throughput
Of what input/program?
What is relative performance? (against baseline)

What is Execution Time?

% time program  
... program results ...  
160.7u 19.9s 4:15 71%  
%

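Explanation of the numbers, in order: u = user CPU time (seconds), s = system CPU time (seconds), elapsed wall-clock time (4:15, the actual real-world time that passes), and the percentage of wall-clock time the CPU spent on this program: (160.7 + 19.9) / 255 ≈ 71%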

Why doesn't wall-clock time match CPU time?
- Time spent waiting on I/O and swapping counts toward wall-clock time but not toward CPU time
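
As a quick check on the `time` output above, here is a minimal sketch that recomputes the CPU percentage from the other three numbers. The function name `cpu_utilization` is made up for illustration; the values are the ones in the sample output.

```python
def cpu_utilization(user_s: float, system_s: float, wall_s: float) -> float:
    """Fraction of elapsed wall-clock time that was spent on the CPU."""
    return (user_s + system_s) / wall_s


wall = 4 * 60 + 15                      # "4:15" means 4 min 15 s = 255 s
util = cpu_utilization(160.7, 19.9, wall)
print(f"CPU time: {160.7 + 19.9:.1f} s of {wall} s wall-clock ({util:.0%})")
```

The remaining ~29% of the wall-clock time is the part not accounted for by CPU time (I/O waits, swapping).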

Relative Performance

Most of the time in computer architecture, we refer to relative performance because it tells the end user more than an absolute time does. With an absolute time alone, there’s no baseline for judging whether that time is fast or slow.

Definition: $\text{Relative Performance} = \frac{\text{Performance}_{x}}{\text{Performance}_{y}} = \frac{\text{Execution Time}_{y}}{\text{Execution Time}_{x}} = n$
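
A small sketch of the definition, using made-up execution times (10 s on machine X, 15 s on machine Y). Note the inversion: performance is the reciprocal of execution time, so the slower machine's time goes in the numerator.

```python
def relative_performance(exec_time_x: float, exec_time_y: float) -> float:
    """n such that X is n times as fast as Y (n > 1 means X is faster)."""
    return exec_time_y / exec_time_x


# Hypothetical measurements, not from the lecture.
n = relative_performance(exec_time_x=10.0, exec_time_y=15.0)
print(f"X is {n}x as fast as Y")
```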

How to Specify Performance?

With relative performance, a performance metric only has meaning in the context of a program or workload. What about the performance of a single machine?

Response Time or Throughput

SPEC: Standard Performance Evaluation Corporation (originally the System Performance Evaluation Cooperative)

Defines standard workloads used to benchmark and evaluate performance

How to Summarize Performance

You get a bunch of speedup numbers from SPEC. What is the single number that your boss wants?

Tools:

Arithmetic Mean - “Total time if all are run at once”
Weighted Arithmetic Mean - “with weights on programs”
Geometric Mean - “Typical speedup across programs”
Harmonic Mean - “average rate of progress”
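
To make the four tools concrete, here is a minimal sketch using Python's standard `statistics` module. All the numbers (times, weights, speedups, rates) are made-up illustration values, not SPEC results.

```python
from statistics import mean, geometric_mean, harmonic_mean

# Hypothetical execution times (seconds) for two programs.
times = [2.0, 8.0]
# Hypothetical weights: program 1 matters three times as much as program 2.
weights = [0.75, 0.25]

am = mean(times)                                    # proportional to total time
wam = sum(w * t for w, t in zip(weights, times))    # weighted arithmetic mean

# Hypothetical per-program speedups (ratios): use the geometric mean.
speedups = [2.0, 8.0]
gm = geometric_mean(speedups)

# Hypothetical per-program rates (e.g. instructions/second): use the harmonic mean.
rates = [2.0, 8.0]
hm = harmonic_mean(rates)

print(f"AM={am}, weighted AM={wam}, GM={gm:.2f}, HM={hm:.2f}")
```

The same two numbers give four different summaries, which is why the choice of mean has to match what the numbers represent (times, ratios, or rates).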

Summary of Performance

Even the unweighted arithmetic mean implies a weighting
- if one program runs much longer than another, it contributes more to the arithmetic sum than the faster program
Geometric mean does not necessarily predict execution time for any mix of the programs
- it is not a real wall-clock prediction
- if you ran all the programs in a given order, it doesn’t necessarily predict the total time
Ratios of geometric means never change (regardless of which machine is used as the base), and always give equal weight to all benchmarks
- if you compare machine A and machine B using GM, it doesn’t matter whether you normalize against A or against B: you get the same speedup factor (stable and fair)
Giving unequal weight requires a weighted arithmetic mean
- if you care about certain programs more, use a weighted arithmetic mean
Answer: AM and GM both have their uses. For unweighted measures (e.g. speedup), GM is probably better.
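
A sketch of why the choice of base machine doesn't matter for GM but does for AM of normalized times. The execution times for machines A and B below are hypothetical, picked to make the effect obvious.

```python
from statistics import mean, geometric_mean

# Hypothetical execution times (seconds) for two programs on two machines.
time_a = [1.0, 1000.0]
time_b = [10.0, 100.0]

# Normalize B's times to A, and A's times to B.
b_over_a = [b / a for a, b in zip(time_a, time_b)]   # [10.0, 0.1]
a_over_b = [a / b for a, b in zip(time_a, time_b)]   # [0.1, 10.0]

am_norm_a = mean(b_over_a)            # AM says B is ~5x slower than A...
am_norm_b = mean(a_over_b)            # ...and also that A is ~5x slower than B
gm_norm_a = geometric_mean(b_over_a)  # GM gives reciprocal answers,
gm_norm_b = geometric_mean(a_over_b)  # so the comparison is self-consistent

print(f"AM: B/A = {am_norm_a:.2f}, A/B = {am_norm_b:.2f}")
print(f"GM: B/A = {gm_norm_a:.2f}, A/B = {gm_norm_b:.2f}")
```

With the arithmetic mean, each machine looks slower than the other depending on the base. With the geometric mean, the two results are reciprocals of each other, so the conclusion is the same either way.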

Analyzing Performance

What tools do we use to analyze (predict) performance in the absence of something to measure?

Speedup: relative performance $\text{speedup} = \frac{\text{Exec Time without change}}{\text{Exec Time with change}}$
Amdahl’s Law: Impact of a performance improvement is limited by the percent of execution time affected by the improvement $\text{Execution Time after improvement} = \frac{\text{Execution Time Affected}}{\text{Amount of Improvement}} + \text{Execution Time Unaffected}$
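
A minimal sketch of Amdahl's Law with made-up numbers: a 100 s program where 80 s is affected by an improvement that makes that part 4x faster. Function names are illustrative only.

```python
def time_after_improvement(affected: float, unaffected: float,
                           improvement: float) -> float:
    """Execution time once the affected part runs `improvement` times faster."""
    return affected / improvement + unaffected


def overall_speedup(affected: float, unaffected: float,
                    improvement: float) -> float:
    """Exec time without the change divided by exec time with the change."""
    before = affected + unaffected
    return before / time_after_improvement(affected, unaffected, improvement)


# Hypothetical program: 80 s affected, 20 s unaffected, 4x improvement.
new_time = time_after_improvement(80.0, 20.0, 4.0)
speedup = overall_speedup(80.0, 20.0, 4.0)

# Even an enormous improvement cannot beat (80 + 20) / 20 = 5x overall.
limit = overall_speedup(80.0, 20.0, 1e12)

print(f"new time = {new_time} s, speedup = {speedup}x, limit ~ {limit:.3f}x")
```

The unaffected 20 s caps the overall speedup at 5x no matter how large the improvement is, which is the point of the law.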

Time in Performance

In computer architecture, time is broken down into individual units called clock cycles: $\text{CPU Execution Time} = \text{CPU clock cycles} \times \text{Clock cycle time}$

Every conventional processor has a clock with an associated clock cycle time or clock rate
Every program runs in an integral number of clock cycles
GHz = billions of cycles/second
A clock rate of X GHz corresponds to a cycle time of 1/X nanoseconds

$\text{Number of CPU cycles} = \text{Instructions executed} \times \text{Average Clock Cycles per Instruction}$

$\text{CPI} = \text{CPU clock cycles} / \text{Instruction count}$
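
Putting the formulas in this section together, here is a small sketch. The workload (2 billion instructions, CPI of 1.5, 2 GHz clock) is hypothetical, chosen only so the arithmetic is easy to follow.

```python
def cycle_time_ns(clock_rate_ghz: float) -> float:
    """X GHz corresponds to a cycle time of 1/X nanoseconds."""
    return 1.0 / clock_rate_ghz


def cpu_time_s(instruction_count: float, cpi: float,
               clock_rate_ghz: float) -> float:
    """CPU time = (instructions * CPI) cycles, each lasting one cycle time."""
    cycles = instruction_count * cpi
    return cycles / (clock_rate_ghz * 1e9)   # cycles / (cycles per second)


# Hypothetical workload, not from the lecture.
instructions = 2e9
cpi = 1.5
clock_ghz = 2.0

cycles = instructions * cpi
t = cpu_time_s(instructions, cpi, clock_ghz)

print(f"cycle time = {cycle_time_ns(clock_ghz)} ns")
print(f"cycles = {cycles:.0f}, CPU time = {t} s")
print(f"CPI recovered from counts = {cycles / instructions}")
```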

What if the cycle time changes?

Clock rate changes still happen at a very coarse granularity (and stall the processor in between), so you can apply this formula separately to each interval that has a fixed clock rate
You can adjust to the reference (fastest) clock rate by multiplying by a scaling (fudge) factor
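
One possible reading of the two points above, as a sketch: apply the cycles-to-time formula per fixed-rate interval and sum the results, then express each interval in reference-clock cycles by scaling with `reference_rate / rate`. The interval data is hypothetical, and treating the fudge factor as this ratio is an assumption, not something the notes spell out.

```python
def total_time_s(intervals: list[tuple[float, float]]) -> float:
    """Sum of cycles / rate over intervals of (cycles, clock_rate_hz)."""
    return sum(cycles / rate for cycles, rate in intervals)


def reference_cycles(intervals: list[tuple[float, float]]) -> float:
    """Cycles re-expressed at the fastest (reference) clock rate.

    Assumption: the fudge factor for an interval is reference_rate / rate.
    """
    reference_rate = max(rate for _, rate in intervals)
    return sum(cycles * (reference_rate / rate) for cycles, rate in intervals)


# Hypothetical run: 3e9 cycles at 3 GHz, then 1e9 cycles at 1 GHz.
intervals = [(3e9, 3e9), (1e9, 1e9)]

total = total_time_s(intervals)
ref_cycles = reference_cycles(intervals)

print(f"total time = {total} s, reference cycles = {ref_cycles:.0f}")
```

Dividing the reference cycles by the reference rate gives back the same total time, so the two views are consistent.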

Deep pipeline: a processor design in which the instruction execution process is divided into more stages than in a shallower pipeline

Strength reduction → simpler instructions → easier to pipeline → lowers CPI
