Lecture-9.4: Numerical Problems with Pipelining, Part-1 (COA)

Numerical problems with Pipelining

Q1: Suppose each clock cycle takes 15 milliseconds, and 4 instructions each pass through 5 stages to complete execution in the pipeline.

1. How much time is required to complete the execution of all instructions?

2. Calculate the efficiency of the system

Solution: A

Total clock cycles = K + (n-1), where K is the number of stages, so K=5, and n is the number of instructions, so n=4

= 5+(4-1) = 8

Time for one clock cycle = 15 ms

Time for 8 clock cycles = 15*8 ms = 120 ms
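The cycle count and total time can be checked with a short Python sketch. The function names `pipeline_cycles` and `pipeline_time` are made up for this illustration, and the sketch assumes an ideal pipeline (no stalls, one clock cycle per stage).

```python
def pipeline_cycles(stages: int, instructions: int) -> int:
    """Clock cycles for an ideal pipeline: K + (n - 1)."""
    return stages + (instructions - 1)


def pipeline_time(stages: int, instructions: int, cycle_time):
    """Total time = number of clock cycles x time per clock cycle."""
    return pipeline_cycles(stages, instructions) * cycle_time


# Q1 values: K = 5 stages, n = 4 instructions, 15 ms per clock cycle
print(pipeline_cycles(5, 4))    # 8
print(pipeline_time(5, 4, 15))  # 120 (milliseconds)
```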

Solution: B

Efficiency or utilization = number of used boxes in the pipeline diagram / total number of boxes

In the diagram for 4 instructions with 5 stages, there are 5 stages x 8 clock cycles = 40 boxes in total.

There are 4 instructions, each using 5 stages, so a total of 20 boxes are used.

Efficiency or utilization = 20/40 = ½
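The box-counting argument can be written as a small Python sketch. The name `pipeline_efficiency` is hypothetical, and the sketch assumes the space-time diagram has one row per stage and one column per clock cycle, with no stalls.

```python
def pipeline_efficiency(stages: int, instructions: int) -> float:
    """Used boxes / total boxes in the space-time diagram."""
    cycles = stages + (instructions - 1)  # columns of the diagram
    total_boxes = stages * cycles         # rows x columns
    used_boxes = stages * instructions    # each instruction fills one box per stage
    return used_boxes / total_boxes


# Q1 values: 5 stages, 4 instructions -> 20 used boxes out of 40
print(pipeline_efficiency(5, 4))  # 0.5
```

Under these assumptions the efficiency simplifies to n / (K + n - 1), so it approaches 1 as the number of instructions grows.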

In an ideal pipeline, the CPI (cycles per instruction) approaches one, so a pipelined processor achieves higher efficiency and throughput than a non-pipelined one.

Speedup Formula

Speedup is the ratio of non-pipelined execution time to pipelined execution time. For example, 8 instructions on a 5-stage pipeline are completed in 5 + (8-1) = 12 clock cycles, but 8 x 5 = 40 cycles are required without pipelining. So, the Speedup will be

Speedup = NP/P = 40/12 ≈ 3.33, so the Speedup is about 3.33 times. NP is non-pipelining, and P is pipelining.
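The speedup ratio can be computed directly from the cycle counts. In this sketch the name `speedup_in_cycles` is hypothetical, and it assumes that a non-pipelined instruction takes one cycle per stage and that the pipeline never stalls.

```python
def speedup_in_cycles(stages: int, instructions: int) -> float:
    """Speedup = non-pipelined cycles / pipelined cycles."""
    non_pipelined = stages * instructions   # every instruction runs all stages alone
    pipelined = stages + (instructions - 1)  # K + (n - 1)
    return non_pipelined / pipelined


# 8 instructions on a 5-stage pipeline: 40 cycles vs 12 cycles
print(round(speedup_in_cycles(5, 8), 2))  # 3.33
```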

Stage Delay

Every stage has circuits that are used to process data. So, some time is required at every stage, called stage delay.

Registers Delay

Registers between stages are used to store intermediate results. Each register holds the output of one stage as the input for the very next stage. If all stage delays are equal, results do not have to wait in the registers and can be passed directly to the next stage.

But if the processing speed of one stage does not match that of another (for example, stage 1 completes in 5 ns but stage 2 is still processing because its delay is 8 ns), then the intermediate result must be held in a register for some time until the next stage (stage 2) completes.
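The effect of mismatched stage delays can be illustrated with a short Python sketch. The helper names `cycle_time` and `idle_time_per_stage` are made up for this example. The sketch assumes the rule used in the problems of this lecture: the clock cycle equals the largest stage delay plus the register delay.

```python
def cycle_time(stage_delays, register_delay=0):
    """Clock cycle = slowest stage delay + register delay."""
    return max(stage_delays) + register_delay


def idle_time_per_stage(stage_delays):
    """Time each stage's result waits for the slowest stage to finish."""
    slowest = max(stage_delays)
    return [slowest - delay for delay in stage_delays]


# Example from the text: stage 1 takes 5 ns, stage 2 takes 8 ns
print(cycle_time([5, 8]))           # 8
print(idle_time_per_stage([5, 8]))  # [3, 0]
```

Here the result of stage 1 waits 3 ns in the register on every cycle because stage 2 is slower.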

Stage delays and register delays are shown in the diagram below.

Q2: A 4-stage pipeline has stage delays of 150, 120, 160, and 140 ns. Registers are used between stages and have a delay of 5 ns each. Assuming a constant clock rate, what is the total time taken to process 1000 data items on this pipeline?

Solution:

The clock cycle must accommodate the slowest stage so that every stage can finish within one cycle. The slowest is stage 3, so the cycle time is 165 ns (160 ns stage delay + 5 ns register delay).

The first instruction/data item passes through all the stages, and the remaining items follow it through the pipeline, with one item completing in every clock cycle. So, the formula will be as follows.

First instruction x stages x cycle time + Remaining instructions x 1 x cycle time

Total Time Taken = (1 x 4 x 165) + (999 x 1 x 165) ns

= 660 ns + 164835 ns

= 165495 ns

= 165.495 µs ≈ 165.5 µs
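The arithmetic for Q2 can be verified with a minimal Python sketch. The function name `total_pipeline_time_ns` is hypothetical, and the sketch assumes an ideal pipeline in which the first item takes one cycle per stage and every later item completes one cycle after the previous one.

```python
def total_pipeline_time_ns(stage_delays_ns, register_delay_ns, items):
    """(stages + items - 1) cycles, each as long as the slowest stage plus register."""
    cycle_ns = max(stage_delays_ns) + register_delay_ns
    first_item_ns = len(stage_delays_ns) * cycle_ns  # fills the pipeline
    remaining_ns = (items - 1) * cycle_ns            # one item per cycle afterwards
    return first_item_ns + remaining_ns


total_ns = total_pipeline_time_ns([150, 120, 160, 140], 5, 1000)
print(total_ns)         # 165495
print(total_ns / 1000)  # 165.495 (microseconds)
```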

Q3: Consider a non-pipelined processor with a clock rate of 2.5 GHz and an average cycles per instruction (CPI) of four. The same processor is upgraded to a pipelined processor with five stages. However, the clock speed is reduced to 2 GHz due to internal pipeline delay. Assume that there is no stall (ideal condition) in the pipeline. What is the Speedup achieved by the pipelined processor?

Solution:

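Speedup = TNP/TP (“NP” is non-pipelining and “P” is pipelining)

As T = 1/F, the time for one instruction is the cycles per instruction multiplied by the clock period. So,

TNP = 4 × 1/(2.5 × 10^9) s = 1.6 ns

TP = 1 × 1/(2 × 10^9) s = 0.5 ns (with no stalls, the pipelined processor completes one instruction per cycle)

Speedup = (4 × 1/(2.5 × 10^9) s) / (1 × 1/(2 × 10^9) s) = 1.6 ns / 0.5 ns = 3.2

Note: Time for one instruction = cycles per instruction x clock period = cycles per instruction / clock rate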
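The Q3 calculation can be sketched in Python as follows. The function name `time_per_instruction_ns` is made up for this example. The sketch assumes that the ideal pipelined processor has a CPI of 1, and it uses the fact that a clock rate in GHz gives a clock period in nanoseconds as 1/rate.

```python
def time_per_instruction_ns(cpi: float, clock_ghz: float) -> float:
    """Time per instruction = CPI x clock period = CPI / clock rate."""
    return cpi / clock_ghz  # 1 / GHz = nanoseconds


t_np = time_per_instruction_ns(cpi=4, clock_ghz=2.5)  # non-pipelined
t_p = time_per_instruction_ns(cpi=1, clock_ghz=2.0)   # ideal pipeline, CPI = 1

print(round(t_np, 3))        # 1.6
print(round(t_p, 3))         # 0.5
print(round(t_np / t_p, 3))  # 3.2
```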

Q4: Consider a pipeline having 4 phases with duration 60, 50, 90 and 80 ns. Given latch delay is 10 ns. Calculate-

1. Pipeline cycle time

2. Non-pipeline execution time

3. Speed up ratio

4. Pipeline time for 1000 tasks

5. Sequential time for 1000 tasks

6. Throughput

Solution-

Given-

1) Four stage pipeline is used

2) Delay of stages = 60, 50, 90 and 80 ns

3) Latch delay or delay due to each register = 10 ns

Part-01: Pipeline Cycle Time-

Cycle time

= Maximum delay due to any stage + Delay due to its register

= Max {60, 50, 90, 80} + 10 ns

= 90 ns + 10 ns

= 100 ns

Part-02: Non-Pipeline Execution Time-

Non-pipeline execution time for one instruction

= 60 ns + 50 ns + 90 ns + 80 ns

= 280 ns

Part-03: Speed Up Ratio-

Speed up

= Non-pipeline execution time / Pipeline execution time

= 280 ns / Cycle time

= 280 ns / 100 ns

= 2.8

Part-04: Pipeline Time For 1000 Tasks-

Pipeline time for 1000 tasks

= Time taken for 1st task + Time taken for remaining 999 tasks

= 1 x 4 clock cycles + 999 x 1 clock cycle

= 4 x cycle time + 999 x cycle time

= 4 x 100 ns + 999 x 100 ns

= 400 ns + 99900 ns

= 100300 ns

Part-05: Sequential Time For 1000 Tasks-

Non-pipeline time for 1000 tasks

= 1000 x Time taken for one task

= 1000 x 280 ns

= 280000 ns

Part-06: Throughput-

Throughput for pipelined execution

= Number of instructions executed per unit time

= 1000 tasks / 100300 ns

≈ 0.00997 tasks per ns

≈ 9.97 x 10^6 tasks per second
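All six parts of Q4 can be reproduced with one short Python sketch. The function name `pipeline_report` and the dictionary keys are hypothetical. The sketch follows the same assumptions as the solution above: the latch delay is added only to the pipelined cycle time, and the pipeline never stalls.

```python
def pipeline_report(stage_delays_ns, latch_delay_ns, tasks):
    """Return the six quantities asked for in Q4 (times in ns)."""
    stages = len(stage_delays_ns)
    cycle_ns = max(stage_delays_ns) + latch_delay_ns     # Part-01
    sequential_one_ns = sum(stage_delays_ns)             # Part-02
    speed_up = sequential_one_ns / cycle_ns              # Part-03
    pipeline_total_ns = (stages + tasks - 1) * cycle_ns  # Part-04
    sequential_total_ns = tasks * sequential_one_ns      # Part-05
    throughput_per_s = tasks / pipeline_total_ns * 1e9   # Part-06
    return {
        "cycle_ns": cycle_ns,
        "sequential_one_ns": sequential_one_ns,
        "speed_up": speed_up,
        "pipeline_total_ns": pipeline_total_ns,
        "sequential_total_ns": sequential_total_ns,
        "throughput_per_s": throughput_per_s,
    }


report = pipeline_report([60, 50, 90, 80], 10, 1000)
for name, value in report.items():
    print(name, "=", value)
```

With the Q4 values this gives a cycle time of 100 ns, 280 ns per task without pipelining, a speed up of 2.8, 100300 ns and 280000 ns for 1000 tasks, and a throughput of roughly 9.97 million tasks per second.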

Q5: A four-stage pipeline has the stage delays as 150, 120, 160 and 140 ns respectively. Registers are used between the stages and have a delay of 5 ns each. Assuming constant clocking rate, the total time taken to process 1000 data items on the pipeline will be-

1. 120.4 microseconds

2. 160.5 microseconds

3. 165.5 microseconds

4. 590.0 microseconds

Solution-

Given-

· Four stage pipeline is used

· Delay of stages = 150, 120, 160 and 140 ns

· Delay due to each register = 5 ns

· 1000 data items or instructions are processed

Cycle Time-

Cycle time

= Maximum delay due to any stage + Delay due to its register

= Max {150, 120, 160, 140} + 5 ns

= 160 ns + 5 ns

= 165 ns

Pipeline Time To Process 1000 Data Items-

Pipeline time to process 1000 data items

= Time taken for 1st data item + Time taken for remaining 999 data items

= 1 x 4 clock cycles + 999 x 1 clock cycle

= 4 x cycle time + 999 x cycle time

= 4 x 165 ns + 999 x 165 ns

= 660 ns + 164835 ns

= 165495 ns

≈ 165.5 μs

Thus, Option 3 (165.5 microseconds) is correct.
