Lecture-15: Advanced Processor Architecture (MPU)

Modern processors use a combination of architectural techniques to achieve high performance, low power consumption, parallel execution, and efficient resource utilization. Here are the major concepts:

1. Superscalar Architecture

A superscalar processor can execute multiple instructions per clock cycle.
It includes:

  • Multiple execution units
  • Parallel pipelines
  • Instruction dispatch logic

Goal: Increase throughput by running several instructions at the same time.
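The dispatch logic can be pictured with a toy dual-issue model (an illustrative sketch, not a real microarchitecture): two consecutive instructions issue in the same cycle only when the second does not read the first's result.

```python
# Toy dual-issue scheduler (sketch). Each instruction is (dest, src1, src2).
# For simplicity only RAW dependences block pairing; real hardware also
# checks structural and WAW hazards.

def dual_issue_schedule(instrs):
    cycles = []
    i = 0
    while i < len(instrs):
        first = instrs[i]
        if i + 1 < len(instrs):
            second = instrs[i + 1]
            independent = first[0] not in second[1:]   # no RAW dependence
            if independent:
                cycles.append([first, second])         # issue both this cycle
                i += 2
                continue
        cycles.append([first])                         # issue alone
        i += 1
    return cycles

prog = [("r1", "r2", "r3"),   # r1 = r2 op r3
        ("r4", "r1", "r5"),   # reads r1 -> cannot pair with previous
        ("r6", "r7", "r8"),   # independent
        ("r9", "r2", "r3")]   # independent -> pairs with previous
print(len(dual_issue_schedule(prog)))   # 3 cycles instead of 4
```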

 

2. Pipelining

Instructions are broken into stages (Fetch → Decode → Execute → Memory → Write-back).
Multiple instructions stay in different stages, making the CPU work continuously.
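The benefit is easy to quantify with an idealized timing model (no stalls or hazards assumed): with k stages and n instructions, a filled pipeline finishes in k + n − 1 cycles instead of k × n.

```python
# Idealized pipeline timing (sketch, assumes no stalls or hazards).

def unpipelined_cycles(n, k):
    return n * k          # each instruction occupies the CPU for all k stages

def pipelined_cycles(n, k):
    return k + n - 1      # k cycles to fill, then one instruction per cycle

n, k = 100, 5             # 100 instructions, 5-stage pipeline (F D E M W)
print(unpipelined_cycles(n, k))   # 500
print(pipelined_cycles(n, k))     # 104 -> ~4.8x speedup
```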

Advanced pipelines extend this basic design with the techniques described in the next sections.

3. Out-of-Order Execution (OoO)

Instructions are not executed in the original program order. Instead, they are executed when:

  • The required data is available
  • The execution unit is free

Hardware components involved:

  • Reservation stations
  • Reorder buffer (ROB)
  • Register renaming

This hides latency and boosts speed.
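The dataflow idea behind this can be sketched in a few lines (a toy cycle model, not a real scheduler): each instruction issues in the first cycle all of its source operands are available, regardless of program order.

```python
# Toy out-of-order issue model (sketch): issue cycle = when sources arrive.

def ooo_issue_cycles(instrs, arrival, latency=1):
    # instrs: (dest, [sources]) in program order
    # arrival: cycle at which each initial register value becomes available
    issue = {}
    for dest, srcs in instrs:
        start = max(arrival[s] for s in srcs)   # wait for the slowest operand
        issue[dest] = start
        arrival[dest] = start + latency         # result ready for consumers
    return issue

prog = [("r1", ["r0"]),        # r0 is a slow load arriving at cycle 5
        ("r2", ["r1"]),        # chained behind r1
        ("r3", ["r4"])]        # independent: issues immediately
print(ooo_issue_cycles(prog, {"r0": 5, "r4": 0}))
# {'r1': 5, 'r2': 6, 'r3': 0} -> r3 issues before the earlier instructions
```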

4. Register Renaming

Prevents data hazards (WAR, WAW) by giving each instruction its own physical register instead of reusing architectural registers.

Result: More parallel execution without conflicts.
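The mechanism can be sketched as a mapping table (simplified: no free-list recycling): every write to an architectural register is given a fresh physical register, so WAR/WAW name conflicts disappear while true RAW dependences are preserved.

```python
# Register-renaming sketch: architectural names -> fresh physical registers.

def rename(instrs):
    mapping = {}              # architectural -> current physical register
    next_phys = 0
    renamed = []
    for dest, srcs in instrs:
        # Read sources through the current mapping (RAW preserved).
        phys_srcs = [mapping.get(s, s) for s in srcs]
        # Allocate a new physical register for each destination write.
        mapping[dest] = f"p{next_phys}"
        next_phys += 1
        renamed.append((mapping[dest], phys_srcs))
    return renamed

prog = [("r1", ["r2"]),   # r1 = f(r2)
        ("r2", ["r3"]),   # WAR hazard with instruction 1's read of r2
        ("r1", ["r4"])]   # WAW hazard with instruction 1's write of r1
print(rename(prog))
# [('p0', ['r2']), ('p1', ['r3']), ('p2', ['r4'])] -- no name conflicts left
```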

5. Branch Prediction

To keep the pipeline full, processors predict the outcome of a branch before it is resolved and fetch along the predicted path.

Modern CPUs use:

  • Two-level adaptive predictors
  • Branch history tables
  • Global/local prediction
  • Perceptron ("neural") predictors (e.g., in AMD Zen cores)

Bad prediction → pipeline flush → penalty.
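The classic building block is the 2-bit saturating counter, sketched below: states 0–1 predict "not taken", states 2–3 predict "taken", so a single surprise (like a loop exit) does not immediately flip the prediction.

```python
# 2-bit saturating counter branch predictor (textbook sketch).

class TwoBitPredictor:
    def __init__(self):
        self.state = 2                       # start in "weakly taken"

    def predict(self):
        return self.state >= 2               # True = predict taken

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

# A loop branch: taken 9 times, then falls through once, repeated 3 times.
p = TwoBitPredictor()
outcomes = ([True] * 9 + [False]) * 3
correct = 0
for actual in outcomes:
    correct += (p.predict() == actual)
    p.update(actual)
print(f"{correct}/{len(outcomes)} correct")   # 27/30: only the exits miss
```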

 

6. Speculative Execution

The CPU executes instructions before knowing whether they are actually needed.
If prediction is correct → performance ↑
If wrong → discard results.

Used heavily in Intel, AMD, Apple M-series chips.
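The commit-or-squash behavior can be modeled in miniature (a toy model: real CPUs checkpoint rename state, not a register dictionary): execute past the branch against a checkpoint, then either keep the results or restore the snapshot.

```python
import copy

# Speculative-execution sketch: run the predicted path against a checkpoint.

def run_speculative(regs, spec_writes, predicted_taken, actual_taken):
    snapshot = copy.deepcopy(regs)       # checkpoint architectural state
    for dest, value in spec_writes:      # speculatively execute predicted path
        regs[dest] = value
    if predicted_taken == actual_taken:
        return regs                      # prediction correct: commit results
    return snapshot                      # mispredict: squash, restore state

ok  = run_speculative({"r1": 0}, [("r1", 42)], True, True)    # correct guess
bad = run_speculative({"r1": 0}, [("r1", 42)], True, False)   # mispredict
print(ok, bad)   # {'r1': 42} {'r1': 0}
```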

7. Multi-Core Architecture

Instead of increasing clock speed, modern CPUs add multiple cores inside one chip.

Types:

  • Single-core → Dual-core → Quad-core
  • Many-core (20+ cores in server CPUs)
  • Heterogeneous cores (big.LITTLE architecture)

 

8. Heterogeneous Computing (big.LITTLE)

Used in ARM-based mobile & Apple Silicon:

  • Performance cores (P-cores) → High speed
  • Efficiency cores (E-cores) → Low power

The scheduler chooses which core to use.
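A hypothetical scheduling policy can be sketched with a single threshold (real OS schedulers use much richer signals: utilization history, thermal headroom, QoS classes):

```python
# Hypothetical big.LITTLE placement sketch: demanding tasks go to P-cores,
# light background tasks to E-cores. The threshold value is illustrative.

def assign_core(task_load, threshold=0.5):
    # task_load: estimated CPU demand in [0, 1]
    return "P-core" if task_load > threshold else "E-core"

tasks = {"video_encode": 0.9, "mail_sync": 0.1, "game": 0.8, "timer": 0.05}
for name, load in tasks.items():
    print(name, "->", assign_core(load))
```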

 

9. Simultaneous Multithreading (SMT / Hyper-Threading)

A single physical core appears to the operating system as two logical processors, so two threads can share one core's execution resources.

It allows:

  • Higher resource utilization
  • Overlapping stalls
  • Better throughput

Intel → Hyper-Threading
AMD → SMT
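A rough back-of-the-envelope model (an approximation that assumes the threads stall independently) shows why this helps: the core only sits idle when every thread stalls at once.

```python
# Rough SMT utilization model (sketch): if one thread keeps the core busy a
# fraction u of the time, n independent threads leave it idle only when all
# stall simultaneously: utilization ~ 1 - (1 - u)^n.

def smt_utilization(u, threads=2):
    return 1 - (1 - u) ** threads

print(f"{smt_utilization(0.6):.2f}")   # 0.84 vs 0.60 for a single thread
```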

 

10. Cache Hierarchy & Advanced Memory Architecture

Modern CPUs depend heavily on caches:

  • L1 (fastest, smallest)
  • L2 (larger, slower)
  • L3 (shared across cores)
  • L4/eDRAM (in some chips)

Advanced techniques:

  • Cache coherence protocols (MESI, MOESI)
  • Victim caches
  • Smart prefetching algorithms
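The basic lookup mechanism behind all of these levels can be sketched with a tiny direct-mapped cache (toy parameters: 4 lines of 16 bytes): an address is split into tag and index, and a hit requires the stored tag at that index to match.

```python
# Direct-mapped cache sketch: address -> (tag, index); hit iff tags match.

class DirectMappedCache:
    def __init__(self, num_lines=4, line_size=16):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines     # one tag per cache line

    def access(self, addr):
        line = addr // self.line_size      # which memory block
        index = line % self.num_lines      # which cache line it maps to
        tag = line // self.num_lines       # identifies the block in that line
        if self.tags[index] == tag:
            return "hit"
        self.tags[index] = tag             # fill (and evict) on miss
        return "miss"

cache = DirectMappedCache()
# Addresses 0 and 64 map to the same line, so they evict each other.
results = [cache.access(a) for a in (0, 4, 64, 0)]
print(results)   # ['miss', 'hit', 'miss', 'miss'] -- the last is a conflict miss
```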

 

11. Instruction Set Innovations

RISC vs CISC

  • Intel/AMD → CISC (x86_64), but internally convert to RISC-like micro-ops
  • ARM → RISC (simpler, fixed-length instructions; power efficient)
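The "convert to micro-ops" step can be illustrated with a toy decoder (illustrative only, not actual x86 decode rules): a CISC instruction with a memory operand breaks into separate load, compute, and store micro-ops.

```python
# Sketch: decomposing a CISC-style memory-operand instruction into
# RISC-like micro-ops. The "tmp" register and tuple format are invented
# for illustration.

def decode_to_uops(instr):
    op, dst, src = instr
    if op == "add" and dst.startswith("["):      # e.g. add [mem], reg
        addr = dst.strip("[]")
        return [("load", "tmp", addr),           # load memory operand
                ("add", "tmp", src),             # do the arithmetic
                ("store", addr, "tmp")]          # write the result back
    return [instr]                               # simple ops pass through

print(decode_to_uops(("add", "[0x1000]", "rax")))
# [('load', 'tmp', '0x1000'), ('add', 'tmp', 'rax'), ('store', '0x1000', 'tmp')]
```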

Vector Extensions

  • Intel → AVX, AVX2, AVX-512
  • ARM → NEON, SVE
  • Used in AI, multimedia, scientific computing
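The SIMD idea can be shown in miniature with plain Python (a conceptual sketch: real vector units like AVX or NEON do each lane-group in one hardware instruction, not a software loop):

```python
# SIMD in miniature: process fixed-width "lanes" per step instead of one
# element at a time.

def vector_add(a, b, lanes=4):
    out = []
    for i in range(0, len(a), lanes):
        # One "vector instruction": add a whole lane-group at once.
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
    return out

print(vector_add([1, 2, 3, 4, 5, 6, 7, 8], [10] * 8))
# [11, 12, 13, 14, 15, 16, 17, 18] -- done in 2 steps, not 8
```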

 

12. Accelerator Integration

Modern processors integrate accelerators for special workloads:

  • AI accelerators / NPUs
  • GPU cores (APU architecture)
  • Cryptographic engines
  • Image signal processors (ISP)

Example: Apple M1/M2/M3 integrates CPU + GPU + Neural Engine.

 

13. Chiplet Architecture

Instead of one large die, CPUs now use multiple smaller chiplets connected via high-speed interconnects.

AMD uses:

  • CCD (Core Complex Die)
  • IOD (I/O Die)

Benefits:

  • Better yields
  • Lower manufacturing cost
  • Higher scalability

 

14. Power Management Technologies

To save battery:

  • Dynamic Voltage and Frequency Scaling (DVFS)
  • Turbo Boost (Intel) / Precision Boost (AMD)
  • Thermal throttling
  • Adaptive power gating
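The intuition behind DVFS comes from the standard dynamic-power approximation P ≈ C·V²·f: lowering voltage and frequency together cuts power superlinearly. The voltages and frequencies below are illustrative values, not real chip specs.

```python
# DVFS intuition sketch: dynamic power scales roughly as C * V^2 * f.

def dynamic_power(capacitance, voltage, frequency):
    return capacitance * voltage ** 2 * frequency

full   = dynamic_power(1.0, 1.2, 3.0e9)   # full speed at high voltage
scaled = dynamic_power(1.0, 0.9, 1.5e9)   # half frequency, reduced voltage
print(f"power reduced to {scaled / full:.0%}")   # 28% for 50% of the speed
```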


Summary Table

 

  • Superscalar → Execute multiple instructions per cycle
  • Pipelining → Overlap instruction stages
  • Out-of-order → Maximize performance by executing ready instructions first
  • Branch prediction → Avoid stalls during conditional jumps
  • Speculative execution → Boost performance using prediction
  • SMT → Use idle resources more efficiently
  • Multi-core → Parallel processing
  • Heterogeneous cores → Balance performance & power
  • Chiplets → High scalability & efficiency
  • Vector engines → High-speed math operations
