Advanced Processor Architecture
Modern processors use a combination of architectural techniques to achieve high performance, low power consumption, parallel execution, and efficient resource utilization. Here are the major concepts:
1. Superscalar Architecture
A superscalar processor can execute multiple instructions per clock cycle. It includes:
- Multiple execution units
- Parallel pipelines
- Instruction dispatch logic
Goal: Increase throughput by running several independent instructions at the same time.
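The effect of issue width can be sketched with a toy model. This is an illustrative sketch, not a real scheduler: it assumes in-order issue, up to `width` instructions per cycle, and that an instruction can only issue after all of its producers issued in an earlier cycle.

```python
def issue_cycles(deps, width):
    """deps[i] = list of earlier instruction indices that i depends on."""
    cycle_of = {}
    cycle, slots = 1, 0
    for i, producers in enumerate(deps):
        # Earliest cycle this instruction can issue: after all producers.
        ready = max((cycle_of[p] + 1 for p in producers), default=1)
        if ready > cycle or slots == width:  # start a new cycle
            cycle = max(cycle + 1, ready)
            slots = 0
        cycle_of[i] = cycle
        slots += 1
    return cycle

# Four independent instructions: a 2-wide machine needs 2 cycles, not 4.
print(issue_cycles([[], [], [], []], width=2))  # 2
print(issue_cycles([[], [], [], []], width=1))  # 4
# A chain of dependent instructions gains nothing from extra width.
print(issue_cycles([[], [0], [1], [2]], width=2))  # 4
```

The last call shows why superscalar width alone is not enough: dependent chains serialize regardless, which is what out-of-order execution (section 3) attacks.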
2. Pipelining
Instructions are broken into stages (Fetch → Decode → Execute → Memory → Write-back). Multiple instructions occupy different stages at once, keeping the CPU busy continuously.
Advanced versions include:
- Deep pipelines (the Pentium 4 had 20+ stages)
- Dynamic pipeline resizing (modern ARM big.LITTLE designs)
- Out-of-order pipelining
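The payoff is easy to quantify with the classic back-of-the-envelope model: an ideal k-stage pipeline finishes n instructions in (k + n − 1) cycles, versus n × k cycles without pipelining. This assumes no stalls or hazards, which real pipelines of course have.

```python
def pipelined_cycles(n, k):
    # First instruction takes k cycles; each later one retires 1 cycle apart.
    return k + n - 1

def unpipelined_cycles(n, k):
    # Each instruction occupies the whole machine for k cycles.
    return n * k

n, k = 100, 5  # 100 instructions on a classic 5-stage pipeline
print(pipelined_cycles(n, k))    # 104
print(unpipelined_cycles(n, k))  # 500
print(unpipelined_cycles(n, k) / pipelined_cycles(n, k))  # ~4.8x speedup
```

Note the speedup approaches k only for long instruction streams, which is one reason deep pipelines looked attractive before branch-misprediction penalties (section 5) caught up with them.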
3. Out-of-Order Execution (OoO)
Instructions are not executed in the original program order. Instead, an instruction executes as soon as:
- Its required data is available
- An execution unit is free
Hardware components involved:
- Reservation stations
- Reorder buffer (ROB)
- Register renaming
This hides latency and boosts throughput.
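The latency-hiding effect can be shown with a minimal dataflow sketch: each instruction starts as soon as its producers finish. For simplicity this assumes unlimited execution units, which real reservation stations bound.

```python
def ooo_finish_times(instrs):
    """instrs: list of (latency, [producer indices]); returns finish cycles."""
    done = []
    for latency, producers in instrs:
        # Start as soon as all operands are ready, not in program order.
        start = max((done[p] for p in producers), default=0)
        done.append(start + latency)
    return done

# i0: slow load (4 cycles); i1 depends on i0; i2 is independent.
# In-order, i2 would wait behind the load; out of order it finishes at cycle 1.
times = ooo_finish_times([(4, []), (1, [0]), (1, [])])
print(times)  # [4, 5, 1]
```

The independent instruction completing at cycle 1 while the load is still in flight is exactly the latency hiding described above.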
4. Register Renaming
Prevents false data hazards (WAR, WAW) by giving each instruction its own physical register instead of reusing architectural registers.
Result: more parallel execution without conflicts.
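A minimal sketch of the renaming step itself: every write to an architectural register allocates a fresh physical register, so WAR/WAW name conflicts disappear and only true (read-after-write) dependences remain. Register names here (`r*`, `p*`) are illustrative, not any real ISA's.

```python
def rename(instrs, num_arch=4):
    """instrs: list of (dest reg, [source regs]) using architectural names."""
    mapping = {f"r{i}": f"p{i}" for i in range(num_arch)}
    next_phys = num_arch
    renamed = []
    for dst, srcs in instrs:
        srcs = [mapping[s] for s in srcs]   # read current mappings first
        mapping[dst] = f"p{next_phys}"      # then allocate a fresh destination
        next_phys += 1
        renamed.append((mapping[dst], srcs))
    return renamed

# r1 is written twice (a WAW hazard); after renaming the two writes target
# different physical registers and can proceed independently.
prog = [("r1", ["r2", "r3"]), ("r2", ["r1"]), ("r1", ["r3", "r0"])]
for line in rename(prog):
    print(line)
# ('p4', ['p2', 'p3'])
# ('p5', ['p4'])
# ('p6', ['p3', 'p0'])
```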
5. Branch Prediction
To keep the pipeline full, processors predict the next instruction after a branch.
Modern CPUs use:
- Two-level adaptive predictors
- Branch history tables
- Global/local prediction
- Neural predictors (in some ARM & Apple chips)
Bad prediction → pipeline flush → penalty.
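The building block behind the branch history tables mentioned above is the 2-bit saturating counter: states 0–1 predict not-taken, states 2–3 predict taken, and each outcome nudges the counter by one. A sketch of a single counter:

```python
class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start at "weakly taken"

    def predict(self):
        return self.state >= 2  # True = predict taken

    def update(self, taken):
        # Saturate at the ends so one anomaly doesn't flip the prediction.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

# A loop branch taken 9 times, then not taken on exit: the 2-bit counter
# mispredicts only the final iteration.
p = TwoBitPredictor()
hits = 0
for taken in [True] * 9 + [False]:
    hits += (p.predict() == taken)
    p.update(taken)
print(hits)  # 9 correct out of 10
```

The two-bit hysteresis is the key design choice: a 1-bit predictor would also mispredict the first iteration of the *next* run of the loop.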
6. Speculative Execution
The CPU executes instructions before knowing whether they are actually needed.
If the prediction is correct → performance ↑
If wrong → the results are discarded.
Used heavily in Intel, AMD, and Apple M-series chips.
7. Multi-Core Architecture
Instead of increasing clock speed, modern CPUs add multiple cores inside one chip.
Types:
- Single-core → Dual-core → Quad-core
- Many-core (20+ cores in server CPUs)
- Heterogeneous cores (big.LITTLE architecture)
8. Heterogeneous Computing (big.LITTLE)
Used in ARM-based mobile chips & Apple Silicon:
- Performance cores (P-cores) → high speed
- Efficiency cores (E-cores) → low power
The OS scheduler chooses which core runs each task.
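The scheduling decision can be caricatured as a utilization threshold. This is a deliberately simplified heuristic, not any real OS policy (real schedulers also weigh thermals, latency sensitivity, and core availability); the task names and threshold are made up for illustration.

```python
def pick_core(utilization, threshold=0.5):
    """Route demanding tasks to P-cores, light background work to E-cores."""
    return "P-core" if utilization >= threshold else "E-core"

tasks = {"video_encode": 0.9, "mail_sync": 0.1, "game": 0.8, "timer": 0.05}
for name, load in tasks.items():
    print(name, "->", pick_core(load))
# video_encode -> P-core
# mail_sync -> E-core
# game -> P-core
# timer -> E-core
```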
9. Simultaneous Multithreading (SMT / Hyper-Threading)
A single physical core appears to the operating system as two logical processors.
It allows:
- Higher resource utilization
- Overlapping of stalls
- Better throughput
Intel → Hyper-Threading
AMD → SMT
10. Cache Hierarchy & Advanced Memory Architecture
Modern CPUs depend heavily on caches:
- L1 (fastest, smallest)
- L2 (larger, slower)
- L3 (shared across cores)
- L4/eDRAM (in some server chips)
Advanced techniques:
- Cache coherence protocols (MESI, MOESI)
- Victim caches
- Smart prefetching algorithms
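A tiny direct-mapped cache model makes the hit/miss behaviour of one level concrete: each address maps to exactly one line, so two addresses sharing an index evict each other. The 4-line, 16-byte geometry is chosen for illustration only; real L1 caches are larger and set-associative.

```python
class DirectMappedCache:
    def __init__(self, num_lines=4, line_size=16):
        self.num_lines, self.line_size = num_lines, line_size
        self.tags = [None] * num_lines  # one tag per cache line

    def access(self, addr):
        block = addr // self.line_size       # which memory block
        index = block % self.num_lines       # which cache line it maps to
        tag = block // self.num_lines        # identifies the block in that line
        hit = self.tags[index] == tag
        self.tags[index] = tag               # fill the line on a miss
        return hit

cache = DirectMappedCache()
# Addresses 0 and 64 both map to line 0 (64/16 = block 4, 4 % 4 = 0),
# so they keep evicting each other: a conflict miss.
print(cache.access(0))   # False (cold miss)
print(cache.access(4))   # True  (same 16-byte line)
print(cache.access(64))  # False (conflict miss, evicts block 0)
print(cache.access(0))   # False (was just evicted)
```

Victim caches, mentioned above, exist precisely to catch the lines bouncing out in that last pattern.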
11. Instruction Set Innovations
- Intel/AMD → CISC (x86_64), but internally convert instructions to RISC-like micro-ops
- ARM → RISC (simpler, power-efficient instructions)
Vector Extensions
- Intel → AVX, AVX2, AVX-512
- ARM → NEON, SVE
- Used in AI, multimedia, and scientific computing
12. Accelerator Integration
Modern processors integrate accelerators for specialized workloads:
- AI accelerators / NPUs
- GPU cores (APU architecture)
- Cryptographic engines
- Image signal processors (ISPs)
Example: Apple M1/M2/M3 chips integrate CPU + GPU + Neural Engine.
13. Chiplet Architecture
Instead of one large die, CPUs now use multiple smaller chiplets connected via high-speed interconnects.
AMD uses:
- CCD (Core Complex Die)
- IOD (I/O Die)
Benefits:
- Better yields
- Lower manufacturing cost
- Higher scalability
14. Power Management Technologies
To save power and battery life:
- Dynamic Voltage and Frequency Scaling (DVFS)
- Turbo Boost (Intel) / Precision Boost (AMD)
- Thermal throttling
- Adaptive power gating
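The idea behind DVFS governors can be sketched as a feedback loop: raise the clock when utilization is high, drop it when the core is mostly idle. The frequency table and thresholds below are hypothetical, not a real driver interface.

```python
FREQ_STEPS_MHZ = [800, 1600, 2400, 3200]  # hypothetical P-states

def next_freq(current_idx, utilization, up=0.8, down=0.3):
    """Step the frequency index up or down based on recent utilization."""
    if utilization > up and current_idx < len(FREQ_STEPS_MHZ) - 1:
        return current_idx + 1   # scale up under load
    if utilization < down and current_idx > 0:
        return current_idx - 1   # scale down when idle
    return current_idx           # hold steady in between

idx = 0
for load in [0.9, 0.95, 0.5, 0.1]:
    idx = next_freq(idx, load)
    print(FREQ_STEPS_MHZ[idx])  # 1600, 2400, 2400, 1600
```

Because dynamic power scales roughly with voltage squared times frequency, stepping down even one level under light load saves disproportionately more energy than the lost performance would suggest.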
Summary Table
| Feature | Purpose |
|---|---|
| Superscalar | Execute multiple instructions per cycle |
| Pipelining | Overlap instruction stages |
| Out-of-order | Maximize performance by ignoring program order |
| Branch prediction | Avoid stalls during conditional jumps |
| Speculative execution | Boost performance using prediction |
| SMT | Use idle resources more efficiently |
| Multi-core | Parallel processing |
| Heterogeneous cores | Balance performance & power |
| Chiplets | High scalability & efficiency |
| Vector engines | High-speed math operations |