Abstract
Modern CPUs employ a multitude of microarchitectural optimizations to enhance their performance, with instruction pipelining being a notable one. While pipelining allows for the parallel execution of instructions, it can sometimes lead to unexpected outcomes. This article explores the challenges posed by instruction pipelining in low-level programming, focusing on scenarios involving interrupts and memory writes. We introduce the concept of Instruction Synchronization Barriers (ISBs) as a solution to these challenges and discuss their application in code. Understanding ISBs and their importance is crucial for low-level programmers working on diverse CPU architectures.
Introduction
From an external perspective, it might seem that CPUs operate predictably and adhere to a sequential execution model (SEM). However, beneath the surface, there is a complex interplay of instructions and optimizations that can sometimes defy this apparent linearity. One such optimization is instruction pipelining, which allows CPUs to process instructions concurrently, significantly boosting performance. While pipelining generally remains transparent to users, there are cases where it can lead to unintended behaviors.
Understanding Instruction Pipelining
Instruction pipelining is a technique where the CPU breaks down instruction execution into discrete stages, such as instruction fetch, decode, and execute. Importantly, these stages can overlap, meaning that while one instruction is being executed, the next is being decoded, and the one after that is being fetched. This approach enables the CPU to operate without waiting for each instruction to complete all stages before moving on to the next.
Scenario 1: Critical Sections
In this scenario, the driver communicates with the device via a sequence that cannot be interrupted until it is completed. Typically, such code is placed in critical sections, which means that interrupts are disabled before the sequence is started and enabled after it’s completed:
Instruction | Description |
0 | Disable interrupts |
1 | Sequence part 1 |
2 | Sequence part 2 |
3 | Sequence part 3 |
4 | Enable interrupts |
In a pipelined CPU, the instruction execution might appear as follows:
Cycle | Instruction fetched | Instruction decoded | Instruction executed |
0 | 0 | | |
1 | 1 | 0 | |
2 | 2 | 1 | 0 |
3 | 3 | 2 | 1 |
4 | 4 | 3 | 2 |
5 | | 4 | 3 |
6 | | | 4 |
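The overlap shown in the table can be reproduced with a small simulation. This is a minimal sketch of an idealized three-stage pipeline with no stalls or flushes; the names `stage_content`, `total_cycles`, and `print_pipeline` are made up for this illustration and do not model any particular CPU.

```c
#include <stdio.h>

#define STAGES 3      /* fetch, decode, execute */
#define EMPTY  (-1)   /* marks a stage that holds no instruction */

/* Returns the instruction occupying `stage` at `cycle`, or EMPTY.
 * Stage 0 is fetch, 1 is decode, 2 is execute. In an ideal pipeline
 * instruction i is fetched at cycle i and moves one stage per cycle. */
int stage_content(int cycle, int stage, int n_instructions)
{
    int instr = cycle - stage;
    return (instr >= 0 && instr < n_instructions) ? instr : EMPTY;
}

/* Number of cycles needed until the last instruction leaves execute. */
int total_cycles(int n_instructions)
{
    return (n_instructions > 0) ? n_instructions + STAGES - 1 : 0;
}

/* Prints one row per cycle, one column per pipeline stage. */
void print_pipeline(int n_instructions)
{
    printf("Cycle | Fetched | Decoded | Executed\n");
    for (int cycle = 0; cycle < total_cycles(n_instructions); cycle++) {
        printf("%5d", cycle);
        for (int stage = 0; stage < STAGES; stage++) {
            int instr = stage_content(cycle, stage, n_instructions);
            if (instr == EMPTY)
                printf(" | %8s", "");
            else
                printf(" | %8d", instr);
        }
        printf("\n");
    }
}
```

Calling `print_pipeline(5)` walks through cycles 0 to 6, matching the five-instruction critical section above: seven cycles in total, compared with fifteen if each instruction had to pass all three stages before the next one started.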
However, problems arise when pipelining interacts with instructions that change processor state, such as those that control interrupts. For instance, an interrupt may arrive before the instruction that disables interrupts has taken effect, while part of the sequence has already been fetched:
Cycle | Instruction fetched | Instruction decoded | Instruction executed |
0 | 0 | | |
1 | 1 | 0 | |
2 | 2 | 1 | 0 |
Interrupt | | | |
3 | ISR 0 | 2 | 1 |
4 | ISR 1 | ISR 0 | 2 |
5 | ISR 2 | ISR 1 | ISR 0 |
The Interrupt Service Routine (ISR) runs in the middle of the sequence and breaks it: parts 1 and 2 are executed before the ISR, and part 3 only after it.
Scenario 2: Memory Write and Execution
In this scenario, a loader copies code into memory and executes it immediately:
Instruction | Description |
0 | Copy instruction to memory |
1 | Jump to copied instruction |
2 | Copied instruction |
With pipelining, the copied instruction may already have been fetched while the copy instruction is still executing. The CPU then executes the stale memory contents instead of the newly written instruction.
Addressing Pipelining Problems with ISB
To resolve issues stemming from instruction pipelining, an **Instruction Synchronization Barrier (ISB)** is introduced. When executed, the ISB flushes the instruction pipeline, so every instruction that follows it is fetched again only after the ISB has completed.
Scenario 1: Critical Sections
For the first scenario, where an interrupt disrupts a critical section, an ISB can be placed immediately after the interrupt disable instruction:
Instruction | Description |
0 | Disable interrupts |
1 | ISB |
2 | Sequence part 1 |
3 | Sequence part 2 |
4 | Sequence part 3 |
5 | Enable interrupts |
With the ISB in place, the CPU pipeline behaves as follows:
Cycle | Instruction fetched | Instruction decoded | Instruction executed |
0 | 0 | | |
1 | 1 | 0 | |
2 | 2 | 1 | 0 |
Interrupt | | | |
3 | ISR 0 | 2 | 1 (ISB) |
4 | ISR 0 | | |
5 | ISR 1 | ISR 0 | |
6 | ISR 2 | ISR 1 | ISR 0 |
Here, instruction 2 (sequence part 1) is dropped from the pipeline before it executes, so the Interrupt Service Routine runs before the sequence starts instead of splitting it.
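In C, the critical section with the added barrier might look like the sketch below. It assumes a GCC- or Clang-compatible compiler and, for the real instructions, a Cortex-M target. The helper names (`irq_disable`, `irq_enable`, `isb`, `write_sequence`) and the `dev_regs` array are hypothetical stand-ins for this illustration. On any other target the code falls back to a simulated interrupt mask so that it can be compiled and tried on a host machine.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
/* Cortex-M build: emit the real instructions. */
static inline void irq_disable(void) { __asm__ volatile ("cpsid i" ::: "memory"); }
static inline void irq_enable(void)  { __asm__ volatile ("cpsie i" ::: "memory"); }
static inline void isb(void)         { __asm__ volatile ("isb" ::: "memory"); }
#else
/* Any other build (assumption, for experimenting on a host machine):
 * the interrupt mask is modelled by a flag and the ISB is reduced to
 * a compiler-level barrier. */
static bool sim_irq_masked = false;
static inline void irq_disable(void) { sim_irq_masked = true; }
static inline void irq_enable(void)  { sim_irq_masked = false; }
static inline void isb(void)         { __asm__ volatile ("" ::: "memory"); }
#endif

/* Hypothetical device registers. On real hardware these would be
 * memory-mapped addresses taken from the device documentation. */
static volatile uint32_t dev_regs[3];

/* Writes the three-part sequence with interrupts masked, in the same
 * order as the instruction table above. */
void write_sequence(uint32_t part1, uint32_t part2, uint32_t part3)
{
    irq_disable();        /* disable interrupts */
    isb();                /* flush anything fetched before the mask applied */
    dev_regs[0] = part1;  /* sequence part 1 */
    dev_regs[1] = part2;  /* sequence part 2 */
    dev_regs[2] = part3;  /* sequence part 3 */
    irq_enable();         /* enable interrupts */
}
```

The `"memory"` clobber on each inline assembly statement also stops the compiler from moving the register writes across the barrier, which matters as much here as the hardware behaviour does.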
Scenario 2: Memory Write and Execution
In the second scenario, involving a memory write and immediate execution, an ISB can be added after the memory write instruction. This ensures that the copied instruction is re-fetched after the write and therefore has the correct value when executed.
Architecture-Dependent Considerations
Whether an ISB is needed depends largely on the CPU architecture and can vary from one CPU to another. Different CPUs may have distinct constraints, so it’s crucial to verify that the code complies with the requirements of the target architecture. When writing code that may be ported across architectures, a more defensive approach helps avoid hard-to-debug issues later.
The examples discussed in this document are applicable to most processors in the ARM Cortex-M line. Additionally, in some architectures, a **Data Synchronization Barrier (DSB)** may be required before the ISB to ensure that all writes preceding the ISB instruction are resolved correctly.
ISB is not directly available in the C language. To use it, the instruction must be emitted through inline assembly. Most CPU vendors provide a wrapper for this in their headers, such as __ISB() in ARM's CMSIS headers.
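Where no vendor header is available, the barriers can be wrapped by hand. The sketch below assumes a GCC- or Clang-compatible compiler and applies the wrappers to the loader from Scenario 2, with a DSB placed before the ISB. The names `DSB()`, `ISB()`, and `load_code` are made up for this example; a vendor-provided wrapper should be preferred when one exists. On targets other than Cortex-M the macros fall back to compiler-level barriers only, so the code still builds on a host machine.

```c
#include <stddef.h>
#include <string.h>

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
/* Cortex-M build: emit the real barrier instructions. */
#define DSB() __asm__ volatile ("dsb" ::: "memory")
#define ISB() __asm__ volatile ("isb" ::: "memory")
#else
/* Any other build (assumption): compiler barrier only, no CPU barrier. */
#define DSB() __asm__ volatile ("" ::: "memory")
#define ISB() __asm__ volatile ("" ::: "memory")
#endif

/* Copies `len` bytes of code from `src` to `dst` and synchronizes the
 * instruction stream before returning. Returns `dst`, the address the
 * caller may then jump to. */
void *load_code(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);  /* copy instructions to memory */
    DSB();                  /* wait until the writes have completed */
    ISB();                  /* discard anything fetched before the copy */
    return dst;
}
```

On a Cortex-M target the caller would jump to the returned address through a function pointer with bit 0 set, because these cores execute Thumb code only. Cores with caches may need cache maintenance in addition to the barriers, which this sketch does not cover.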