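Threads that cooperate on shared data must synchronize to execute correctly. Without synchronization there is a danger of a data race, where the result of the program depends on how events happen to occur. Two memory accesses form a data race if they come from different threads, refer to the same location, at least one is a write, and they occur one after another.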
Here we focus on the implementation of the lock and unlock synchronization operations.
Lock and unlock can be used straightforwardly to create regions where only a single processor can operate at a time, a property called mutual exclusion.
To implement these two operations, we need hardware primitives that can atomically read and modify a memory location: nothing else may access the location between the read and the write.
One typical primitive for building synchronization operations is the atomic exchange (or atomic swap), which interchanges a value in a register with a value in memory.
Assume that we want to build a simple lock where the value 0 indicates that the lock is free and 1 indicates that the lock is unavailable. A processor tries to set the lock by exchanging a 1 held in a register with the memory location that holds the lock. The value returned by the exchange is 0 if the lock was free and 1 if another processor had already claimed it. Whichever of two processors p1 and p2 performs its exchange first gets 0 back and leaves 1 in memory, so the other one gets 1 back and knows the lock is taken. The key to the design is that the exchange is atomic: two “simultaneous” exchanges will always be ordered by the hardware.
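The lock described above can be sketched in C with the C11 `atomic_exchange` function standing in for the hardware exchange. This is a minimal illustration, not a production lock: the names `lock_word`, `spin_lock`, `spin_unlock`, `worker`, and `run_spin_lock_demo`, as well as the thread and iteration counts, are assumptions made for the example, and it assumes a platform with POSIX threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_THREADS 4       /* hypothetical thread count for the demo */
#define INCREMENTS  100000  /* hypothetical amount of work per thread */

static atomic_int lock_word = 0; /* 0 = lock is free, 1 = lock is unavailable */
static long shared_counter = 0;  /* only touched while holding the lock */

/* Exchange 1 into the lock until the old value comes back as 0,
 * which means this thread is the one that changed it from free to taken. */
static void spin_lock(atomic_int *lock) {
    while (atomic_exchange(lock, 1) == 1) {
        /* another thread holds the lock; try the exchange again */
    }
}

/* Release the lock by storing 0 (free). */
static void spin_unlock(atomic_int *lock) {
    atomic_store(lock, 0);
}

/* Each worker increments the shared counter inside the locked region. */
static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        spin_lock(&lock_word);
        shared_counter++;
        spin_unlock(&lock_word);
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the final counter value. */
long run_spin_lock_demo(void) {
    pthread_t threads[NUM_THREADS];
    shared_counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }
    return shared_counter;
}
```

Because only the thread whose exchange returned 0 enters the locked region, no increment is lost and `run_spin_lock_demo()` should return `NUM_THREADS * INCREMENTS`. Replacing the lock calls with nothing would turn `shared_counter++` into a data race.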
An alternative is to have a pair of instructions in which the second instruction returns a value showing whether the pair executed as if it were atomic.
For example, RISC-V provides the instruction pair lr.d (load-reserved doubleword) and sc.d (store-conditional doubleword):

lr.d: While reading, it “reserves” the memory location. This means that the processor keeps track of the fact that it intends to perform an atomic update to that location. It returns the value that was originally in memory.

sc.d: Writes a new value to the same memory address that was reserved by the earlier lr.d. The store succeeds only if the reserved memory location has not been modified by any other processor or process in the meantime. The instruction writes 0 to its destination register on success and a nonzero value on failure.

The following sequence uses the pair to atomically exchange the contents of x23 with the memory location pointed to by x20:

    again: lr.d x10, (x20)       // load the current value from the location pointed to by `x20` into `x10` and reserve that location
           sc.d x11, x23, (x20)  // attempt to store the value from `x23` into that location; `x11` is 0 on success, nonzero on failure
           bne  x11, x0, again   // if `x11` is nonzero (the store failed), start the sequence over
           addi x23, x10, 0      // put loaded value in x23
Any time another processor intervenes and modifies the value in memory between the lr.d and sc.d instructions, the sc.d fails and writes a nonzero value into x11, so the whole sequence is tried again.
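C has no direct equivalent of lr.d and sc.d, but the same "read, attempt the store, retry on failure" shape can be sketched with the C11 function `atomic_compare_exchange_weak`. This is an analogy and not a translation of the assembly: compare-and-exchange checks that the value is still the one observed, while a reservation tracks the location itself. The weak form is permitted to fail spuriously, which is why it is always used in a loop, much like the `bne ... again` branch above. The helper name `exchange_with_retry` is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: atomically stores new_value into *location and
 * returns the value that was there before, retrying until the store succeeds. */
long exchange_with_retry(atomic_long *location, long new_value) {
    assert(location != NULL);

    /* Read the current value (the role lr.d plays in the assembly). */
    long observed = atomic_load(location);

    /* Attempt the store (the role of sc.d). On failure, observed is
     * refreshed with the latest value in memory and the loop tries again. */
    while (!atomic_compare_exchange_weak(location, &observed, new_value)) {
        /* the store did not go through; retry with the refreshed value */
    }

    /* On success, observed still holds the value that was replaced. */
    return observed;
}
```

A caller could build the 0/1 lock from this helper by calling `exchange_with_retry(&lock, 1)` until it returns 0, mirroring how the exchange sequence is used to acquire a lock.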