C++ Memory Ordering Model
Author: Walter Karas (wkaras@yahoo.com)
Indiviual memory events are memory location access starts and ends, fences, thread starts and ends, or some combination
of these. A memory location access is a load, a store, or both a load and a store.
A particular load start or end, or thread start or end are in a single thread execution. All other memory events
are (nominally) in all thread executions. A thread execution has exactly one start and end in it.
A thread execution defines a strict partial ordering of all memory events in it, represented in this document with the
conventional symbols < and >.
All memory events are caused by one thread execution, and are in the thread execution they are caused by. If S is
the start and E is the end of a thread execution T, there can be no memory event X in T where X < S or X > E.
The notation (X, Y) means that the event Y is distinct from event X, in the same thread execution as X, and not Y > X.
For (X, Y), let Z be a memory event in the same thread execution as X and Y, distinct from both of them. If Z < Y but
not Z < X, or event Z > X but not Z > Y, or there is no pair in the partial ordering relation containing Z and either
X or Y, then Z overlaps (X, Y). For (U, V) in the same thread execution as (X, Y), with
all of X, Y, U, V mutually distinct, unless V < X, or Y < U, (U, V) and (X, Y) overlap.
Every memory location access has a start and end event, For a memory location access A, A end < A start is not true in
any thread execution the events are in. Thus, a memory location access A can be referred to as (A start, A end).
If memory accesses A1 and A2 are in the same thread execution, and access the same memory location, they are said to
overlap if (A1 start, A1 end) and (A2 start, A2 end) overlap.
A memory store attempts to store a particular value to a memory location. Let S1 and S2 be two stores to the same memory
location in the same thread execution. If they overlap they both fail, otherwise they both succeed.
If a memory load L overlaps with a store, the result of L is undefined. Suppose S is a store to the same memory location
as L loads, in the same thread execution thread execution as L, and S end < L start, and there exists no other store S2
to the same memory location and in the same thread execution where S2 end < L start, but not S2 end < S start. If no
such S exists, the result of L is undefined. If S failed, the result of L is undefined. Else, if S succeeded, the result
of L is the value that was stored by S.
The partial ordering of the thread execution guarantees that memory accesses to the same memory location caused by the
thread execution will not overlap.
An atomic access of a memory location can be a load, a store, or a combined load and store. A comibined load and
store must always be atomic. The start and end of a combined load and store are not load start and end events, in
thread executions that did not cause the combined load and store. Atomic accesses to the same memory location do not overlap
with each other in any thread execution.
A memory event X is live in a thread execution with start S and end E if S < X and X > E.
The start of an atomic store may also be partial release fence. If S is a store and F is a partial release fence
caused by the same thread execution, and S < F, then S < F in all thread executions that both S and F are live in.
If F is a (standalone) full release fence, and S is a store caused by the same thread exection that caused F, with
F < S start, then F < S start in all thread executions that both F and S start are live in. F is also a partial
release fence.
Suppose there is a sequence of atomic loads from the same memory location in a thread execution. Let X be either the start
of an atomic store to the same memory location, and/or a release fence, in a different thread execution. There must exist
a finite number N such that, if the sequence is longer than N, there will exist a load L in the sequence, where L end > X
in the thread execution causing L.
There is also a global partial ordering that applies to all thread executions. If X and Y are memory events, and X < Y
in the global partial ordering, then X < Y in the partial ordering of all thread executions that both X and Y are live
in.
Notes (NB)
Acquire fences are only involved in the partial ordering of memory events in a particular thread
execution. The details of how a thread execution determines its partial ordering of memory events is
beyond the scope of this document.
Suppose that, due to compiler optimization, a variable is register-mapped thoughout a thread execution. Stores to
the variable in the thread execution are still nominally in all thread executions. They are not in any pair the ordering
relation of the other thread executionss, and thus would overlap with access to the variable in other thread executions,
with undefined results.