This document presents a proposal on explicit memory fences for the C++ standard. It has been extracted from the atomic operation library proposal N2195 by Peter Dimov, which was based in part on the memory model proposed in N2237 by Silvera, Wong, McKenney and Blainey, and on Alexander Terekhov's contributions to mailing lists and the comp.programming.threads discussion group.
Memory fences are a mechanism to provide ordering between memory operations. They differ from the ordered atomic operations described in N2195 and N2145 in that they are not associated with a specific memory access; they describe an ordering relationship between all preceding memory accesses and all subsequent ones. This proposal includes several variants of fences, which vary in the class of memory operations they affect (loads versus stores).
The proposal in this document maintains the semantics from N2195. The additional contribution from this paper is to discuss the justification for standalone memory fences and to summarize and refute the main objections to their introduction that have been raised so far.
There are multiple grounds for introducing explicit memory fences into the C++ standard.
This proposal includes three variants of fences: acquire, release and full (ordered) fences. Acquire and release fences have semantics analogous to the acquire and release operations in the atomics package. They typically perform much better than fully ordered fences, and they can be used in many concurrent programming idioms. N2237 includes a more detailed justification and use cases for these fences.
This section will summarize the objections raised so far against the introduction of memory fences and present some discussion to counter those arguments.
Globality. Standalone memory fences order all preceding memory operations against all subsequent ones. The objection is that such globality could hinder their performance on widely decoupled, massively parallel architectures, and that including them in the standard would prevent hardware vendors from developing such machines.
In the current memory model, acquire fences could be implemented (albeit suboptimally) by upgrading the latest executed load of each live atomic variable from relaxed to acquire. If the latest executed load of an atomic variable was already a load_acquire, the fence would have no effect on the visibility of that variable. While the set of variables affected is potentially unbounded, it seems clear that if all atomic loads in a program were acquire loads, then all acquire fences would be redundant and could be eliminated. This supports the contention that the hardware mechanisms required to support acquire loads are sufficient to implement acquire fences. So, the introduction of acquire fences into the memory model will not hinder the development of new parallel architectures any further than the presence of acquire loads already does. An analogous argument can be made for release and ordered fences.
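The upgrade argument can be illustrated with a short sketch. The proposal's primitives are spelled here using the C++11 names that later standardized this facility (std::atomic_thread_fence); the equivalence between the two readers below is the point of the argument:

```cpp
#include <atomic>

std::atomic<int> flag(0);

// Version 1: a relaxed load followed by a standalone acquire fence.
int reader_with_fence() {
    int f = flag.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    return f;
}

// Version 2: the fence "folded into" the access, i.e. the load upgraded
// from relaxed to acquire. Hardware able to implement Version 2 has the
// ordering mechanisms needed to implement Version 1.
int reader_with_acquire_load() {
    return flag.load(std::memory_order_acquire);
}
```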
There are no known existing platforms where fences cannot be efficiently implemented, and even if such a platform existed, only programs that explicitly use fences would be affected by their inclusion in the standard.
Teachability. The vast majority of programmers are generally unable to comprehend weak memory ordering, so they will be unable to properly use mechanisms such as standalone memory fences.
One of our main justifications for introducing fences is to allow upgrading existing code that is already written in terms of fences. In that case, the programmer is already familiar with the fence abstraction.
It is questionable whether weakly constrained atomic accesses are easier to teach than explicit memory fences. Some programmers find the fence abstraction much easier to comprehend and reason about. We believe that both are advanced tools that require approximately the same level of deep understanding. Fences are the ordering mechanism of choice for several existing weakly-ordered architectures and programming models, so most programmers with experience in concurrency are likely to have encountered them.
Redundancy. Atomic ordered operations already provide an ordering mechanism, so explicit memory fences are redundant and unnecessary.
The ordering provided by explicit fences is not exactly redundant with the one provided by ordered atomic operations. As described in N2237, some algorithms can be more precisely described using fences instead of ordered atomic operations. On the other hand, ordered atomic operations are fully redundant with the proposed fences; however, we are not recommending their removal, since they are a useful abstraction that is sufficient in many practical cases.
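The redundancy claim in the other direction can be made concrete: an ordered atomic operation is expressible as a relaxed operation bracketed by a standalone fence. A minimal sketch, again using the C++11 spellings that later standardized these primitives:

```cpp
#include <atomic>

std::atomic<int> x(0);

// A release store expressed with a standalone release fence plus a
// relaxed store. Note the fence form is actually slightly stronger:
// it orders preceding operations before *all* subsequent stores, not
// just this one.
void store_release_via_fence(int v) {
    std::atomic_thread_fence(std::memory_order_release);
    x.store(v, std::memory_order_relaxed);
}

// The equivalent ordered atomic operation.
void store_release_direct(int v) {
    x.store(v, std::memory_order_release);
}
```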
This section describes the functions to be introduced as part of the atomic library package.
// Fences
inline void acquire_memory_fence( void );
inline void release_memory_fence( void );
inline void acq_rel_memory_fence( void );
inline void ordered_memory_fence( void );

// Compiler fences
inline void acquire_compiler_fence( void );
inline void release_compiler_fence( void );
inline void acq_rel_compiler_fence( void );
inline void ordered_compiler_fence( void );
An alternative syntax for these primitives uses a single function with a template parameter to specify the ordering constraint. This proposal recommends a syntax consistent with the rest of the atomics package.
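For illustration, the alternative template-based syntax might look like the following sketch. The enumeration and function names here are hypothetical, chosen only to show the shape of the alternative; they are not part of this proposal:

```cpp
#include <atomic>

// Hypothetical single-template spelling of the fence primitives.
enum class msync { acquire, release, acq_rel, ordered };

template<msync M>
inline void memory_fence() {
    // The ordering constraint is selected at compile time from the
    // template parameter; implemented here via the C++11 equivalent.
    if      (M == msync::acquire) std::atomic_thread_fence(std::memory_order_acquire);
    else if (M == msync::release) std::atomic_thread_fence(std::memory_order_release);
    else if (M == msync::acq_rel) std::atomic_thread_fence(std::memory_order_acq_rel);
    else                          std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Usage: memory_fence<msync::release>();
```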
acquire_memory_fence( void ), ensures that all subsequent operations in program order are performed after all preceding loads in program order;
release_memory_fence( void ), ensures that all preceding operations in program order are performed before all subsequent stores in program order;
acq_rel_memory_fence( void ), combines the semantics of acquire_memory_fence and release_memory_fence;
ordered_memory_fence( void ), ensures that all preceding operations in program order are performed before all subsequent operations in program order.
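The acquire/release pair above supports the classic message-passing idiom. A sketch using the C++11 names that later standardized these primitives (std::atomic_thread_fence in place of release_memory_fence/acquire_memory_fence):

```cpp
#include <atomic>

int payload = 0;              // plain, non-atomic data
std::atomic<int> ready(0);    // relaxed flag; the fences supply the ordering

void producer() {
    payload = 42;                                         // plain store
    std::atomic_thread_fence(std::memory_order_release);  // release fence:
    ready.store(1, std::memory_order_relaxed);            // payload before flag
}

void consumer() {
    while (ready.load(std::memory_order_relaxed) == 0) {} // spin on the flag
    std::atomic_thread_fence(std::memory_order_acquire);  // acquire fence:
    // payload is now guaranteed to be visible as 42
}
```

Once the consumer observes ready == 1 and executes the acquire fence, the release fence in the producer guarantees that the write to payload is visible, even though both flag accesses are relaxed.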
The compiler versions of these fences provide the same ordering guarantees as their memory counterparts, except that memory accesses are ordered only with respect to threads executing on the same processing unit. These fences avoid the introduction of any hardware primitives that order memory accesses across different processors.
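C++11 eventually provided a close analogue of these compiler fences as std::atomic_signal_fence, which constrains compiler reordering without emitting a hardware fence instruction (its standardized scope is ordering relative to a signal handler on the same thread, the nearest modern counterpart of the "same processing unit" wording above). A minimal sketch:

```cpp
#include <atomic>

// volatile keeps the compiler from eliding or merging the stores below.
volatile int step = 0;

void ordered_writes() {
    step = 1;
    // Compiler-only fence: forbids the compiler from reordering the two
    // stores across it, but generates no hardware fence instruction.
    std::atomic_signal_fence(std::memory_order_seq_cst);
    step = 2;
}
```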