Atomic floating-point min/max
Document number: P3008R3.
Date: 2024-11-18.
Authors: Gonzalo Brito Gadeschi, David Sankel <dsankel@adobe.com>.
Reply to: gonzalob at nvidia.com .
Audience: LEWG.
Table of Contents
Changelog
- Revision 2 (post SG6 review): SG6 took the following poll and the paper is updated accordingly:
-
Poll 1: Accept SG1 proposal as listed in P3008R1 with "implementation-defined" instead of "unspecified" and forward to LEWG.
Consensus.
- A: preference for no fetch_min/max default, but rather explicit fetch_fminimum/minimum_num
- A: implementation-define may cause pessimistic implementations.
- SF: two authors.
-
Poll 2: If one of the input is a NaN then it is "unspecified" whether the result value is either a NaN or the other value.
Consensus.
SA: bad for teachability.
-
Poll 3: With respect to signaling NaNs, SG6 recommends:
- for
fetch_fminimum
/fminimum_num
/fmaximum
/fmaximum_num
to be specified in terms of their corresponding C23 APIs (ignore signaling NaNs in the wording, refer to C23), and
- for fetch_min/max, to not explicitly specify signaling NaN behavior (ignore signaling NaNs in the wording)
Consensus.
- Revision 1 (post SG1 review):
- Update to account for removal of min/max from P0493.
- Update C17 to C23 wording for signaling NaNs.
- SG1 DIRECTION POLL:
-
The default fetch_min(T, mo=SC)->T, fetch_max(T, mo=SC)->T should exist
-
The default fetch_min, fetch_max should have these semantics:
* Which value is stored in the atomic is unspecified if an input is a NaN
* Implementations should treat -0 as less than +0 (use normative encouragement)
* (Note: must not require std::min/max)
-
Users need a way to specify semantics (up to LEWG how to achieve this):
* Must-have: fminimum/fmaximum
* Must-have: fminimum_num/fmaximum_num
* Can have if LEWG desires it but we do not: std::min/max
Unanimous consent
- Revision 0: initial revision.
Introduction
P0493R4 Atomic minimum/maximum proposes the addition of atomic<T>::fetch_min
and atomic<T>::fetch_max
operations, which atomically perform std::min
and std::max
computations.
The Varna '23 plenary rejected these semantics for atomic<floating-point>
types due to safety concerns. These semantics deviate from the current standard practice, as defined by IEEE 754-2019 recommendations and the capabilities of available hardware.
These operations were removed from subsequent revisions of P0493. This paper proposes adding these operations to C++ with different semantics. It reviews the semantics proposed in P0493R4 and standard practice: IEEE 754-2008, IEEE 754-2019, and available hardware support. It then explores the design space and evaluates alternative solutions.
Finally, the authors propose two coherent alternatives and concrete wording changes implementing some of the semantics IEEE 754-2019 recommends all vendors implement.
Survey of programming language floating-point min/max API semantics
This section provides an overview of widely used floating-point min
/max
semantics. We will focus on how various min
/max
semantics handle the following "corner cases":
- Signed zero: Should
-0
be considered less than +0
(-0 < +0
), or should -0
and +0
be treated as equivalent (-0 == +0
) such that, e.g., min(+0, -0) = +0
?
- One quiet NaN (qNaN): when one of the arguments is a qNaN, should it be treated as Missing Data and the other argument returned, or should it be treated as an Error and propagated?
- Missing Data: returns the other argument:
min(x, qNaN) = x
and min(qNaN, x) = x
.
- Error: propagates the qNaN:
min(x, qNaN) = qNaN
and min(qNaN, x) = qNaN
.
Explicitly and intentionally, this paper ignores:
- Signaling NaNs: C23 specifies that their full support is not required, and where their specification is not provided leaving them as implementation-defined behavior (either as quiet NaN or signaling NaN). C++ does not specify anything about them beyond what C23 covers. An analysis of signaling NaNs and their impact on this proposal is left to a future version of this paper and captured as a current unresolved question in a latter section of this proposal.
- NaN payload propagation: IEEE 754 does not mandate it, and programming languages do not interpret NaN payloads.
Table 1 summarizes the semantics of the different C or C++ APIs regarding Signed Zeros and Quiet NaN handling and documents whether implementations that implement particular IEEE 754 semantics are valid implementations of the APIs.
C API |
C++ API |
Signed Zeros |
One qNaN |
IEEE 754 impl valid? |
- |
min
max |
Equivalent |
Undefined [1] |
Ternary [0] |
fmin
fmax |
- |
Equivalent or
-0 < +0 (QoI) [2] |
Missing Data |
minNum
maxNum or (QoI)
minimumNumber
maximumNumber |
fminimum
fmaximum |
- |
-0 < +0 |
Error |
minimum
maximum |
fminimum_num
fmaximum_num |
- |
-0 < +0 |
Missing Data |
minimumNumber
maximumNumber |
[0] Ternary semantics: min(x,y) = y < x? y : x
, max(x,y) = x < y? y : x
; both return the first argument if the arguments are equivalent or one argument is a NaN.
[1] In practice, the same implementation as for valid values is used, and the first argument passed to the function is returned (the second argument of the ternary expression is picked, since the conditional involving a NaN is always false).
[2] QoI: "Quality of Implementation", i.e., fmin
does not require -0 < +0
, but it recommends that high quality implementations implement it.
C++ std::min
/std::max
The std::min
and std::max
algorithms have a precondition on their arguments ([alg.min.max.1]):
Preconditions: For the first form, T
meets the Cpp17LessThanComparable requirements (Table Cpp17LessThanComparable).
which NaNs do not satisfy.
Rationale.
Per [structure.requirements#4] and [structure.requirements#8], which uses floating points as an example, the syntactic requirements apply to the type, but the semantic requirements only apply to the values actually passed to the algorithm.
The Cpp17LessThanComparable concept, which requires:
<
is a strict weak ordering relation ([alg.sorting])
Which is specified in [alg.sorting#general-4]:
The term strict refers to the requirement of an irreflexive relation (!comp(x, x)
for all x
), and the term weak to requirements that are not as strong as those for a total ordering, but stronger than those for a partial ordering. If we define equiv(a, b)
as !comp(a, b) && !comp(b, a)
, then the requirements are that comp
and equiv
both be transitive relations:
comp(a, b) && comp(b, c)
implies comp(a, c)
equiv(a, b)
&& equiv(b, c)
implies equiv(a, c)
[Note 1: Under these conditions, it can be shown that
3. equiv
is an equivalence relation,
4. comp
induces a well-defined relation on the equivalence classes determined by equiv
, and
5. the induced relation is a strict total ordering.
— end note]
and is satisfied by all floating-point values with the exception of NaNs.
For valid values, implementations follow ([alg.min.max]):
[std::min
]:
Returns: The smaller value. Returns the first argument when the arguments are equivalent.
[std::max
]:
Returns: The larger value. Returns the first argument when the arguments are equivalent.
which preserves equivalence between -0
and +0
(godbolt):
auto min = [](auto& a, auto& b) -> auto& { return std::min(x, y); };
auto max = [](auto& a, auto& b) -> auto& { return std::max(x, y); };
min(qNaN, 2.f);
max(qNaN, 2.f);
min(2.f, qNaN);
max(2.f, qNaN);
min(-0.f, +0.f);
max(-0.f, +0.f);
min(+0.f, -0.f);
max(+0.f, -0.f);
In all standard library implementations surveyed, the manifestation of the undefined behavior is to return the first argument; this is depicted in the example with "UB: qNaN" and "UB: 2".
The behavior of these semantics are:
- Signed Zero: undefined since not totally ordered.
- One qNaN: undefined since not totally ordered.
In concurrent programs, these semantics become even less intuitive, e.g., the sign of the result of a concurrent computation may depend on the order in which an observer sitting on top of the memory location observes the operations from different threads. If negative and positive zero are equivalent, the sign of this final result may differ across executions.
C fmin
/fmax
The semantics of the ISO/IEC 9899:2024 (C23) standard fmin
/fmax
functions are compatible with implementations of IEEE 754-2008 minNum
and maxNum
and of IEEE 754-2019 minimumNumber
/maximumNumber
. The fmin
/fmax
functions are available in C++ through the <cmath>
header as std::fmin/std::fmax
.
The semantics of C's fmin
/fmax
- Signed zero:
-0
may be equivalent to +0
(minNum
/maxNum
)
-0 < +0
allowed as QoI (minimumNumber
/maximumNumber
guarantee it).
- One qNaN: missing data (other value returned).
(collapsible) C23 fmin/fmax specification.
[From C23 7.12.12 Maximum, minimum, and positive difference functions]
[fmax 7.12.12.2]: The fmax
functions determine the maximum numeric value of their arguments. [Note 299]
[fmin 7.12.12.3]: The fmin
functions determine the minimum numeric value of their arguments. [Note 300]
[Note 299]: Quiet NaN arguments are treated as missing data: if one argument is a quiet NaN and the other numeric, then the fmax functions choose the numeric value. See F.10.9.2.
[Note 300]: The fmin
functions are analogous to the fmax
functions in their treatment of quiet NaNs.
[Note 461]: Ideally, fmax
would be sensitive to the sign of zero, for example fmax(−0.0, +0.0)
would return +0
; however, implementation in software might be impractical.
NOTE 1: The fmax
and fmin
functions are similar to the fmaximum_num and fminimum_num functions, though may differ in which signed zero is returned when the arguments are differently signed zeros and in their treatment of signaling NaNs (see F.10.9.5).
[F.10.9.2 The fmax functions]: If just one argument is a NaN, the fmax functions return the other argument (if both arguments are NaNs, the functions return a NaN).
[F.10.9.3 The fmin functions]: The fmin functions are analogous to the fmax functions (see F.10.9.2).
[F.2.1]: This annex does not require the full support for signaling NaNs specified in IEC 60559. This annex uses the term NaN, unless explicitly qualified, to denote quiet NaNs. Where specification of signaling NaNs is not provided, the behavior of signaling NaNs is implementation-defined (either treated as an IEC 60559 quiet NaN or treated as an IEC 60559 signaling NaN). 442) […] To support signaling NaNs as specified in IEC 60559, an implementation should adhere to the following recommended practice.
Recommended practice
Any floating-point operator or <math.h>
function or macro with a signaling NaN input, unless explicitly specified otherwise, raises an "invalid" floating-point exception.
[F.10.9.5] The fmaximum_num
, fminimum_num
, fmaximum_mag_num
, and fminimum_mag_num
functions
These functions return the number if one argument is a number and the other is a quiet or signaling NaN. If both arguments are NaNs, a quiet NaN is returned. If an argument is a signaling NaN, the "invalid" floating-point exception is raised (even though the function returns the number when the other argument is a number).
(collapsible) [IEEE 754-2008] minNum/maxNum specification.
[5.3.1 General operations]
sourceFormat minNum(source, source)
sourceFormat maxNum(source, source)
minNum(x, y)
is the canonicalized number x
if x<y
, y
if y<x
, the canonicalized number if one operand is a number and the other a quiet NaN. Otherwise it is either x
or y
, canonicalized (this means results might differ among implementations).
maxNum(x, y)
is the canonicalized number y
if x<y
, x
if y<x
, the canonicalized number if one operand is a number and the other a quiet NaN. Otherwise it is either x
or y
, canonicalized (this means results might differ among implementations).
That is, in the presence of a qNaN, these return the value, but signed zeros may or may not be equivalent:
fmin(qNaN, 2.f): 2
fmax(qNaN, 2.f): 2
fmin(2.f, qNaN): 2
fmax(2.f, qNaN): 2
fmin(-0.f, +0.f): -0 or +0
fmax(-0.f, +0.f): -0 or +0
fmin(+0.f, -0.f): -0 or +0
fmax(+0.f, -0.f): -0 or +0
IEEE 754-2019 removed minNum
andmaxNum
operations and replaced them with minimumNumber
/maximumNumber
and minimum
/maximum
. These are surveyed in the next section. The gist of the rationale for their removal from The Removal/Demotion of MinNum
and MaxNum
Operations from IEEE 754-2018 is:
These [minNum, maxNum] operations are removed from or demoted in IEEE std 754-2018, due to their non-associativity. […] With this non-associativity, different compilations or runs on parallel processing can return different answers […].
C23 minimum
/maximum
/minimumNumber
/maximumNumber
IEEE 754-2019 removed minNum
/maxNum
and recommends programming languages to provide their replacements instead: minimumNumber
/maximumNumber
/minimum
/maximum
.
(collapsible) IEEE 754-2019 minimum/maximum/minimumNumber/maximumNumber specification.
[9.6 Minimum and maximum operations]: Language standards should define the following homogeneous general-computational operations for all
supported arithmetic formats:
sourceFormat minimum(source, source)
sourceFormat minimumNumber(source, source)
sourceFormat maximum(source, source)
sourceFormat maximumNumber(source, source)
minimum(x, y)
is x
if x < y
, y
if y < x
, and a quiet NaN if either operand is a NaN, according to 6.2. For this operation, −0
compares less than +0
. Otherwise (i.e., when x=y
and signs are the same) it is either x
or y
.
minimumNumber(x, y)
is x
if x<y
, y
if y<x
, and the number if one operand is a number and the other is a NaN. For this operation, −0
compares less than +0
. If x = y
and signs are the same it is either x
or y
. If both operands are NaNs, a quiet NaN is returned, according to 6.2.
maximum(x, y)
is x
if x > y
, y
if y > x
, and a quiet NaN if either operand is a NaN, according to 6.2. For this operation, +0 compares greater than −0
. Otherwise (i.e., when x=y
and signs are the same) it is either x
or y
.
maximumNumber(x, y)
is x
if x>y
, y
if y>x
, and the number if one operand is a number and the other is a NaN. For this operation, +0
compares greater than −0
. If x = y
and signs are the same it is either x
or y
. If both operands are NaNs, a quiet NaN is returned, according to 6.2.
[6.2 Operations with NaNs]: For an operation with quiet NaN inputs, except as stated otherwise, if a floating-point result is to be delivered the result shall be a canonical quiet NaN.
(collapsible) C23 fmaximum/fminimum/fmaximum_num/fminimum_num specification.
[From C23 7.12.12 Maximum, minimum, and positive difference functions]
[fmaximum - 7.12.12.4]: The fmaximum
functions return the maximum value of their arguments.
The fmaximum
functions determine the maximum value of their arguments. For these functions, +0
is considered greater than −0
. These functions differ from the fmaximum_num
functions only in their treatment of NaN arguments (see F.10.9.4, F.10.9.5).
[fminimum - 7.12.12.5]: The fminimum
functions return the minimum value of their arguments.
The fminimum
functions determine the minimum value of their arguments. For these functions, −0
is considered less than +0
. These functions differ from the fminimum_num functions only in their treatment of NaN arguments (see F.10.9.4, F.10.9.5).
[fmaximum_num - 7.12.12.8]: The fmaximum_num
functions return the maximum value of their numeric arguments.
The fmaximum_num
functions determine the maximum value of their numeric arguments. They determine the number if one argument is a number and the other is a NaN. These functions differ from the fmaximum
functions only in their treatment of NaN arguments (see F.10.9.4, F.10.9.5).
[fminimum_num - 7.12.12.9]: The fminimum_num
functions return the minimum value of their numeric arguments.
The fminimum_num
functions determine the minimum value of their numeric arguments. They determine the number if one argument is a number and the other is a NaN. These functions differ from the fminimum functions only in their treatment of NaN arguments (see F.10.9.4, F.10.9.5).
[F.10.9.4]: These functions treat NaNs like other functions in <math.h> (see F.10).
[F.10 Mathematics <math.h> and <tgmath.h>]: Functions with a NaN argument return a NaN result and raise no floating-point exception, except
where explicitly stated otherwise.
The semantics of these functions matches for signed zero: -0 < +0
, and differs in their treatment when one argument is a qNaN:
- Missing Data:
fminimumNumber
/fmaximumNumber
, return the Number.
- Errors:
fminimum
/fmaximum
, propagate the qNaN.
fminimum(qNaN, 2.f): qNaN
fmaximum(qNaN, 2.f): qNaN
fminimum(2.f, qNaN): qNaN
fmaximum(2.f, qNaN): qNaN
fminimum(-0.f, +0.f): -0
fmaximum(-0.f, +0.f): +0
fminimum(+0.f, -0.f): -0
fmaximum(+0.f, -0.f): +0
fminimum_num(qNaN, 2.f): 2
fmaximum_num(qNaN, 2.f): 2
fminimum_num(2.f, qNaN): 2
fmaximum_num(2.f, qNaN): 2
fminimum_num(-0.f, +0.f): -0
fmaximum_num(-0.f, +0.f): +0
fminimum_num(+0.f, -0.f): -0
fmaximum_num(+0.f, -0.f): +0
Impact of replacing min
with fminimum_num
Table 2 captures the impact of replacing min
with fminimum_num
is as follows:
std::min /std::max |
fminimum_num /fmaximum_num |
min(qNaN, 2.f); // UB: qNaN
max(qNaN, 2.f); // UB: qNaN
min(+0.f, -0.f); // +0
max(-0.f, +0.f); // -0 |
fminimum_num(qNaN, 2.f); // 2
fmaximum_num(qNaN, 2.f); // 2
fminimum_num(+0.f, -0.f); // -0
fmaximum_num(-0.f, +0.f); // +0 |
That is, when the first input of std::min
/std::max
is a qNaN, then these switch from exhibiting undefined behavior to returning a number, and when signed zeros are involved, there is a case where the result has a different sign.
Survey of hardware atomic floating-point min/max API semantics
On systems without native support for atomic floating-point min/max operations, these operations must be performed atomically, e.g., using a CAS loop, a compare-and-conditional-store loop, an LL/SC loop, or, e.g., by taking a lock, which would make the atomic not lock-free. In these systems, the memory latency overhead may outweight the cost of performing the actual arithmetic portion of the min/max operations, and extra effort may be required to ensure forward progress properties like starvation freedom, and therefore those systems are not considered here.
Table 3 surveys the publicly known Instruction Set Architectures (ISAs) with atomic floating-point min/max operations, which are all GPU ISAs:
Vendor |
ISA |
Instructions |
IEEE-2019 compat |
Signed Zero |
Quiet NaN |
AMD |
CDNA2+ |
MIN MAX |
minimumNumber
maximumNumber |
-0 < +0 |
Missing Data |
Intel |
Xe ISA |
AOP_FMIN
AOP_FMAX [0] |
minimumNumber
maximumNumber |
-0 < +0 |
Missing Data |
NVIDIA |
PTX |
atom red |
minimumNumber
maximumNumber |
-0 < +0 |
Missing Data |
Neutral [1] |
SPIR-V Extension |
OpAtomicFMinEXT OpAtomicFMaxEXT |
C fmin /fmax |
Equivalent; QoI: -0 < +0 |
Missing Data |
- [0]: Volume 2d: Command Reference: Structures, page 229.
- [1]: The Vulkan extension corresponding to this SPIR-V extension is
VK_EXT_shader_atomic_float2
. Hardware that implements it is listed here and here.
All the architectures surveyed order -0 < +0
, treat qNaNs as missing data, and implement IEEE 754-2019 minimumNumber
and maximumNumber
semantics. The SPIR-V extension requires C fmin
/fmax
semantics, which allows -0 < +0
but does not require it.
The semantics vendor implement are compatible with C23's fminimum_num
/fmaximum_num
, and C's fmin
/fmax
, but not with C++'s std::min
/std::max
due to the different outcomes when signed-zeros are involved.
We compare the performance impact of two different atomic<floating-point>::fetch_max
semantics:
std::min
, which lowers to a CAS-loop that performs std::min
(similar for compare-and-conditional-store), and
fminimum_num
, which lowers to native atomic operations,
using a synthetic micro-benchmark that measures throughput in Giga Operations per Second (y-axis; logarithmic) as a function of the number of threads (x-axis; logarithmic) from 32 to 130'000 hardware threads on an .
Modern concurrent systems provide dozens of millions of hardware threads operating on the same shared memory.
While native in-memory atomic operations increase throughput until its theoretical peak with just a few thousand threads, the performance of compare and swap strategies decreases as the number of threads increases due to excess contention. At just a few thousand threads, the performance is already four orders of magnitude worse (~10'000x) than that of native in-memory operations. This is why vendors of highly concurrent systems provide these operations.
While most CPU ISAs lack native hardware instructions for atomic floating-point min/max, their performance impact is expected to be similar to that of atomic integer min/max. Section 9 of P0493R4 presents benchmarks comparing the performance of a CAS-based implementation against an Arm v8.1 implementation using the native atomic ldsmaxl
instruction. The benchmark covers a range of cores, from 2 to 64. The performance improvement of the native instruction ranges between 2.75x (2 cores) to 1.7x at 64 cores. We expect this behavior to transfer to atomic floating-point min/max.
We, unfortunately, do not have benchmarks for other hardware configurations at this time.
Design space
SG1 (Concurrency) and SG6 (Numerics) agreed on the following semantics (see polls in the Changelog of Revision's 1 and 2 of this paper):
- The "default"
fetch_min(T, mo=SC)->T
and fetch_max(T, mo=SC)->T
APIs should exist and have these semantics:
- Which value is stored in the atomic is unspecified if an input is a NaN.
- Implementations should treat
-0
as less than +0
(normative encouragement; must not require std::min/max).
- No explicit specification of signaling NaN behavior.
- Users need a way to specify
fminimum
/fmaximum
and fminimum_num
/fmaximum_num
semantics. These APIs should be specified in term of the corresponding C23 APIs.
- Can have APIs with
std::min
/std::max
semantics that leave the handling of signed zero and NaNs as undefined, but SG1 and SG6 do not want these APIs.
These semantics are different from std::min
/std::max
semantics, which leave the handling of signed-zero and NaNs as undefined behavior, due to these values not being totally ordered. There is precendent in the C++ standard for the atomic operations semantics to deviate from the non-atomic semantics. For example, atomic<int>::fetch_add
wraps around on overflow, instead of exhibiting undefined behavior, see [atomics#ref.int-6]:
Remarks: For signed integer types, the result is as if the object value and parameters were converted to their corresponding unsigned types, the computation performed on those types, and the result converted back to the signed type.
[Note 2: There are no undefined results arising from the computation. — end note]
The wording section proposes the actual API.
Wording
Add the following to [atomics.ref.float] immediately below fetch_sub
:
namespace std {
template <> struct atomic_ref<floating-point> {
floating-point fetch_max(floating-point, memory_order = memory_order::seq_cst) const noexcept;
floating-point fetch_min(floating-point, memory_order = memory_order::seq_cst) const noexcept;
floating-point fetch_fmaximum(floating-point, memory_order = memory_order::seq_cst) const noexcept;
floating-point fetch_fmininum(floating-point, memory_order = memory_order::seq_cst) const noexcept;
floating-point fetch_fmaximum_num(floating-point, memory_order = memory_order::seq_cst) const noexcept;
floating-point fetch_fmininum_num(floating-point, memory_order = memory_order::seq_cst) const noexcept;
};
}
floating-point-type fetch_key(floating-point-type operand,
memory_order order = memory_order::seq_cst) const noexcept;
- Effects: Atomically replaces the value referenced by
*ptr
with the result of the computation applied to the value referenced by *ptr
and the given operand. Memory is affected according to the value of order. These operations are atomic read-modify-write operations ([intro.races]).
- Returns: Atomically, the value referenced by *ptr immediately before the effects.
- Remarks: If the result is not a representable value for its type ([expr.pre]), the result is unspecified, but the operations otherwise have no undefined behavior. Atomic arithmetic operations on floating-point-type should conform to the
std::numeric_limits<floating-point-type>
traits associated with the floating-point type ([limits.syn]). The floating-point environment ([cfenv]) for atomic arithmetic operations on floating-point-type may be different than the calling thread's floating-point environment.
- Remarks:
- For
fetch_fmaximum
and fetch_fminimum
, the maximum and minimum computation is performed as if by std::fmaximum
and std::fminimum
, with the object value and the first parameter as the arguments.
- For
fetch_fmaximum_num
and fetch_fminimum_num
, the maximum and minimum computation is performed as if by std::fmaximum_num
and std::fminimum_num
, with the object value and the first parameter as the arguments.
- For
fetch_max
and fetch_min
, the maximum and minimum computation is performed as if by std::fmaximum_num
and std::fminimum_num
, with the object value and the first parameter as the arguments. If the given operand or the value referenced by *ptr
are NaN, which of these values is stored at *ptr
is unspecified. If the given operand and the value referenced by *ptr
are differently signed zeros, which of these values is stored at *ptr
is unspecified.
- Recommended practice: The implementation of
fetch_max
and fetch_min
should treat negative zero as smaller than positive zero.
Add the following to [atomics.types.float] immediately below fetch_sub
:
namespace std {
template <> struct atomic<floating-point> {
floating-point fetch_max(floating-point, memory_order = memory_order::seq_cst) volatile noexcept;
floating-point fetch_max(floating-point, memory_order = memory_order::seq_cst) noexcept;
floating-point fetch_min(floating-point, memory_order = memory_order::seq_cst) volatile noexcept;
floating-point fetch_min(floating-point, memory_order = memory_order::seq_cst) noexcept;
floating-point fetch_fmaximum(floating-point, memory_order = memory_order::seq_cst) volatile noexcept;
floating-point fetch_fmaximum(floating-point, memory_order = memory_order::seq_cst) noexcept;
floating-point fetch_fmininum(floating-point, memory_order = memory_order::seq_cst) volatile noexcept;
floating-point fetch_fmininum(floating-point, memory_order = memory_order::seq_cst) noexcept;
floating-point fetch_fmaximum_num(floating-point, memory_order = memory_order::seq_cst) volatile noexcept;
floating-point fetch_fmaximum_num(floating-point, memory_order = memory_order::seq_cst) noexcept;
floating-point fetch_fmininum_num(floating-point, memory_order = memory_order::seq_cst) volatile noexcept;
floating-point fetch_fmininum_num(floating-point, memory_order = memory_order::seq_cst) noexcept;
};
}
T fetch_key(T operand, memory_order order = memory_order::seq_cst) volatile noexcept;
T fetch_key(T operand, memory_order order = memory_order::seq_cst) noexcept;
- Constraints: For the volatile overload of this function, is_always_lock_free is true.
- Effects: Atomically replaces the value pointed to by this with the result of the computation applied to the value pointed to by this and the given operand. Memory is affected according to the value of order. These operations are atomic read-modify-write operations ([intro.multithread]).
- Returns: Atomically, the value pointed to by this immediately before the effects.
- Remarks: If the result is not a representable value for its type ([expr.pre]) the result is unspecified, but the operations otherwise have no undefined behavior. Atomic arithmetic operations on floating-point-type should conform to the std::numeric_limits<floating-point-type> traits associated with the floating-point type ([limits.syn]). The floating-point environment ([cfenv]) for atomic arithmetic operations on floating-point-type may be different than the calling thread's floating-point environment.
- Remarks:
- For
fetch_fmaximum
and fetch_fmininimum
, the maximum and minimum computation is performed as if by std::fmaximum
and std::fminimum
, with the object value and the first parameter as the arguments.
- For
fetch_fmaximum_num
and fetch_fmininimum_num
, the maximum and minimum computation is performed as if by std::fmaximum_num
and std::fminimum_num
, with the object value and the first parameter as the arguments.
- For
fetch_max
and fetch_min
, the maximum and minimum computation is performed as if by std::fmaximum_num
and std::fminimum_num
, with the object value and the first parameter as the arguments. If the given operand or the object value are NaN, which of these values replaces the object value is unspecified. If the given operand and the object value are differently signed zeros, which of these values replaces the object value is unspecified.
- Recommended practice: The implementation of
fetch_max
and fetch_min
should treat negative zero as smaller than positive zero.
Update __cpp_lib_atomic_min_max
version macro in <version>
synopsis [version.syn] to the C++ version this feature is introduced in:
#define __cpp_lib_atomic_min_max 202403L______L // freestanding, also in <atomic>
EDITORIAL NOTES:
- This paper relies on the C23 math functions which have not been merged into C++26 draft yet (see P3384).
References
- P0493R4 Atomic maximum/minimum.
- The Removal/Demotion of MinNum and MaxNum Operations from IEEE 754™-2018.
- Min-max functions Proposal for TS 18661 update WG14 N2273.
- ISO/IEC 9899:2024 C23 standard.
- ANSI/IEEE Std 754-2008.
- ANSI/IEEE Std 754-2019.
Atomic floating-point min/max
Document number: P3008R3.
Date: 2024-11-18.
Authors: Gonzalo Brito Gadeschi, David Sankel <dsankel@adobe.com>.
Reply to: gonzalob at nvidia.com .
Audience: LEWG.
Table of Contents
Changelog
Poll 1: Accept SG1 proposal as listed in P3008R1 with "implementation-defined" instead of "unspecified" and forward to LEWG.
Consensus.
Poll 2: If one of the input is a NaN then it is "unspecified" whether the result value is either a NaN or the other value.
Consensus.
SA: bad for teachability.
Poll 3: With respect to signaling NaNs, SG6 recommends:
fetch_fminimum
/fminimum_num
/fmaximum
/fmaximum_num
to be specified in terms of their corresponding C23 APIs (ignore signaling NaNs in the wording, refer to C23), andConsensus.
The default fetch_min(T, mo=SC)->T, fetch_max(T, mo=SC)->T should exist
The default fetch_min, fetch_max should have these semantics:
* Which value is stored in the atomic is unspecified if an input is a NaN
* Implementations should treat -0 as less than +0 (use normative encouragement)
* (Note: must not require std::min/max)
Users need a way to specify semantics (up to LEWG how to achieve this):
* Must-have: fminimum/fmaximum
* Must-have: fminimum_num/fmaximum_num
* Can have if LEWG desires it but we do not: std::min/max
Unanimous consent
Introduction
P0493R4 Atomic minimum/maximum proposes the addition of
atomic<T>::fetch_min
andatomic<T>::fetch_max
operations, which atomically performstd::min
andstd::max
computations.The Varna '23 plenary rejected these semantics for
atomic<floating-point>
types due to safety concerns. These semantics deviate from the current standard practice, as defined by IEEE 754-2019 recommendations and the capabilities of available hardware.These operations were removed from subsequent revisions of P0493. This paper proposes adding these operations to C++ with different semantics. It reviews the semantics proposed in P0493R4 and standard practice: IEEE 754-2008, IEEE 754-2019, and available hardware support. It then explores the design space and evaluates alternative solutions.
Finally, the authors propose two coherent alternatives and concrete wording changes implementing some of the semantics IEEE 754-2019 recommends all vendors implement.
Survey of programming language floating-point min/max API semantics
This section provides an overview of widely used floating-point
min
/max
semantics. We will focus on how variousmin
/max
semantics handle the following "corner cases":-0
be considered less than+0
(-0 < +0
), or should-0
and+0
be treated as equivalent (-0 == +0
) such that, e.g.,min(+0, -0) = +0
?min(x, qNaN) = x
andmin(qNaN, x) = x
.min(x, qNaN) = qNaN
andmin(qNaN, x) = qNaN
.Explicitly and intentionally, this paper ignores:
Table 1 summarizes the semantics of the different C or C++ APIs regarding Signed Zeros and Quiet NaN handling and documents whether implementations that implement particular IEEE 754 semantics are valid implementations of the APIs.
min
max
[1]
fmin
fmax
or
-0 < +0
(QoI) [2]minNum
maxNum
or (QoI)
minimumNumber
maximumNumber
fminimum
fmaximum
-0 < +0
minimum
maximum
fminimum_num
fmaximum_num
-0 < +0
minimumNumber
maximumNumber
[0] Ternary semantics:
min(x,y) = y < x? y : x
,max(x,y) = x < y? y : x
; both return the first argument if the arguments are equivalent or one argument is a NaN.[1] In practice, the same implementation as for valid values is used, and the first argument passed to the function is returned (the second argument of the ternary expression is picked, since the conditional involving a NaN is always false).
[2] QoI: "Quality of Implementation", i.e.,
fmin
does not require-0 < +0
, but it recommends that high quality implementations implement it.C++
std::min
/std::max
The
std::min
andstd::max
algorithms have a precondition on their arguments ([alg.min.max.1]):which NaNs do not satisfy.
Rationale.
Per [structure.requirements#4] and [structure.requirements#8], which uses floating points as an example, the syntactic requirements apply to the type, but the semantic requirements only apply to the values actually passed to the algorithm.
The Cpp17LessThanComparable concept, which requires:
Which is specified in [alg.sorting#general-4]:
and is satisfied by all floating-point values with the exception of NaNs.
For valid values, implementations follow ([alg.min.max]):
which preserves equivalence between
-0
and+0
(godbolt):In all standard library implementations surveyed, the manifestation of the undefined behavior is to return the first argument; this is depicted in the example with "UB: qNaN" and "UB: 2".
The behavior of these semantics are:
In concurrent programs, these semantics become even less intuitive, e.g., the sign of the result of a concurrent computation may depend on the order in which an observer sitting on top of the memory location observes the operations from different threads. If negative and positive zero are equivalent, the sign of this final result may differ across executions.
C
fmin
/fmax
The semantics of the ISO/IEC 9899:2024 (C23) standard
fmin
/fmax
functions are compatible with implementations of IEEE 754-2008minNum
andmaxNum
and of IEEE 754-2019minimumNumber
/maximumNumber
. Thefmin
/fmax
functions are available in C++ through the<cmath>
header asstd::fmin/std::fmax
.The semantics of C's
fmin
/fmax
-0
may be equivalent to+0
(minNum
/maxNum
)-0 < +0
allowed as QoI (minimumNumber
/maximumNumber
guarantee it).(collapsible) C23 fmin/fmax specification.
(collapsible) [IEEE 754-2008] minNum/maxNum specification.
That is, in the presence of a qNaN, these return the value, but signed zeros may or may not be equivalent:
IEEE 754-2019 removed
minNum
andmaxNum
operations and replaced them withminimumNumber
/maximumNumber
andminimum
/maximum
. These are surveyed in the next section. The gist of the rationale for their removal from The Removal/Demotion ofMinNum
andMaxNum
Operations from IEEE 754-2018 is:C23
minimum
/maximum
/minimumNumber
/maximumNumber
IEEE 754-2019 removed
minNum
/maxNum
and recommends programming languages to provide their replacements instead:minimumNumber
/maximumNumber
/minimum
/maximum
.(collapsible) IEEE 754-2019 minimum/maximum/minimumNumber/maximumNumber specification.
(collapsible) C23 fmaximum/fminimum/fmaximum_num/fminimum_num specification.
The semantics of these functions matches for signed zero:
-0 < +0
, and differs in their treatment when one argument is a qNaN:fminimumNumber
/fmaximumNumber
, return the Number.fminimum
/fmaximum
, propagate the qNaN.Impact of replacing
min
withfminimum_num
Table 2 captures the impact of replacing
min
withfminimum_num
is as follows:std::min
/std::max
fminimum_num
/fmaximum_num
min(qNaN, 2.f); // UB: qNaN
max(qNaN, 2.f); // UB: qNaN
min(+0.f, -0.f); // +0
max(-0.f, +0.f); // -0
fminimum_num(qNaN, 2.f); // 2
fmaximum_num(qNaN, 2.f); // 2
fminimum_num(+0.f, -0.f); // -0
fmaximum_num(-0.f, +0.f); // +0
That is, when the first input of
std::min
/std::max
is a qNaN, then these switch from exhibiting undefined behavior to returning a number, and when signed zeros are involved, there is a case where the result has a different sign.Survey of hardware atomic floating-point min/max API semantics
On systems without native support for atomic floating-point min/max operations, these operations must be performed atomically, e.g., using a CAS loop, a compare-and-conditional-store loop, an LL/SC loop, or, e.g., by taking a lock, which would make the atomic not lock-free. In these systems, the memory latency overhead may outweight the cost of performing the actual arithmetic portion of the min/max operations, and extra effort may be required to ensure forward progress properties like starvation freedom, and therefore those systems are not considered here.
Table 3 surveys the publicly known Instruction Set Architectures (ISAs) with atomic floating-point min/max operations, which are all GPU ISAs:
MAX
minimumNumber
maximumNumber
AOP_FMIN
AOP_FMAX
[0]minimumNumber
maximumNumber
red
minimumNumber
maximumNumber
OpAtomicFMaxEXT
fmin
/fmax
QoI: -0 < +0
VK_EXT_shader_atomic_float2
. Hardware that implements it is listed here and here.All the architectures surveyed order
-0 < +0
, treat qNaNs as missing data, and implement IEEE 754-2019minimumNumber
andmaximumNumber
semantics. The SPIR-V extension requires Cfmin
/fmax
semantics, which allows-0 < +0
but does not require it.The semantics vendor implement are compatible with C23's
fminimum_num
/fmaximum_num
, and C'sfmin
/fmax
, but not with C++'sstd::min
/std::max
due to the different outcomes when signed-zeros are involved.Performance impact of
atomic<floating-point>::fetch_min
/_max
semanticsWe compare the performance impact of two different
atomic<floating-point>::fetch_max
semantics:std::min
, which lowers to a CAS-loop that performsstd::min
(similar for compare-and-conditional-store), andfminimum_num
, which lowers to native atomic operations,using a synthetic micro-benchmark that measures throughput in Giga Operations per Second (y-axis; logarithmic) as a function of the number of threads (x-axis; logarithmic) from 32 to 130'000 hardware threads on an NVIDIA GPU system.
Modern concurrent systems provide dozens of millions of hardware threads operating on the same shared memory.
While native in-memory atomic operations increase throughput until its theoretical peak with just a few thousand threads, the performance of compare and swap strategies decreases as the number of threads increases due to excess contention. At just a few thousand threads, the performance is already four orders of magnitude worse (~10'000x) than that of native in-memory operations. This is why vendors of highly concurrent systems provide these operations.
While most CPU ISAs lack native hardware instructions for atomic floating-point min/max, their performance impact is expected to be similar to that of atomic integer min/max. Section 9 of P0493R4 presents benchmarks comparing the performance of a CAS-based implementation against an Arm v8.1 implementation using the native atomic
ldsmaxl
instruction. The benchmark covers a range of cores, from 2 to 64. The performance improvement of the native instruction ranges between 2.75x (2 cores) to 1.7x at 64 cores. We expect this behavior to transfer to atomic floating-point min/max.We, unfortunately, do not have benchmarks for other hardware configurations at this time.
Design space
SG1 (Concurrency) and SG6 (Numerics) agreed on the following semantics (see polls in the Changelog of Revision's 1 and 2 of this paper):
fetch_min(T, mo=SC)->T
andfetch_max(T, mo=SC)->T
APIs should exist and have these semantics:-0
as less than+0
(normative encouragement; must not require std::min/max).fminimum
/fmaximum
andfminimum_num
/fmaximum_num
semantics. These APIs should be specified in term of the corresponding C23 APIs.std::min
/std::max
semantics that leave the handling of signed zero and NaNs as undefined, but SG1 and SG6 do not want these APIs.These semantics are different from
std::min
/std::max
semantics, which leave the handling of signed-zero and NaNs as undefined behavior, due to these values not being totally ordered. There is precendent in the C++ standard for the atomic operations semantics to deviate from the non-atomic semantics. For example,atomic<int>::fetch_add
wraps around on overflow, instead of exhibiting undefined behavior, see [atomics#ref.int-6]:The wording section proposes the actual API.
Wording
Add the following to [atomics.ref.float] immediately below
fetch_sub
:*ptr
with the result of the computation applied to the value referenced by*ptr
and the given operand. Memory is affected according to the value of order. These operations are atomic read-modify-write operations ([intro.races]).std::numeric_limits<floating-point-type>
traits associated with the floating-point type ([limits.syn]). The floating-point environment ([cfenv]) for atomic arithmetic operations on floating-point-type may be different than the calling thread's floating-point environment.fetch_fmaximum
andfetch_fminimum
, the maximum and minimum computation is performed as if bystd::fmaximum
andstd::fminimum
, with the object value and the first parameter as the arguments.fetch_fmaximum_num
andfetch_fminimum_num
, the maximum and minimum computation is performed as if bystd::fmaximum_num
andstd::fminimum_num
, with the object value and the first parameter as the arguments.fetch_max
andfetch_min
, the maximum and minimum computation is performed as if bystd::fmaximum_num
andstd::fminimum_num
, with the object value and the first parameter as the arguments. If the given operand or the value referenced by*ptr
are NaN, which of these values is stored at*ptr
is unspecified. If the given operand and the value referenced by*ptr
are differently signed zeros, which of these values is stored at*ptr
is unspecified.fetch_max
andfetch_min
should treat negative zero as smaller than positive zero.Add the following to [atomics.types.float] immediately below
fetch_sub
:fetch_fmaximum
andfetch_fmininimum
, the maximum and minimum computation is performed as if bystd::fmaximum
andstd::fminimum
, with the object value and the first parameter as the arguments.fetch_fmaximum_num
andfetch_fmininimum_num
, the maximum and minimum computation is performed as if bystd::fmaximum_num
andstd::fminimum_num
, with the object value and the first parameter as the arguments.fetch_max
andfetch_min
, the maximum and minimum computation is performed as if bystd::fmaximum_num
andstd::fminimum_num
, with the object value and the first parameter as the arguments. If the given operand or the object value are NaN, which of these values replaces the object value is unspecified. If the given operand and the object value are differently signed zeros, which of these values replaces the object value is unspecified.fetch_max
andfetch_min
should treat negative zero as smaller than positive zero.Update
__cpp_lib_atomic_min_max
version macro in<version>
synopsis [version.syn] to the C++ version this feature is introduced in:EDITORIAL NOTES:
References