<a id="top">Defect Report Summary for FPE 18661</a>

Version 1.1

Date: October 2016

Defect	Summary	Date	Status
DR 1	P1: Typos	10/2016	Review
DR 2	P1: Functions that round result to narrower type don't always	10/2016	Review
DR 3	P1: feature macros and header file inclusions	10/2016	Review
DR 4	P3: Error in function name	10/2016	Review
DR 5	P1: Is return of same type convertFormat or copy?	10/2016	Open
DR 6	P1: fetestexceptflag and exceptions passed to fegetexceptflag	10/2016	Open
DR 7	P1: Editorial changes	10/2016	Open
DR 8	P2: Editorial clarification about number digits in the coefficient	10/2016	Open
DR 9	P3: Missing specification for usual arithmetic conversions, tgmath	10/2016	Open
DR 10	P1: wrong type for fesetmode parameter	10/2016	Open
DR 11	P2: a-style formatting not IEC 60559 conformant	10/2016	Open

DR 1


Submitter: Jim Thomas et al.
Submission Date: 2016-03-19
Source: WG14
Reference Document: N2029
Subject: Part 1: Typos

Summary

Page 18: In C 7.6.1a#4, the last sentence, “functon” should be “function”.
Page 48: In C 7.6.2.4a#3, “The fetestexcept function returns ...” should be “The fetestexceptflag function returns ...”.

Suggested Technical Corrigendum

Page 18: In C 7.6.1a, paragraph 4, the last sentence, change “functon” to “function”
Page 48: In C 7.6.2.4a#3, change “fetestexcept” to “fetestexceptflag”.

Apr 2016 meeting

Committee Discussion

The committee agrees that this is a defect and accepts the Suggested Technical Corrigendum

Proposed Technical Corrigendum

Page 18: In C 7.6.1a, paragraph 4, the last sentence, change “functon” to “function”
Page 48: In C 7.6.2.4a#3, change “fetestexcept” to “fetestexceptflag”.

DR 2

Submitter: Jim Thomas et al.
Submission Date: 2016-03-19
Source: WG14
Reference Document: N2029
Subject: Part 1: Functions that round result to narrower type don't always

Summary

Page 38: The C 7.12.13a subclause heading is “Functions that round result to narrower type” and this is the way the functions in the subclause are referred to throughout the TS. In some cases, the functions in the subclause round their result to a type that isn’t really narrower than the parameter types. For example, this is true for the functions daddl, dsubl, etc. if the long double and double types have the same width (as is allowed). (With the extended types introduced in TS 18661-3, the destination type might be wider, as it might for f32xaddf64.)

The current way of referencing these functions reflects the usual situation, and is perhaps a helpful way of thinking about them generally. With a note about the uncharacteristic cases, it seems unlikely to cause significant confusion. Also, changing all the references to these functions would be a large editorial undertaking, spanning multiple parts of the TS. Confusion could easily arise from having an inconsistent set of documents.
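Whether a function such as daddl really rounds to a narrower type depends on how the implementation defines long double. The following is a minimal sketch of a check, assuming that comparing the <float.h> precision and exponent-range macros is an adequate proxy for "wider"; the function name is hypothetical and not part of the TS.

```c
#include <float.h>

/* Hypothetical helper: nonzero when long double has more significand
   digits or a larger exponent range than double. When it returns 0,
   the destination type of daddl, dsubl, etc. is not actually narrower
   than the parameter type. */
int long_double_is_wider_than_double(void)
{
    return LDBL_MANT_DIG > DBL_MANT_DIG || LDBL_MAX_EXP > DBL_MAX_EXP;
}
```

The result is implementation-specific, which is the point of the footnote proposed for this subclause.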

Suggested Technical Corrigendum

Page 38: After the C 7.12.13a subclause heading, insert the following paragraph:

[1] The functions in this subclause round their results to a type typically narrower than the parameter types.

Page 40: After the change to C ending with “7.12.13a.6 Square root rounded to narrower type ... [3] These functions return the square root of x, rounded to the type of the function.”, insert the following:

In 7.12.13a #1, attach a footnote to the wording:
typically narrower
where the footnote is:
*) In some cases the destination type might not be narrower than the parameter types. For example, double might not be narrower than long double.

Apr 2016 meeting

Committee Discussion

The committee agrees that this is a defect and accepts the Suggested Technical Corrigendum

Proposed Technical Corrigendum

Page 38: After the C 7.12.13a subclause heading, insert the following paragraph:

[1] The functions in this subclause round their results to a type typically narrower than the parameter types.

In 7.12.13a #1, attach a footnote to the wording:
typically narrower
where the footnote is:
*) In some cases the destination type might not be narrower than the parameter types. For example, double might not be narrower than long double.

DR 3

Submitter: Jim Thomas et al.
Submission Date: 2016-03-19
Source: WG14
Reference Document: N2029
Subject: Part 1: feature macros and header file inclusions

Summary

ISO/IEC TS 18661-1 subclause 5.3 specifies interfaces that are defined or declared “only if __STDC_WANT_IEC_60559_BFP_EXT__ is defined as a macro at the point in the source file where the header for the interface is first included.” C 7.12#1 says <tgmath.h> includes <math.h> and <complex.h>.

So for


#include <math.h>
#define __STDC_WANT_IEC_60559_BFP_EXT__ 
#include <tgmath.h>
float f(float x) { return nextup(x); }

the nextup functions in <math.h> are not declared, but the nextup macro in <tgmath.h> is defined. Since x has type float, the function determined by the nextup macro in <tgmath.h> is nextupf. But is this function available to be called? As another example, for


#include <limits.h>
#define __STDC_WANT_IEC_60559_BFP_EXT__ 
#include <math.h>
...

the fromfp functions in <math.h> are declared, but the WIDTH macros in <limits.h>, which are needed for portable use of the fromfp functions, are not defined. In these examples, interfaces provided by one header are related to interfaces that are not provided by another header, because of the placement of the WANT macros. This leads to ambiguous cases (as in the first example above) and incomplete feature sets. Later parts of the TS have their own WANT macros, which compounds the problem. See also Joseph Myers’s .

The suggested corrigendum below specifies that the same set of WANT macros must be defined at the points in the code where the relevant headers are first included. This results in fewer combinations of interfaces and provides one set of interfaces that is consistent and complete with respect to a given set of WANT macros.
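One way to satisfy such a requirement is to define the WANT macro once, ahead of the first inclusion of any standard header. The sketch below assumes that arrangement; the function name is hypothetical, and the fallback branch (which assumes int has no padding bits) covers implementations that do not provide the WIDTH macros at all.

```c
/* Define the WANT macro before the first inclusion of every header
   that is sensitive to it, so all headers see the same set. */
#define __STDC_WANT_IEC_60559_BFP_EXT__
#include <limits.h>
#include <math.h>

/* Hypothetical helper: width of int in bits. */
int int_width_in_bits(void)
{
#ifdef INT_WIDTH
    return INT_WIDTH;                      /* provided by <limits.h> */
#else
    /* Assumption: int has no padding bits. */
    return (int)(sizeof(int) * CHAR_BIT);
#endif
}
```

Placing the definition in a build flag or in a single project-wide prefix header would achieve the same consistency.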

Suggested Technical Corrigendum

Page 5: At the end of 5.3, insert:

After 7.1.2#4, insert:
[4a] Some standard headers define or declare identifiers contingent on whether certain macros whose names begin with __STDC_WANT_IEC_60559_ and end with _EXT__ are defined (by the user) at the point in the code where the header is first included. Within a preprocessing translation unit, the same set of such macros shall be defined for the first inclusion of all such headers.

Apr 2016 meeting

Committee Discussion

The committee agrees that this is a defect and accepts the Suggested Technical Corrigendum

Proposed Technical Corrigendum

Page 5: At the end of 5.3, insert:

After 7.1.2#4, insert:
[4a] Some standard headers define or declare identifiers contingent on whether certain macros whose names begin with __STDC_WANT_IEC_60559_ and end with _EXT__ are defined (by the user) at the point in the code where the header is first included. Within a preprocessing translation unit, the same set of such macros shall be defined for the first inclusion of all such headers.

DR 4

Submitter: Jim Thomas et al.
Submission Date: 2016-03-19
Source: WG14
Reference Document: N2029
Subject: Part 3: Error in function name

Summary

Page 32: In 12.3, the function name is written as “scoshdNx”, instead of “coshdNx” as intended. Although correcting the mistake could be seen as a substantive change, it is clear from the context that this function is in the family of cosh functions. It is extremely unlikely that any implementer would not have recognized the mistake and provided the function with the erroneous name.

Suggested Technical Corrigendum

Page 32: In 12.3, change “scoshdNx” to “coshdNx”.

Apr 2016 meeting

Committee Discussion

The committee agrees that this is a defect and accepts the Suggested Technical Corrigendum

Proposed Technical Corrigendum

Page 32: In 12.3, change “scoshdNx” to “coshdNx”.

DR 5

Submitter: Jim Thomas
Submission Date: 2016-09-10
Source: WG14
Reference Document: N2077
Subject: Part 1: Is return of same type convertFormat or copy?

Summary

This is about the issue raised by Joseph Myers in email SC22WG14.14280:

TS 18661-1 says "Whether C assignment (6.5.16) (and conversion as if by assignment) to the same format is an IEC 60559 convertFormat or copy operation is implementation-defined, even if <fenv.h> defines the macro FE_SNANS_ALWAYS_SIGNAL (F.2.1).".

Does this apply to function return, where the return type of the function is the same as the type of the expression passed to the return statement and no wider evaluation format is in use - that is, may this act as either convertFormat or copy? C11 F.6 clearly envisages that such a return statement may do a conversion to the same type in the case of wider evaluation formats. But 6.8.6.4#3 only refers to conversions "If the expression has a type different from the return type of the function in which it appears".

The specification, from F.3#3, quoted above is incomplete in that it doesn’t cover function returns, which are not assignments or conversions as if by assignment. As currently written, C11 + TS18661-1 might be read to exclude the possibility of using convertFormat in this case. A statement should be added to say that the implementation has the option to apply convertFormat to the return value. The change does not break existing implementations.

The effect of convertFormat would be that signaling NaNs would signal and noncanonical representations would be canonicalized. It is extremely unlikely that a program would depend on convertFormat not being used.

Suggested Technical Corrigendum

In Clause 8, to the text for C F.3#3:

[3] Whether C assignment (6.5.16) (and conversion as if by assignment) to the same format is an IEC 60559 convertFormat or copy operation is implementation-defined, even if <fenv.h> defines the macro FE_SNANS_ALWAYS_SIGNAL (F.2.1).

append the sentence:

If the return expression of a return statement is evaluated to the floating-point format of the return type, it is implementation-defined whether a convertFormat operation is applied to the result of the return expression.

At the end of Clause 8, add:

In F.3#3, attach a footnote to the wording:

Whether C assignment (6.5.16) (and conversion as if by assignment) to the same format is an IEC 60559 convertFormat or copy operation

where the footnote is:

*) Where the source and destination formats are the same, convertFormat operations differ from copy operations in that convertFormat operations raise the “invalid” floating-point exception on signaling NaN inputs and do not propagate non-canonical encodings.

Oct 2016 meeting

Committee Discussion

The committee agrees that this is a defect and accepts the Suggested Technical Corrigendum

Proposed Technical Corrigendum

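In Clause 8, to the text for C F.3#3:

[3] Whether C assignment (6.5.16) (and conversion as if by assignment) to the same format is an IEC 60559 convertFormat or copy operation is implementation-defined, even if <fenv.h> defines the macro FE_SNANS_ALWAYS_SIGNAL (F.2.1).

append the sentence:

If the return expression of a return statement is evaluated to the floating-point format of the return type, it is implementation-defined whether a convertFormat operation is applied to the result of the return expression.

At the end of Clause 8, add:

In F.3#3, attach a footnote to the wording:

Whether C assignment (6.5.16) (and conversion as if by assignment) to the same format is an IEC 60559 convertFormat or copy operation

where the footnote is:

*) Where the source and destination formats are the same, convertFormat operations differ from copy operations in that convertFormat operations raise the “invalid” floating-point exception on signaling NaN inputs and do not propagate non-canonical encodings.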

DR 6

Submitter: Jim Thomas
Submission Date: 2016-09-10
Source: WG14
Reference Document: N2077
Subject: Part 1: fetestexceptflag and exceptions passed to fegetexceptflag

Summary

This is about the issue raised by Joseph Myers in email SC22WG14.14328:

TS 18661-1 says, for fetestexceptflag, "The value of *flagp shall have been set by a previous call to fegetexceptflag.".

This contrasts with the C11 wording for fesetexceptflag, "The value of *flagp shall have been set by a previous call to fegetexceptflag whose second argument represented at least those floating-point exceptions represented by the argument excepts.". So what happens if more exceptions are specified in the call to fetestexceptflag than were specified in the call to fegetexceptflag? Then fegetexceptflag may or may not have stored any meaningful representation of the state of the extra exceptions being tested.

I think fetestexceptflag should have the same wording for this issue as fesetexceptflag: "whose second argument represented at least those floating-point exceptions represented by the argument excepts".

fesetexceptflag sets global state, typically a hardware register, whereas fetestexceptflag just reads a variable. It seems more important to avoid spurious data in the former.

Still, there’s no utility in testing spurious flag settings, and placing the same restrictions on fetestexceptflag as on fesetexceptflag might be less error prone.
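The restriction discussed above amounts to a subset test on exception masks: every exception named in the later call must also have been named in the earlier fegetexceptflag call. A minimal sketch of that test follows; the helper name is hypothetical and is not part of <fenv.h>.

```c
#include <fenv.h>

/* Hypothetical helper: nonzero when every exception bit in `excepts`
   is also set in `saved`, the mask that was passed as the second
   argument of the earlier fegetexceptflag call. A caller could check
   this before passing `excepts` to fesetexceptflag or
   fetestexceptflag. */
int excepts_covered_by(int saved, int excepts)
{
    return (excepts & ~saved) == 0;
}
```

For instance, a flag object saved with FE_ALL_EXCEPT covers any later query, whereas one saved with a single exception macro covers only that exception.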

Suggested Technical Corrigendum

In 15.2, in the new text for C 7.6.2.4a#2, change:

The value of *flagp shall have been set by a previous call to fegetexceptflag.

to:

The value of *flagp shall have been set by a previous call to fegetexceptflag whose second argument represented at least those floating-point exceptions represented by the argument excepts.

Oct 2016 meeting

Committee Discussion

The committee agrees that this is a defect and accepts the Suggested Technical Corrigendum

Proposed Technical Corrigendum

In 15.2, in the new text for C 7.6.2.4a#2, change:

The value of *flagp shall have been set by a previous call to fegetexceptflag.

to:

The value of *flagp shall have been set by a previous call to fegetexceptflag whose second argument represented at least those floating-point exceptions represented by the argument excepts.

DR 7

Submitter: Jim Thomas
Submission Date: 2016-09-10
Source: WG14
Reference Document: N2077
Subject: Part 1: Editorial changes

Summary

In CFP email, Fred Tydeman noted:

Searching for "infinite precision" in part 1, most of them have "(as if) to" before it. Except, ffma, ffmal, dfmal which is missing the "(as if)".

Right. In particular, all the functions that round result to narrower type have “(as if)”, except for the fma family.

Suggested Technical Corrigendum

In 14.5, in the new text for C 7.12.13a.5#2, insert “(as if)” before “to infinite precision”.

Oct 2016 meeting

Committee Discussion

The committee agrees that this is a defect and accepts the Suggested Technical Corrigendum

Proposed Technical Corrigendum

In 14.5, in the new text for C 7.12.13a.5#2, insert “(as if)” before “to infinite precision”.

DR 8

Submitter: Jim Thomas
Submission Date: 2016-09-10
Source: WG14
Reference Document: N2077
Subject: Part 2: Editorial clarification about number digits in the coefficient

Summary

In 12.5, n is defined to be “the number of digits in the coefficient c”, where the decimal floating-point argument is represented by the triple (s, c, q). The intention is that n is the number of digits in the coefficient of the particular argument, i.e., the number of significant digits, not the maximum number of digits in the coefficient for the type. This might be misread, particularly since 5.2.4.2.2a says

number of digits in the coefficient

DEC32_MANT_DIG 7

DEC64_MANT_DIG 16

DEC128_MANT_DIG 34

This part of 5.2.4.2.2a is in the context of characterizing the type, so clearly refers to the type and not any particular representation.
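To make the distinction concrete: the value 1.5 represented as the triple (0, 15, -1) has n = 2, even though DEC32_MANT_DIG is 7. The following sketch counts the digits of a given coefficient. It is illustrative only: the function name is hypothetical, the coefficient is modeled as an unsigned long long (enough for _Decimal32 and _Decimal64 coefficients, not _Decimal128), and treating a zero coefficient as having one digit is an assumption.

```c
/* Hypothetical helper: number of decimal digits in the coefficient c
   of a particular representation (s, c, q).
   Assumption: a zero coefficient counts as one digit. */
int coefficient_digits(unsigned long long c)
{
    int n = 1;
    while (c >= 10) {
        c /= 10;
        ++n;
    }
    return n;
}
```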

Suggested Technical Corrigendum

In 12.5, change:

where n is the number of digits in the coefficient c

to:

where n is the number of significant digits in the coefficient c

Oct 2016 meeting

Committee Discussion

The committee agrees that this is a defect and accepts the Suggested Technical Corrigendum

Proposed Technical Corrigendum

In 12.5, change:

where n is the number of digits in the coefficient c

to:

where n is the number of significant digits in the coefficient c

DR 9

Submitter: Jim Thomas
Submission Date: 2016-09-10
Source: WG14
Reference Document: N2077
Subject: Part 3: Missing specification for usual arithmetic conversions, tgmath

Summary

This is about the issue raised by Joseph Myers in email SC22WG14.14282:

C11 specifies that the usual arithmetic conversions on the pair of types (long double, double) produces a result of type long double.

Suppose long double and double have the same set of values. TS 18661-3 rewrites the rules for usual arithmetic conversions so that the case "if both operands are floating types and the sets of values of their corresponding real types are equivalent" prefers interchange types to standard types to extended types. But this leaves the case of (long double, double) unspecified as to which type is chosen, unlike in C11, as those are both standard types.

I think this is a defect in TS 18661-3, and it should say that if both are standard types with the same set of values then long double is preferred to double which is preferred to float, as in C11.

A similar issue could arise if two of the extended types have equivalent sets of values. I'm not aware of anything to prohibit that, although it seems less likely in practice. I think the natural fix would be to say that _Float128x is preferred to _Float64x which is preferred to _Float32x.

I think such an issue would also arise for <tgmath.h> (if _Float64x and _Float128x have the same set of values, the choice doesn't seem to be specified). It also seems possible for the <tgmath.h> rules for purely floating-point arguments to produce a different result from the usual arithmetic conversions (consider the case where _Float32x is wider than long double, and <tgmath.h> chooses long double), and since rules that are the same in most cases but subtly different in obscure cases tend to be confusing, I wonder if it might be better to specify much simpler rules for <tgmath.h>: take the type resulting from the usual arithmetic conversions[*], where integer arguments are replaced by _Decimal64 if there are any decimal arguments and double otherwise. (That's different from the present rules for e.g. (_Float32x, int), but it's a lot simpler, and seems unlikely in practice to choose a type with a different set of values from the present choice.)

[*] Meaningful for more than two arguments as long as the usual arithmetic conversions are commutative and associative as an operation on pairs of types.

Though substantive, the suggested change to the usual arithmetic conversions is consistent with the intention in TS 18661-3 to specify all the cases (except where neither format is a subset of the other and the formats are not the same). The missing cases were an oversight. The suggested preferences of long double over double over float and _Float128x over _Float64x over _Float32x are the obvious choices.
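The C11 behavior that the suggested change is meant to preserve can be observed with a C11 generic selection, which reports the type produced by the usual arithmetic conversions without evaluating the expression. This is a small sketch using only the standard floating types; the macro and function names are hypothetical.

```c
/* Classify the type of an expression:
   0 = float, 1 = double, 2 = long double, -1 = anything else. */
#define FP_KIND(e) \
    _Generic((e), float: 0, double: 1, long double: 2, default: -1)

/* Type of (long double) + (double) under the usual arithmetic conversions. */
int kind_of_long_double_plus_double(void)
{
    return FP_KIND(1.0L + 1.0);
}

/* Type of (double) + (float) under the usual arithmetic conversions. */
int kind_of_double_plus_float(void)
{
    return FP_KIND(1.0 + 1.0f);
}
```

In C11 the first function returns 2 and the second returns 1, regardless of whether long double and double share the same set of values.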

Joseph Myers notes that the <tgmath.h> specification is incomplete in the same way as the usual arithmetic conversions. He argues for simplifying the specification by referring to the usual arithmetic conversions specification, rather than mostly repeating it, as the current specification does. The suggested Technical Corrigendum below follows this new approach. Though a substantive change to TS 18661-3, the effects on implementations and users are expected to be minimal – worth the simplification.

The suggested Technical Corrigendum below also restores footnote number 62, which is lost in the current TS 18661-3.

Suggested Technical Corrigendum

In clause 8, change the replacement text for 6.3.1.8#1:

If one operand has decimal floating type, the other operand shall not have standard floating type, binary floating type, complex type, or imaginary type.

If both operands have floating types and neither of the sets of values of their corresponding real types is a subset of (or equivalent to) the other, the behavior is undefined.

Otherwise, if both operands are floating types and the sets of values of their corresponding real types are equivalent, then the following rules are applied:

If both operands have the same corresponding real type, no further conversion is needed.

Otherwise, if the corresponding real type of either operand is an interchange floating type, the other operand is converted, without change of type domain, to a type whose corresponding real type is that same interchange floating type.

Otherwise, if the corresponding real type of either operand is a standard floating type, the other operand is converted, without change of type domain, to a type whose corresponding real type is that same standard floating type.

Otherwise, if both operands have floating types, the operand, whose set of values of its corresponding real type is a (proper) subset of the set of values of the corresponding real type of the other operand, is converted, without change of type domain, to a type with the corresponding real type of that other operand.

Otherwise, if one operand has a floating type, the other operand is converted to the corresponding real type of the operand of floating type.

Otherwise, the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands:

. . .

to:

If one operand has decimal floating type, the other operand shall not have standard floating type, binary floating type, complex type, or imaginary type.

If both operands have floating types and neither of the sets of values of their corresponding real types is a subset of (or equivalent to) the other, the behavior is undefined.

If both operands have the same corresponding real type, no further conversion is needed.

Otherwise, if both operands are floating types and the sets of values of their corresponding real types are equivalent, then the following rules are applied:

If the corresponding real type of either operand is an interchange floating type, the other operand is converted, without change of type domain, to a type whose corresponding real type is that same interchange floating type.

Otherwise, if the corresponding real type of either operand is long double, the other operand is converted, without change of type domain, to a type whose corresponding real type is long double.

Otherwise, if the corresponding real type of either operand is double, the other operand is converted, without change of type domain, to a type whose corresponding real type is double.

(All cases where float might have the same format as another type are covered above.)

Otherwise, if the corresponding real type of either operand is _Float128x or _Decimal128x, the other operand is converted, without change of type domain, to a type whose corresponding real type is _Float128x or _Decimal128x, respectively.

Otherwise, if the corresponding real type of either operand is _Float64x or _Decimal64x, the other operand is converted, without change of type domain, to a type whose corresponding real type is _Float64x or _Decimal64x, respectively.

Otherwise, if both operands have floating types, the operand, whose set of values of its corresponding real type is a (proper) subset of the set of values of the corresponding real type of the other operand, is converted, without change of type domain^62), to a type with the corresponding real type of that other operand.

Otherwise, if one operand has a floating type, the other operand is converted to the corresponding real type of the operand of floating type.

Otherwise, the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands:

. . .

In clause 15, replace:

In 7.25#3c, replace the bullets:

… bullets …

with:

— If two arguments have floating types and neither of the sets of values of their corresponding real types is a subset of (or equivalent to) the other, the behavior is undefined.

— If any arguments for generic parameters have type _DecimalM where M ≥ 64 or _DecimalNx where N ≥ 32, the type determined is the widest of the types of these arguments. If _DecimalM and _DecimalNx are both widest types (with equivalent sets of values) of these arguments, the type determined is _DecimalM.

— Otherwise, if any argument for generic parameters is of integer type and another argument for generic parameters has type _Decimal32, the type determined is _Decimal64.

— Otherwise, if any argument for generic parameters has type _Decimal32, the type determined is _Decimal32.

— Otherwise, if the corresponding real type of any argument for generic parameters has type long double, _FloatM where M ≥ 128, or _FloatNx where N ≥ 64, the type determined is the widest of the corresponding real types of these arguments. If _FloatM and either long double or _FloatNx are both widest corresponding real types (with equivalent sets of values) of these arguments, the type determined is _FloatM. Otherwise, if long double and _FloatNx are both widest corresponding real types (with equivalent sets of values) of these arguments, the type determined is long double.

— Otherwise, if the corresponding real type of any argument for generic parameters has type double, _Float64, or _Float32x, the type determined is the widest of the corresponding real types of these arguments. If _Float64 and either double or _Float32x are both widest corresponding real types (with equivalent sets of values) of these arguments, the type determined is _Float64. Otherwise, if double and _Float32x are both widest corresponding real types (with equivalent sets of values) of these arguments, the type determined is double.

— Otherwise, if any argument for generic parameters is of integer type, the type determined is double.

— Otherwise, if the corresponding real type of any argument for generic parameters has type _Float32, the type determined is _Float32.

— Otherwise, the type determined is float.

In the second bullet 7.25#3c, attach a footnote to the wording:

the type determined is the widest

where the footnote is:

*) The term widest here refers to a type whose set of values is a superset of (or equivalent to) the sets of values of the other types.

with:

In 7.25#3c, replace the first sentence and bullets:

[3c] Except for the macros for functions that round result to a narrower type (7.12.13a), use of a type-generic macro invokes a function whose generic parameters have the corresponding real type determined by the corresponding real types of the arguments as follows:

— First, if any argument for generic parameters has type _Decimal128, the type determined is _Decimal128.

— Otherwise, if any argument for generic parameters has type _Decimal64, or if any argument for generic parameters is of integer type and another argument for generic parameters has type _Decimal32, the type determined is _Decimal64.

— Otherwise, if any argument for generic parameters has type _Decimal32, the type determined is _Decimal32.

— Otherwise, if the corresponding real type of any argument for generic parameters is long double, the type determined is long double.

— Otherwise, if the corresponding real type of any argument for generic parameters is double or is of integer type, the type determined is double.

— Otherwise, if any argument for generic parameters is of integer type, the type determined is double.

— Otherwise, the type determined is float.

with:

[3c] Except for the macros for functions that round result to a narrower type (7.12.13a), use of a type-generic macro invokes a function whose generic parameters have the corresponding real type determined by the types of the arguments for the generic parameters as follows:

— Arguments of integer type are regarded as having type _Decimal64 if any argument has decimal floating type, and as having type double otherwise.

— If the function has exactly one generic parameter, the type determined is the corresponding real type of the argument for the generic parameter.

— If the function has exactly two generic parameters, the type determined is the corresponding real type determined by the usual arithmetic conversions (6.3.1.8) applied to the arguments for the generic parameters.

— If the function has more than two generic parameters, the type determined is the corresponding real type determined by repeatedly applying the usual arithmetic conversions, first to the first two arguments for generic parameters, then to that result type and the next argument for a generic parameter, and so forth until the usual arithmetic conversions have been applied to the last argument for a generic parameter.
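The replacement rules above can be read as a left-to-right fold of the usual arithmetic conversions over the arguments for generic parameters. The sketch below models that fold in a deliberately simplified setting: only float, double, long double, and integer arguments are represented (no decimal, interchange, or extended types), and all names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical model of argument types. The floating kinds are ordered
   so that a larger value is the one C11 prefers. */
enum arg_kind { ARG_INTEGER, ARG_FLOAT, ARG_DOUBLE, ARG_LONG_DOUBLE };

/* Integer arguments are regarded as double (no decimal types here). */
static enum arg_kind regard(enum arg_kind k)
{
    return k == ARG_INTEGER ? ARG_DOUBLE : k;
}

/* Usual arithmetic conversions restricted to the standard floating types. */
static enum arg_kind usual_conversion(enum arg_kind a, enum arg_kind b)
{
    return a > b ? a : b;
}

/* Type determined for n >= 1 arguments for generic parameters:
   fold the pairwise conversion from the first argument to the last. */
enum arg_kind determined_type(const enum arg_kind *args, size_t n)
{
    enum arg_kind result = regard(args[0]);
    for (size_t i = 1; i < n; ++i)
        result = usual_conversion(result, regard(args[i]));
    return result;
}
```

Under this model, a three-parameter call with arguments of kinds (float, integer, float) determines double, because the integer argument is first regarded as double.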