Document: WG14 N1317


New macros for <float.h>


Submitter: Fred Tydeman (USA)
Submission Date: 2008-07-13
Previous version of paper: N1303
Related WG14 documents: N1151, N1171
Subject: New macros for <float.h>

Existing practice: Many implementations have macros (with various spellings) for the minimum subnormal numbers. C99 has DECIMAL_DIG, whose meaning is similar to that of LDBL_MAXDIG10.

The committee rejected the idea of having the xxx_SUBNORMAL_MIN macros be conditionally defined, e.g., only if subnormals are supported.

The committee asked that they be changed to the smallest positive number (either the smallest subnormal or the smallest normal). In making the requested change, the author renamed the macros to something more meaningful.

A problem with having the macros always defined is that there is no way, at translation time, to determine whether xxx_TRUE_MIN is subnormal or normal. This is because #if (xxx_MIN != xxx_TRUE_MIN) is a constraint violation: the controlling expression of #if must be an integer constant expression, so floating constants may not appear in it. The test for subnormal support must therefore happen at run time, which carries a performance penalty.
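The run-time test described above might look like the following minimal sketch. It assumes a <float.h> that already defines the proposed xxx_TRUE_MIN macros; the function names are hypothetical and are not part of this proposal.

```c
#include <float.h>
#include <stdbool.h>

/* Assumption: <float.h> defines the proposed xxx_TRUE_MIN macros. */

/* True when float has positive values smaller than the smallest normal one. */
static bool flt_has_subnormals(void)
{
    return FLT_TRUE_MIN < FLT_MIN;
}

/* Same test for double. */
static bool dbl_has_subnormals(void)
{
    return DBL_TRUE_MIN < DBL_MIN;
}

/* Same test for long double. */
static bool ldbl_has_subnormals(void)
{
    return LDBL_TRUE_MIN < LDBL_MIN;
}
```

These comparisons are ordinary floating-point expressions, so they are valid in an if statement but cannot be moved into an #if directive.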

One solution to this problem is a macro, made up of a sum of powers of 2, where each power of 2 represents a different floating type. Something like:

[new paragraph before paragraph 9]: The set of floating types that support subnormal numbers is characterized by the implementation-defined value of SUBNORMAL_FLOAT_TYPES (which is the sum of the values, each a power of 2, given for each type):

  0x00    no floating types
  0x01    float
  0x02    double
  0x04    long double
  0x10    _Decimal32
  0x20    _Decimal64
  0x40    _Decimal128
Of course, we could create names for all those powers of two (similar to MATH_ERR* for math_errhandling in 7.12 <math.h>).
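A translation-time test against such a bit mask could be sketched as follows. The SUBNORMAL_* bit names and the DOUBLE_GRADUAL_UNDERFLOW and LONG_DOUBLE_GRADUAL_UNDERFLOW macros are hypothetical, and the value given to SUBNORMAL_FLOAT_TYPES is only a stand-in for what an implementation's <float.h> would supply.

```c
/* Hypothetical names for the proposed bit values; not part of any standard. */
#define SUBNORMAL_FLOAT        0x01
#define SUBNORMAL_DOUBLE       0x02
#define SUBNORMAL_LONG_DOUBLE  0x04

/* Stand-in for the implementation-defined value: here float and double
   are assumed to support subnormals and long double is assumed not to. */
#ifndef SUBNORMAL_FLOAT_TYPES
#define SUBNORMAL_FLOAT_TYPES  (SUBNORMAL_FLOAT | SUBNORMAL_DOUBLE)
#endif

/* Integer-only expressions, so they are valid in #if. */
#if SUBNORMAL_FLOAT_TYPES & SUBNORMAL_DOUBLE
#define DOUBLE_GRADUAL_UNDERFLOW 1
#else
#define DOUBLE_GRADUAL_UNDERFLOW 0
#endif

#if SUBNORMAL_FLOAT_TYPES & SUBNORMAL_LONG_DOUBLE
#define LONG_DOUBLE_GRADUAL_UNDERFLOW 1
#else
#define LONG_DOUBLE_GRADUAL_UNDERFLOW 0
#endif
```

Because the bit values are distinct powers of 2, combining them with | gives the same result as the sum described above.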

Another solution is to add: [new bullet after xxx_MIN_EXP]: minimum negative integer such that FLT_RADIX raised to one less than that power is a floating-point number,

FLT_TRUE_MIN_EXP
DBL_TRUE_MIN_EXP
LDBL_TRUE_MIN_EXP
With this solution, do we also need the base 10 exponent macros?
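With exponent macros, the translation-time test reduces to an integer comparison, as in the sketch below. FLT_TRUE_MIN_EXP is the macro proposed above and is not assumed to exist in any current <float.h>, so the sketch supplies a stand-in value that assumes full gradual underflow, as in IEC 60559. FLT_HAS_SUBNORMALS is a hypothetical name.

```c
#include <float.h>

/* Stand-in for the proposed macro, assuming full gradual underflow:
   the smallest subnormal lies FLT_MANT_DIG - 1 radix positions below
   FLT_MIN. For IEC 60559 single format this gives -125 - 24 + 1 = -148,
   and 2 raised to the power (-148 - 1) is the smallest subnormal. */
#ifndef FLT_TRUE_MIN_EXP
#define FLT_TRUE_MIN_EXP (FLT_MIN_EXP - FLT_MANT_DIG + 1)
#endif

/* Both operands are integer constant expressions, so #if accepts them. */
#if FLT_TRUE_MIN_EXP < FLT_MIN_EXP
#define FLT_HAS_SUBNORMALS 1
#else
#define FLT_HAS_SUBNORMALS 0
#endif
```

An implementation without subnormals would define FLT_TRUE_MIN_EXP equal to FLT_MIN_EXP, and the test would select the #else branch.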

Changes to C1x

Add new bullets to 5.2.4.2.2 Characteristics of floating types <float.h>

[bullet near DECIMAL_DIG] The number of base 10 digits required to ensure that floating-point numbers with /p/ radix /b/ digits which differ by only one unit in the last place (ulp) are always differentiated,

 p log10 b             if b is a power of 10
 ceil(1 + p log10 b)   otherwise

[Note to editor: WG14 paper N1290 on printed page 9 has the correct symbols/fonts for the above two math expressions; it is also very similar to the existing math expressions for DECIMAL_DIG in C99.]

FLT_MAXDIG10    6
DBL_MAXDIG10   10
LDBL_MAXDIG10  10
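The purpose of these digit counts can be illustrated with two adjacent float values. The sketch below assumes the IEC 60559 single format (p = 24, b = 2), for which the formula above gives ceil(1 + 24 log10 2) = 9. The helper names same_text and round_trips are hypothetical.

```c
#include <float.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* True when a and b print identically with the given number of
   significant decimal digits. */
static int same_text(float a, float b, int digits)
{
    char sa[64], sb[64];
    snprintf(sa, sizeof sa, "%.*e", digits - 1, (double)a);
    snprintf(sb, sizeof sb, "%.*e", digits - 1, (double)b);
    return strcmp(sa, sb) == 0;
}

/* True when x survives conversion to decimal text with the given number
   of significant digits and back again. */
static int round_trips(float x, int digits)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.*e", digits - 1, (double)x);
    return strtof(buf, NULL) == x;
}
```

Under the stated assumption, 1.0f and 1.0f + FLT_EPSILON differ by one ulp: same_text reports them as identical with 6 digits and as different with 9, and round_trips(1.0f + FLT_EPSILON, 9) succeeds.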

[bullet after FLT_MIN] minimum positive floating-point number. If subnormal numbers are supported [footnote], their value is the minimum subnormal (also known as denormal) floating-point number, otherwise the minimum normal floating-point number, of the respective types.

FLT_TRUE_MIN        1E-37
DBL_TRUE_MIN        1E-37
LDBL_TRUE_MIN       1E-37

[footnote]: Support means that subnormal numbers are not flushed to zero, either when they are used as operands or when an arithmetic operation produces them.

[paragraph 13, example 1] Add

FLT_MAXDIG10   9
after FLT_RADIX

[paragraph 14, example 2] Remove "normalized" from just before IEC 60559.

Add

FLT_MAXDIG10    9
DBL_MAXDIG10   17
after DECIMAL_DIG

Add

FLT_TRUE_MIN        1.40129846E-45F // decimal constant
FLT_TRUE_MIN        0X1P-149F // hex constant
DBL_TRUE_MIN        4.9406564584124654E-324 // decimal constant
DBL_TRUE_MIN        0X1P-1074 // hex constant
after FLT_MIN and DBL_MIN.

Words for Rationale:

[add to 5.2.4.2.2 section] Applications that need to check, at translation time, whether subnormal floating-point numbers are supported can do: {depends upon approach we settle on}

The values of the smallest subnormal floating-point numbers (if supported) are typically, but not always, FLT_MIN*FLT_EPSILON, DBL_MIN*DBL_EPSILON, LDBL_MIN*LDBL_EPSILON.
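Whether this relationship holds on a given implementation can be checked at run time, as in the sketch below. It assumes a <float.h> that defines the proposed xxx_TRUE_MIN macros; the function names are hypothetical.

```c
#include <float.h>

/* True when the smallest positive float equals FLT_MIN * FLT_EPSILON.
   volatile forces the product to be stored as a float, so any extra
   precision used for intermediate results is discarded first. */
static int flt_true_min_is_min_times_eps(void)
{
    volatile float product = FLT_MIN * FLT_EPSILON;
    return product == FLT_TRUE_MIN;
}

/* Same check for double. */
static int dbl_true_min_is_min_times_eps(void)
{
    volatile double product = DBL_MIN * DBL_EPSILON;
    return product == DBL_TRUE_MIN;
}
```

On an IEC 60559 implementation with subnormal support both functions are expected to return 1; where subnormals are flushed to zero the product becomes 0 and the comparison fails.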