N3342=12-0032
Jens Maurer
2012-01-09

Digit Separators coming back

Introduction

This paper proposes syntax extensions to C++ that allow large numeric literals to be written with separators between the digits, making them more readable.

This paper is largely based on N2281 = 07-0141 "Digit Separators" by Lawrence Crowl. The proposed wording changes have been updated for C++11 (more specifically, the latest working draft N3290).

This paper does not propose to add binary literals or hexadecimal floating-point literals; those are considered largely independent of this paper and thus can be addressed separately.

Motivation

For most people, reading large numbers without additional (redundant) visual cues is hard.

Adding visual cues helps, for example spaces between groups of digits. An alternative visual cue is the underscore, which is often employed elsewhere to form identifiers with a space-lookalike character (but without violating identifier syntax).

Discussion

Using a space character could turn a single literal into two or more preprocessing-tokens, with substantial impact not only on the lexing phase, but also on the parsing phase of C++. Therefore, this paper proposes to use the underscore variant.

Using underscores conflicts with user-defined literals. The current wording already provides appropriate disambiguation (see 2.14.8 lex.ext paragraph 1), but the example can be improved for the new situation. In effect, the ud-suffix of a user-defined literal may not start with an underscore followed by a digit. Given that user-defined literals are already severely constrained (see 2.14.8 lex.ext and 17.6.4.3.5 usrlit.suffix), this seems to be a mild inconvenience for the next revision of the standard.
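
The conflict can be illustrated with a small C++11 sketch. The suffix _km and the variable distance_m are hypothetical names chosen for illustration; they are not part of this proposal. The proposed separator syntax itself appears only in a comment, because it is valid only under the proposed wording.

```cpp
// Hypothetical user-defined literal suffix _km (illustration only):
// converts a whole number of kilometres to metres.
constexpr unsigned long long operator""_km(unsigned long long km)
{
    return km * 1000ULL;
}

// 123_km matches only user-defined-literal, so the operator above is called.
constexpr unsigned long long distance_m = 123_km;

// Under the proposed rule, 123_456 would match both user-defined-literal
// (with ud-suffix _456) and integer-literal, and would be treated as the
// integer-literal. A suffix beginning with underscore-digit could therefore
// never be selected. The following line is valid only under the proposal:
// constexpr int big = 123_456;
```

Suffixes that begin with an underscore followed by a letter, such as _km above, would be unaffected.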

Wording Changes

The grammar production pp-number in 2.10 lex.ppnumber already permits underscores inside (via identifier-nondigit and nondigit). No changes are necessary.
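
As a rough illustration of why no change is needed here, the sketch below checks a character sequence against the pp-number production. The names is_digit, is_nondigit and is_pp_number are hypothetical; universal-character-names are left out, and a character set with contiguous letters (such as ASCII) is assumed. It relies on C++14 constexpr rules.

```cpp
// digit: one of 0..9.
constexpr bool is_digit(char c)
{
    return c >= '0' && c <= '9';
}

// nondigit: a letter or an underscore (assumes contiguous letters, as in ASCII).
constexpr bool is_nondigit(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

// Returns true if the whole string s matches pp-number:
//   digit | . digit | pp-number digit | pp-number identifier-nondigit
//   | pp-number e sign | pp-number E sign | pp-number .
constexpr bool is_pp_number(const char* s)
{
    int i = 0;
    if (s[i] == '.') ++i;                 // optional leading period (". digit")
    if (!is_digit(s[i])) return false;    // a digit must come first
    ++i;
    for (; s[i] != '\0'; ++i) {
        const char c = s[i];
        const char prev = s[i - 1];
        // A sign is only permitted directly after e or E.
        const bool sign_after_e =
            (c == '+' || c == '-') && (prev == 'e' || prev == 'E');
        if (!(is_digit(c) || is_nondigit(c) || c == '.' || sign_after_e))
            return false;
    }
    return true;
}
```

With this sketch, is_pp_number("1_000_000") holds, whereas "1 000 000" is rejected, because a space ends the preprocessing-token.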

Change in 2.14.2 lex.icon:

decimal-literal:
       nonzero-digit
       decimal-literal underscore(opt) digit

octal-literal:
       0
       octal-literal underscore(opt) octal-digit

hexadecimal-literal:
       0x hexadecimal-digit
       0X hexadecimal-digit
       hexadecimal-literal underscore(opt) hexadecimal-digit

underscore:
       _

Change in 2.14.2 lex.icon paragraph 1:
An integer literal is a sequence of digits that has no period or exponent part, with optional separating underscores that are ignored when determining its value. ... [ Example: the number twelve can be written 12, 1_2, 014, 01_4, or 0XC. -- end example ]
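
The effect of the changed productions can be sketched as a small evaluator. The names digit_value and separated_literal_value are hypothetical, integer-suffix is not handled, overflow is not checked, and C++14 constexpr rules are assumed. The function accepts at most one underscore before each digit after the first, as the grammar requires, and ignores it when computing the value.

```cpp
// Value of a hexadecimal digit, or -1 if c is not one.
constexpr int digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Returns the value of s if it matches the proposed integer-literal grammar
// (without integer-suffix), or -1 if it does not match.
constexpr long long separated_literal_value(const char* s)
{
    int base = 10;
    int i = 0;
    long long value = 0;
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        // hexadecimal-literal: 0x or 0X directly followed by a hexadecimal-digit
        base = 16;
        const int d = digit_value(s[2]);
        if (d < 0) return -1;
        value = d;
        i = 3;
    } else if (s[0] == '0') {
        // octal-literal: starts with 0
        base = 8;
        i = 1;
    } else {
        // decimal-literal: starts with a nonzero-digit
        const int d = digit_value(s[0]);
        if (d < 1 || d > 9) return -1;
        value = d;
        i = 1;
    }
    while (s[i] != '\0') {
        if (s[i] == '_') ++i;             // optional underscore before a digit
        const int d = digit_value(s[i]);
        if (d < 0 || d >= base) return -1;
        value = value * base + d;
        ++i;
    }
    return value;
}
```

Under these assumptions, the five spellings 12, 1_2, 014, 01_4 and 0XC from the example all evaluate to twelve, while 1__2, 12_ and 0x_C are rejected.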

Change in 2.14.4 lex.fcon:

digit-sequence:
       digit
       digit-sequence underscore(opt) digit

Change in 2.14.4 lex.fcon paragraph 1:
... The integer and fraction parts both consist of a sequence of decimal (base ten) digits, with optional separating underscores that are ignored when determining the value. ...

Change in 2.14.8 lex.ext paragraph 1:
If a token matches both user-defined-literal and another literal kind, it is treated as the latter. [ Example: 123_km is a user-defined-literal, but 123_456 and 12LL are integer-literals. -- end example ] ...