Add @, $, and ` to the basic character set

Document #: D2558R0
Date: 2022-03-11
Project: Programming Language C++
Audience: SG16, SG22, EWG
Reply-to: Steve Downey

1 Abstract

WG14, the C Standardization committee, is adopting [N2701] for C23. This will add U+0024 $ DOLLAR SIGN, U+0040 @ COMMERCIAL AT, and U+0060 ` GRAVE ACCENT to the basic source character set. C++ should adopt the same characters for C++26.

2 Motivation

These characters are available in all encoded character sets in common use, and programmers already assume their availability and use them freely in source text. The primary effect of the change is that the characters become available for syntactic purposes. Although allowing $ in identifiers is a common extension, C did not add any of these characters to the identifier set, and this paper does not propose doing so either. Nor did C add trigraphs for these characters, and this paper does not propose any additional trigraphs or digraphs.

The translation model for C makes the addition much more compelling there, because the C basic source character set is the encoded set for source code before translation. In C++, these characters are already in the translation character set as single-byte characters, which makes the change less important. Nonetheless, it would be useful to make them available for language purposes, since even the more conservative C language has agreed there are no functional impediments to their use.
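As an illustration of the point that these characters are already usable in C++ source text today, the following minimal sketch checks them in character and string literals. It assumes an ASCII-compatible ordinary literal encoding (such as UTF-8), which the standard does not guarantee; the names sample and is_proposed_addition are hypothetical and are not part of the proposal.

```cpp
// Assumption: the ordinary literal encoding is ASCII-compatible (for example
// UTF-8). On such implementations each character is a single code unit whose
// value matches its Unicode code point.
static_assert('$' == 0x24, "U+0024 DOLLAR SIGN");
static_assert('@' == 0x40, "U+0040 COMMERCIAL AT");
static_assert('`' == 0x60, "U+0060 GRAVE ACCENT");

// Three single code units plus the terminating null character.
constexpr char sample[] = "$@`";
static_assert(sizeof(sample) == 4, "one code unit per character");

// Hypothetical helper: true if c is one of the three proposed additions.
constexpr bool is_proposed_addition(char c) {
    return c == '$' || c == '@' || c == '`';
}

static_assert(is_proposed_addition(sample[0]) &&
              is_proposed_addition(sample[1]) &&
              is_proposed_addition(sample[2]),
              "all three characters survive in a string literal");
```

The sketch only exercises literals. It does not use the characters as punctuators or in identifiers, which is outside what this paper proposes.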

Corentin Jabot discusses the usage in other programming languages extensively in [P2342R0], For a Few Punctuators More, q.v.

3 Wording

These changes are relative to [N4901], “Working Draft, Standard for Programming Language C++”.

Modify [lex.charset] paragraph 2 as follows, changing the character count from 96 to 99:

2 The basic character set is a subset of the translation character set, consisting of 99 characters as specified in Table 1.

Modify [tab:lex.charset.basic] by adding the rows for U+0024, U+0040, and U+0060, so that the table reads:

U+0009               CHARACTER TABULATION
U+000B               LINE TABULATION
U+000C               FORM FEED
U+0020               SPACE
U+000A               LINE FEED                    new-line
U+0021               EXCLAMATION MARK             !
U+0022               QUOTATION MARK               "
U+0023               NUMBER SIGN                  #
U+0024               DOLLAR SIGN                  $
U+0025               PERCENT SIGN                 %
U+0026               AMPERSAND                    &
U+0027               APOSTROPHE                   '
U+0028               LEFT PARENTHESIS             (
U+0029               RIGHT PARENTHESIS            )
U+002A               ASTERISK                     *
U+002B               PLUS SIGN                    +
U+002C               COMMA                        ,
U+002D               HYPHEN-MINUS                 -
U+002E               FULL STOP                    .
U+002F               SOLIDUS                      /
U+0030 .. U+0039     DIGIT ZERO .. NINE           0 1 2 3 4 5 6 7 8 9
U+003A               COLON                        :
U+003B               SEMICOLON                    ;
U+003C               LESS-THAN SIGN               <
U+003D               EQUALS SIGN                  =
U+003E               GREATER-THAN SIGN            >
U+003F               QUESTION MARK                ?
U+0040               COMMERCIAL AT                @
U+0041 .. U+005A     LATIN CAPITAL LETTER A .. Z  A B C D E F G H I J K L M
                                                  N O P Q R S T U V W X Y Z
U+005B               LEFT SQUARE BRACKET          [
U+005C               REVERSE SOLIDUS              \
U+005D               RIGHT SQUARE BRACKET         ]
U+005E               CIRCUMFLEX ACCENT            ^
U+005F               LOW LINE                     _
U+0060               GRAVE ACCENT                 `
U+0061 .. U+007A     LATIN SMALL LETTER A .. Z    a b c d e f g h i j k l m
                                                  n o p q r s t u v w x y z
U+007B               LEFT CURLY BRACKET           {
U+007C               VERTICAL LINE                |
U+007D               RIGHT CURLY BRACKET          }
U+007E               TILDE                        ~
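
The character count in the revised wording can be checked mechanically. The sketch below transcribes the rows of the table into string literals and confirms that the existing rows total 96 characters and that the three additions bring the total to 99. The variable names are hypothetical; only the character lists come from the table.

```cpp
#include <cstddef>
#include <string_view>

// Rows of the table, grouped for counting. The group names are illustrative.
constexpr std::string_view whitespace  = "\t\v\f \n";  // U+0009, U+000B, U+000C, U+0020, U+000A
constexpr std::string_view punctuation = "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~";
constexpr std::string_view digits      = "0123456789";
constexpr std::string_view upper       = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
constexpr std::string_view lower       = "abcdefghijklmnopqrstuvwxyz";

// The three rows this paper adds: U+0024, U+0040, U+0060.
constexpr std::string_view additions   = "$@`";

constexpr std::size_t existing_count =
    whitespace.size() + punctuation.size() + digits.size() +
    upper.size() + lower.size();
constexpr std::size_t proposed_count = existing_count + additions.size();

static_assert(existing_count == 96, "count before the change");
static_assert(proposed_count == 99, "count after adding $, @, and `");
```

This requires C++17 for std::string_view and for static_assert without a message in the tests one might add around it.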

4 References

[N2701] Philipp Klaus Krause. @ and $ in source and execution character set.
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2701.htm

[N4901] Thomas Köppe. 2021-10-22. Working Draft, Standard for Programming Language C++.
https://wg21.link/n4901

[P2342R0] Corentin Jabot. 2021-03-25. For a Few Punctuators More.
https://wg21.link/p2342r0