This paper proposes an alternative approach to dealing with these issues: simply undo the transformations performed in phases 1 and 2 inside a raw string. Apparently many compilers already keep track of those transformations internally.
This paper incorporates the proposed wording for issue 872, and addresses UK comment 11 (core issue 789).
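As an illustration of what undoing the phase 2 transformation means in practice, the following minimal sketch uses the raw string syntax as eventually published in C++11, which delimits the body with parentheses rather than square brackets. It assumes a C++11-conforming compiler; the names `raw_text` and `splice_is_reverted` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstring>

// A raw string whose first line ends in a backslash. In an ordinary
// literal, phase 2 would delete the backslash and the new-line that
// follows it (line splicing). Inside a raw string that splice is
// reverted, so both characters remain part of the literal.
// The continuation lines deliberately start in column 0.
static const char *const raw_text = R"(a\
b
c)";

// Hypothetical helper: true when the backslash and both new-lines survived.
bool splice_is_reverted() {
    return std::strcmp(raw_text, "a\\\nb\nc") == 0;
}
```

Under these rules the literal holds six characters: `a`, a backslash, a new-line, `b`, a new-line, and `c`.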
3. The source file is decomposed into preprocessing tokens (2.5) and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is unspecified. The process of dividing a source file's characters into preprocessing tokens is context-dependent. [ Example: see the handling of < within a #include preprocessing directive. -- end example ]
2.5 [lex.pptoken] paragraph 3:
If the input stream has been parsed into preprocessing tokens up to a given character, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail.
2.14.5 [lex.string]:
raw-string:
    " d-char-sequence_opt [ r-char-sequence_opt ] d-char-sequence_opt "
r-char-sequence:
    r-char
    r-char-sequence r-char
r-char:
    any member of the source character set, except (1) a backslash \ followed by a u or U, or (2) a right square bracket ] followed by the initial d-char-sequence (which may be empty) followed by a double quote ".
    universal-character-name
d-char-sequence:
    d-char
    d-char-sequence d-char
d-char:
    any member of the basic source character set except: space, the left square bracket [, the right square bracket ], and the control characters representing horizontal tab, vertical tab, form feed, and newline.
A string literal is a sequence of characters (as defined in 2.14.3) surrounded by double quotes, optionally prefixed by R, u8, u8R, u, uR, U, UR, L, or LR, as in "...", R"[...]", u8"...", u8R"**[...]**", u"...", uR"*~[...]*~", U"...", UR"zzz[...]zzz", L"...", or LR"[...]", respectively.
A string literal that has an R in the prefix is a raw string literal. The d-char-sequence serves as a delimiter. The terminating d-char-sequence of a raw-string is the same sequence of characters as the initial d-char-sequence. A d-char-sequence shall consist of at most 16 characters.
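The constraints on a d-char-sequence stated above can be sketched as a small checker. This is only an illustration: `is_valid_delimiter` is a hypothetical helper, it tests the excluded characters and the 16-character limit as quoted in this wording, and it does not verify that every character belongs to the basic source character set.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: checks a candidate delimiter against the
// d-char rules quoted above (square-bracket form of the wording).
// Assumption: membership in the basic source character set is not checked.
bool is_valid_delimiter(const std::string &delim) {
    if (delim.size() > 16) {
        return false;  // a d-char-sequence has at most 16 characters
    }
    for (char c : delim) {
        switch (c) {
            case ' ':            // space
            case '[': case ']':  // the bracket characters
            case '\t': case '\v': case '\f': case '\n':  // control characters
                return false;
            default:
                break;
        }
    }
    return true;  // the empty delimiter is allowed
}
```

For example, `is_valid_delimiter("*~")` and `is_valid_delimiter("")` both hold, while a delimiter containing a space or a bracket is rejected.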
[ Note: The characters '[' and ']' are permitted in a raw-string. Thus, R"delimiter[[a-z]]delimiter" is equivalent to "[a-z]". -- end note ]
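The role of the delimiter can be shown with a compilable sketch. It assumes the syntax as published in C++11, where parentheses take the place of the square brackets used in this wording; the variable names `with_delim` and `tricky` are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// The outer parentheses belong to the raw string syntax; the inner
// pair is ordinary content of the literal.
static const char *const with_delim = R"delimiter((a-z))delimiter";

// With the delimiter x, the literal ends only at the sequence )x"
// so a closing parenthesis followed by a double quote may appear
// in the body without terminating it.
static const char *const tricky = R"x(say )" here)x";
```

Here `with_delim` compares equal to `"(a-z)"`, and `tricky` compares equal to the ordinary literal `"say )\" here"`.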
[ Note: A source-file new-line in a raw string literal results in a new-line in the resulting execution string-literal, unless preceded by a backslash. Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:
const char *p = R"[a\
b
c]";
assert(std::strcmp(p, "ab\nc") == 0);
-- end note ]
Escape sequences in non-raw string literals and universal-character-names in string literals have the same meaning as in character literals ....
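The distinction between raw and non-raw literals with respect to escape sequences can be sketched as follows, again assuming the parenthesized syntax published in C++11. The names `cooked` and `raw` are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// In a non-raw literal, \t is an escape sequence for one tab character.
static const char *const cooked = "a\tb";

// In a raw literal, the backslash and the t are two ordinary characters.
static const char *const raw = R"(a\tb)";
```

The first literal therefore has length 3 and the second has length 4; the raw form is equivalent to the ordinary literal `"a\\tb"`.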