Name
    n3758, alx-0068r1 - C source files are text files

Principles
    - Keep the language small and simple.
    - Codify existing practice to address evident deficiencies.
    - Follow international standards

Category
    Attributes

Author
    Reported-by: Martin Uecker
    Suggested-by: Alejandro Colomar
    Reported-by: Joseph Myers
    Suggested-by: Marcus Johnson
    Suggested-by: Aaron Peter Bachmann
    Suggested-by: Christopher Bazley
    Cc: Martin Uecker
    Cc: John McCall
    Signed-off-by: Alejandro Colomar

History
    r0 (2025-09-19):
    - Initial draft.

    r1 (2025-12-02; n3758):
    - Remove line.

Description
    One could think that this paper acts in bad faith, by trying to
    revise a very recent decision by strong consensus of WG14.  Oh,
    hilarity!  No, wake me from this bad dream!

    Just kidding.  Let's go straight to this proposal.

    So, WG14 decided, by strong consensus, to remove some weird
    undefined behavior, just for the sake of saying they've removed
    one undefined behavior.  Did WG14 consider the consequences of
    it?  In retrospect, it seems not.  It is left as an exercise for
    the reader to judge whether removing undefined behavior without
    enough analysis is a good or a bad thing, and how the committee
    should make sure this doesn't happen often in the future.

    If only anyone had warned that POSIX already requires text files
    as input to most tools for good reasons.  And that defining the
    behavior could cause second-order bugs, if only through mistakes
    in implementations, which would become unnecessarily more complex
    by having to deal with non-text input.  Too bad that nobody
    raised such concerns, or did they?

    But to rectify is for the wise.  The committee, or part of it,
    seems to have realized the mistake, and that a constraint
    violation would have been a better choice.  Let's forgive those
    who rectify their own mistakes.

    Since this is mostly a theoretical UB, and no hard drives were
    damaged due to it, let's constrain it.
    (Low) quality implementations are free to define their behavior
    after the mandatory diagnostic, and are even free to hide the
    diagnostic under a -Wpedantic flag that turns on the conforming
    diagnostic.  So, this is not even a problem for users that need
    the behavior defined, if they really exist.  They'll go to their
    vendors in the black market of extensions, and make an
    appropriate deal with them.

    Let's fix the standard, and keep it simple.

    While at it, fix the case where a source file is an empty file
    (0 bytes).  That should also be a constraint violation, as it
    does not end in a new-line character, and thus is not a text
    file.

Prior art
    POSIX requires that C source files are text files, which
    essentially means that they are terminated by a newline
    character.  (It also means that they don't contain NUL
    characters, and that no line exceeds LINE_MAX, but let's ignore
    those details for this discussion.)

    This is not a requirement specific to C source files.  POSIX
    requires text files as input to almost every command.  (With the
    obvious exceptions of those commands that are not meant to
    handle text.)

    Let's follow POSIX, and require that C source files are text
    files.  This includes disallowing the case of a source file with
    0 bytes.

Proposed wording
    Based on N3550.

    ## This will need reverting some proposal that was merged in
    ## the Brno meeting.  Of course do so.  Since the latest text we
    ## have is n3550, I've written this proposal considering an
    ## implicit revert of that.

    5.2.1.2  Translation phases
        @@ p1
        -A source file
        -that is not empty  // <-- This is removed without replacement!
        -shall end in a new-line character,
        -which shall not be immediately preceded
        -by a backslash character
        -before any such splicing takes place.

    6.4.1  Lexical elements :: General
        @@ Constraints, p2+1
        +A source file
        +shall end in a new-line character,
        +which shall not be immediately preceded
        +by a backslash character
        +before splicing takes place as described in 5.2.1.2.
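
As an illustration only, the proposed constraint could be checked on the raw bytes of a source file roughly as follows. This is a minimal sketch, not part of the proposed wording and not taken from any implementation: the function name source_ends_properly() is hypothetical, and it assumes the new-line character appears as '\n' in the buffer (that is, it ignores any phase 1 character mapping). It deliberately skips the other POSIX text-file requirements (no NUL bytes, no line longer than LINE_MAX), as the discussion above does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (an assumption for illustration, not an existing
 * API).  Returns true if the bytes of a source file end in a new-line
 * character that is not immediately preceded by a backslash.  An empty
 * file (len == 0) fails, because it does not end in a new-line
 * character.
 */
static bool
source_ends_properly(const char *buf, size_t len)
{
	assert(buf != NULL || len == 0);

	if (len == 0)
		return false;  /* empty file: no terminating new-line */
	if (buf[len - 1] != '\n')
		return false;  /* last line is not terminated */
	if (len >= 2 && buf[len - 2] == '\\')
		return false;  /* final new-line would be spliced away */
	return true;
}
```

Under this sketch, an implementation would issue the mandatory diagnostic whenever the function returns false; what it does afterwards (reject the file, or continue as an extension) is left to the implementation, as described above.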