Chasing Ghosts I: constant expressions

Jens Gustedt, INRIA and ICube, France

2025-05-27

target

integration into IS ISO/IEC 9899:202y

document history

document number   date     comment
n3447             202501   Original proposal, partially accepted in Graz
n3558             202505   Elaborate the different changes;
                           add one new UB that flew under the radar

license

CC BY, see https://creativecommons.org/licenses/by/4.0

1 Motivation

The recent campaign for slaying daemons has revealed that some of the undefined behavior (UB) in the current C standard doesn’t even exist: some of the situations in J.2 that would in principle result in UB cannot trigger at all. The reason is misformulations in the normative text that seem to indicate UB where in fact there are only constraint violations or unspecified behavior.

We say that a semantic non-constraint requirement is a ghost-UB if no conforming program in any execution may ever violate it.

The present paper deals with ghost-UB that is attributed to constant expressions, namely J.2 (50) to (53) in the counting of n3467 before Graz. The principal observation here is that the term “constant expression” is a syntax term and not a semantic term. So an expression either is a constant expression or it isn’t, and the difference between the two is not behavior (semantics) but syntax alone.

In fact, all uses in the standard of the term are either covered by constraint violations (such as for array designators in initializers) or by the fact that whether an expression is a constant expression (or not) distinguishes different categories (such as VLA and arrays of known constant size). In all of these cases, it makes no sense to speak of UB.

In fact, there are only two types of UB that remain in this clause. The first concerns a possible discrepancy between the evaluation of floating-point expressions in the translation and execution environments; this UB is now handled as J.2 (45) in n3550. The second is one that so far flew below the radar, namely the use of the member-access operator on union constants for members other than the one that is initialized.

Otherwise, the standard does not seem to intend to leave constant expressions as a UB extension point: 6.6.1 p14 explicitly states that the use of extensions to the concept of constant expressions is implementation-defined.

2 Approach

Once convinced that we have ghost-UB, the easiest way to deal with the situation is simply to remove the useless listings (50) to (53) in J.2 of n3467. This has already been voted on in Graz, and in n3550 the items are already removed.

We think that this alone would not be very user-friendly and that users would still trip over the many “shall” that are confusingly applied in the text.

In fact, most of the text in 6.6.1 is even placed in the wrong category. The definitions made there are purely descriptive and convey no semantics beyond that. Thus we propose to reorder the text so that most of it appears under “Description”, and to leave under “Constraints” and “Semantics” only those entries that must remain there. These deal with cases where the evaluated value does not fit:

Interestingly, for the last two the implied UB didn’t even make it into the J.2 list of n3467, so we add it there as well.

No normative change to the concept of “constant expression” is intended by this paper.

There is a subtle change, though, for extended constant expressions. The current text formulates a constraint for operators that may be used within a constant expression and thereby either

This paper here moves this possible use to syntax alone. Thereby it enables implementations in particular to extend constant expressions to

3 Suggested Wording

New text is underlined green, removed text is struck-out red. Reorganization of the paragraphs is indicated in the running text; Xª refers to new paragraph numbers, pX to current numbers.

Existing footnotes are unchanged and their numbers refer to n3550.

3.1 Clause 6.6, Constant expressions

We propose to reorder this clause completely, to replace most “shall” with plain factual description, and to add explanations where this seems appropriate. This means that the following is a complete replacement of the corresponding section. Diff-marks are applied at the paragraph level; that is, paragraphs that are moved without other changes appear without diff-marks.

6.6.1 General

Syntax

constant-expression: conditional-expression

Description

A constant expression can be evaluated during translation rather than runtime, and accordingly can be used in any place that a constant can be. The fact that a given conditional expression forms a constant expression is detected in translation phases 4XXX) and 7. In most of the cases, the value of the constant expression is determined in translation phases 1-7 (i.e. before linking). Values of address constants or arithmetic constant expressions with floating type and values that are derived from these are possibly only determined during linking (translation phase 8) or just before program startup.

Add the footnote

XXX) These are the integer constant expressions that are used for conditional inclusion (6.10.2) and binary resource inclusion (6.10.4).

Reuse a descriptive phrase from p5, complement it with more information and form a new paragraph.

An expression that evaluates to a constant is required in several contexts. The most general form appears in initializers for objects whose value is determined at translation time or program startup such as objects with static storage duration or with the constexpr specifier. Additionally, for some uses the property of an expression E being an integer constant expression or not changes the semantics of the program. In particular, it determines if an array declaration T A[E] forms a VLA or not and thus if a seemingly benign expression sizeof(A) is determined at translation time or evaluated each time when it is met during the program execution.
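As a non-normative illustration of the last point, the following sketch contrasts an array whose length is an integer constant expression with one whose length is not. The names fixed_len, fixed_size and variable_size are hypothetical and only serve the example; it assumes an implementation that supports VLAs.

```c
#include <assert.h>
#include <stddef.h>

enum { fixed_len = 4 };   /* enumeration constant, an integer constant expression */

/* The size of an array of known constant size is itself an integer
   constant expression, so it can be checked during translation. */
static_assert(sizeof(int[fixed_len]) == fixed_len * sizeof(int),
              "determined at translation time");

/* The length is an integer constant expression: a is an ordinary array. */
size_t fixed_size(void) {
    int a[fixed_len];
    return sizeof(a);
}

/* The length n is not a constant expression: a is a VLA, and sizeof(a)
   is evaluated each time execution reaches it. */
size_t variable_size(size_t n) {
    int a[n];
    return sizeof(a);
}
```

Both functions look alike in the source; only the syntactic property of the length expression decides which category the array falls into.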

Reuse p3 from the constraints section.

Constant expressions shall do not contain assignment, increment, decrement, function-call, or comma operators, except when they are contained within a subexpression that is not evaluated.116) Additionally, such a constant expression is, or evaluates to, a null pointer constant (6.3.3.3) or one of the categories that are described in this clause:

Use the list from p7

Continue with p6

A compound literal with storage-class specifier constexpr is a compound literal constant, as is a postfix expression that applies the . member access operator to a compound literal constant of structure or union type, even recursively. A compound literal constant is a constant expression with the type and value of the unnamed object.

An identifier that is:

is a named constant, as is a postfix expression that applies the . member access operator to a named constant of structure or union type, even recursively. For enumeration and predefined constants, their value and type are defined in the respective clauses; for constexpr objects, such a named constant is a constant expression with the type and value of the declared object.

An integer constant expression118) shall have has integer type and shall only have has operands that are integer literals, named and compound literal constants of integer type, character literals, sizeof or _Countof expressions which are integer constant expressions, alignof expressions, and floating, named, or compound literal constants of arithmetic type that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the typeof operators, sizeof operator, _Countof operator, or alignof operator.
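To make the description above more concrete, here is a small non-normative sketch of integer constant expressions and of some contexts that require them. All identifiers (kib, buf_len, rounded, flags, buffer, classify) are invented for the illustration.

```c
#include <assert.h>

enum {
    kib     = 1024,
    buf_len = 4 * kib,     /* operands: an integer literal and a named constant */
    rounded = (int)3.9     /* floating literal as the immediate operand of a cast */
};

/* Contexts that require an integer constant expression. */
static_assert(buf_len == 4096, "value known during translation");
static_assert(rounded == 3, "the cast truncates toward zero");

struct flags { unsigned ready : rounded; };   /* bit-field width */
char buffer[buf_len];                         /* array of known constant size */

int classify(int x) {
    switch (x) {
    case kib:     return 1;                   /* case labels */
    case buf_len: return 2;
    default:      return 0;
    }
}
```

An expression such as (int)(3.9 + 0.1) would not qualify under this description, because the floating literals are then no longer the immediate operands of the cast.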

Paragraph p9 has already been used above.

p9 More latitude is permitted for constant expressions in initializers. Such a constant expression shall be, or evaluate to, one of the following:

Continue with p10

An arithmetic constant expression shall have has arithmetic type and shall only have has operands that are floating literals, named or compound literal constants of arithmetic type and integer constant expressions. Cast operators in an arithmetic constant expression shall only convert arithmetic types to arithmetic types, except as part of an operand to the typeof operators, sizeof operator, _Countof operator, or alignof operator.
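A brief non-normative sketch of arithmetic constant expressions as initializers of objects with static storage duration follows. The names samples, step and bias are hypothetical; the values are chosen so that they are exactly representable in binary floating point.

```c
enum { samples = 8 };

/* Operands are a floating literal and an integer constant expression. */
const double step = 1.0 / samples;

/* The cast converts an arithmetic type to an arithmetic type. */
const double bias = (double)samples + 0.25;

/* By contrast, an initializer that contains a function call, for example
   a call to sqrt from <math.h>, is not a constant expression and could
   not be used for these file-scope objects. */
```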

An address constant is a null pointer,119) a pointer to an lvalue designating an object of static storage duration, or a pointer to a function designator; it shall be is created explicitly using the unary & operator or an integer constant cast to pointer type, or implicitly using an expression of array or function type.

10ª The array-subscript [] and member-access -> operator, the address & and indirection * unary operators, and pointer casts can be used in the creation of an address constant, but the if no value of an object shall not be is accessed by use of these operators.120)
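The following non-normative sketch shows address constants that are formed with these operators without accessing the value of any object. The identifiers (point, table, origin, twice, third, py, first, op, check_address_constants) are made up for the illustration.

```c
#include <assert.h>

struct point { int x, y; };

static int table[4] = { 10, 20, 30, 40 };
static struct point origin = { 1, 2 };
static int twice(int v) { return 2 * v; }

/* Created explicitly with the unary & operator: [] and -> only
   designate an object here, no value is read. */
static int *const third = &table[2];
static int *const py    = &(&origin)->y;

/* Created implicitly from expressions of array and function type. */
static int *const first = table;
static int (*const op)(int) = twice;

/* Not an address constant: the subscript would need the value of table[0].
   static int *const bad = &table[table[0]];                               */

void check_address_constants(void) {
    assert(*third == 30);
    assert(*py == 2);
    assert(*first == 10);
    assert(op(21) == 42);
}
```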

11ª A structure or union constant is a named constant or compound literal constant with structure or union type, respectively.

12ª An implementation may accept other forms of constant expressions, called extended constant expressions. It is implementation-defined whether extended constant expressions are usable in the same manner as the constant expressions defined in this document, including whether or not extended integer constant expressions are considered to be integer constant expressions.121)

The next paragraph is merely a repetition and probably confuses more than it helps:

p15 Starting from a structure or union constant, the member-access . operator can be used to form a named constant or compound literal constant as described previously in this subclause.

Move p4 here and start the constraints section

Constraints

13ª Each constant expression shall evaluate to a constant that is in the range of representable values for its type.
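As a non-normative sketch of this constraint, the example below shows constant expressions whose values are representable in their type, and one (disabled) that is not. The names top and over are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* INT_MAX is representable in int, so this is fine. */
const int top = INT_MAX;

/* Unsigned arithmetic wraps, so the result is always representable. */
static_assert(UINT_MAX + 1u == 0u, "unsigned arithmetic wraps");

#if 0
/* Disabled on purpose: the value of INT_MAX + 1 is not representable in
   int, so the constant expression violates the constraint and a
   diagnostic is required. */
const int over = INT_MAX + 1;
#endif
```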

Move p5 here and start the semantics section

Semantics

14ª An expression that evaluates to a constant is required in several contexts. If a floating expression is evaluated in the translation environment, the arithmetic range and precision shall be at least as great as if the expression were being evaluated in the execution environment.117)

Continue with p16

15ª If the member-access operator . accesses a member of a union constant, the accessed member shall be the same as the member that is initialized by the union constant’s initializer.
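The situation covered by this paragraph can be sketched as follows; this is a non-normative illustration under stated assumptions. The names number, one, NAMED_CONSTANT, initialized_member and other_member are hypothetical. The constexpr specifier needs C23; for older compilers the macro falls back to static const, which only keeps the sketch compilable and does not make one a named constant.

```c
union number { int i; float f; };

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#  define NAMED_CONSTANT constexpr
#else
#  define NAMED_CONSTANT static const   /* stand-in only, see the text above */
#endif

/* The initializer selects member i of the union constant. */
NAMED_CONSTANT union number one = { .i = 1 };

/* Accesses the same member that the initializer initialized. */
int initialized_member(void) {
    return one.i;
}

#if 0
/* Disabled on purpose: accesses a member other than the initialized one,
   which is the case that the new J.2 entry lists as undefined. */
float other_member(void) {
    return one.f;
}
#endif
```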

16ª The Otherwise, the semantic rules for the evaluation of a constant expression are the same as for nonconstant expressions.122)

Forward references: array declarators (6.7.7.3), initialization (6.7.11).

3.2 J.2, Undefined behavior

Remove the following four entries (50)-(53) (counting as in n3467 before Graz, already done in the Graz meeting)

(50) An expression that is required to be an integer constant expression does not have an integer type; has operands that are not integer literals, named constants, compound literal constants, enumeration constants, character literals, predefined constants, sizeof or _Lengthof expressions whose results are integer constant expressions, alignof expressions, or immediately-cast floating literals; or contains casts (outside operands to sizeof, _Lengthof and alignof operators) other than conversions of arithmetic types to integer types (6.6).
(51) A constant expression in an initializer is not, or does not evaluate to, one of the following: a named constant, a compound literal constant, an arithmetic constant expression, a null pointer constant, an address constant, or an address constant for a complete object type plus or minus an integer constant expression (6.6).
(52) An arithmetic constant expression does not have arithmetic type; has operands that are not integer literals, floating literals, named and compound literal constants of arithmetic type, character literals, predefined constants, sizeof or _Lengthof expressions whose results are integer constant expressions, or alignof expressions; or contains casts (outside operands to sizeof or alignof operators) other than conversions of arithmetic types to arithmetic types (6.6).
(53) The value of an object is accessed by an array-subscript [], member-access . or ->, address &, or indirection * operator or a pointer cast in creating an address constant (6.6).

Add one new entry (counting as in n3550, already done in the Graz meeting)

(45) The value of a floating expression as determined in the translation environment in the context of the evaluation of a constant expression is outside the arithmetic range or has less precision than if it were evaluated in the execution environment (6.6.1).

Add one new entry (new with this paper)

(45′) The member-access operator . accesses a member of a union constant and the accessed member is not the same member that is initialized by the union constant’s initializer (6.6.1).

4 Note to the editors and other interested parties

There is a branch on WG14’s gitlab that reflects the proposed changes:

https://gitlab.gwdg.de/iso-c/draft/-/tree/CE

Acknowledgments

Thanks to Martin Uecker, Javier Múgica, Joseph Myers and Chris Bazley for review and discussions.