Introduce the nullptr constant

Jens Gustedt (INRIA France)

JeanHeyd Meneide (https://thephd.dev)

2022-04-23

org:	ISO/IEC JTC1/SC22/WG14	document:	N2978
	… WG21 C and C++ liaison		P2312
target:	IS 9899:2023	version:	4
date:	2022-04-23	license:	CC BY

Abstract

For more than a decade, C++ has replaced the problematic definition of NULL, which might be either of integer type or void*. By using a new constant nullptr, it achieves a more constrained specification that allows much better diagnosis of user code. We propose to integrate this concept into C as far as possible while imposing only minimal ABI additions.

Summary of Changes

v4/R2: Build back for simplicity
- have nullptr_t as a complete object type that has the same representation as void* and char* but only one value, nullptr
- for “Boolean” use cases of nullptr_t only explicitly formulate conversion to bool and test for equality
- add nullptr_t to all places that do “Boolean” evaluation, they are currently formulated as comparison to 0
- add nullptr_t to the set of possible argument types of ... lists, and make them compatible with void* and char* interpretation by va_arg.
v3/R1: integrating feedback from different sources
- make the type of nullptr incomplete and incompletable
- move most of the type information to nullptr itself and insist that it has a type that is different from any other standard type or type that could be defined by user code
- since nullptr does not have a scalar type, add it explicitly to contexts such as or similar that so far only had scalars
- change the adjustment rules to result in int of value 0 and 1 for contexts where logical evaluation still has that type
- insist that the first operand of a ternary or comma expression is evaluated
- insist that primary expressions such as () or _Generic also are constant expressions or null pointer constants if the respective operands are
- add nullptr to generic selection
- don’t allow nullptr as the last parameter before a ...
- only allow nullptr parameters without names
v2/R0: a complete rewrite as a proper language feature instead of a shallow macro solution

Introduction

The macro NULL, which goes back quite early, was meant to provide a tool to specify a null pointer constant such that it is easily visible and such that it makes the intention of the programmer to specify a pointer value clear. Unfortunately, the definition as it is given in the standard misses that goal, because the constant that is hidden behind the macro can be of very different nature.

A null pointer constant can be any integer constant of value 0 or such a constant converted to void*. Thereby several types are possible for NULL. Commonly used are 0 with int, 0L with long and (void*)0 with void*.

- This may lead to surprises when invoking a type-generic macro with a NULL argument.
- Conditional expressions such as (true ? 0 : NULL) and (true ? 1 : NULL) have different status depending on how NULL is defined. Whereas the first is always defined, the second is a constraint violation if NULL has type void*, and defined otherwise. In particular, the second happens to work in C++ but most of the time not in C.
- A NULL argument that is passed as a sentinel to a ... function that expects a pointer can have severe consequences. On many architectures nowadays int and void* have different sizes, and so if NULL is just 0, a wrongly sized argument is passed to the function.
- In particular, C++ can’t have NULL as (void*)0 because void* does not implicitly convert to other pointer types. Thus it is usually an integer constant of value zero. On the C side (e.g. by printf) such a passed integer constant is then interpreted as void* or char*; such a re-interpretation has undefined behavior.
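
The variation in type can be made visible with a small _Generic probe. The following is only a sketch and not part of the proposal; the names kind, KIND_OF and check_null_pointer_constant_kinds are hypothetical. NULL itself is deliberately not tested, because its type depends on the implementation.

```c
#include <assert.h>

// Hypothetical classification of expression types.
enum kind { KIND_INT, KIND_LONG, KIND_VOID_PTR, KIND_OTHER };

#define KIND_OF(X)                    \
   _Generic((X),                      \
         int:     KIND_INT,           \
         long:    KIND_LONG,          \
         void*:   KIND_VOID_PTR,      \
         default: KIND_OTHER)

// Each argument is a valid null pointer constant, yet a type-generic
// macro sees three different types.
static void check_null_pointer_constant_kinds(void) {
  assert(KIND_OF(0)        == KIND_INT);
  assert(KIND_OF(0L)       == KIND_LONG);
  assert(KIND_OF((void*)0) == KIND_VOID_PTR);
  // KIND_OF(NULL) depends on how the implementation defines NULL.
}
```

A type-generic macro that only lists pointer types in its associations would therefore accept some spellings of a null pointer constant and reject others.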

Rationale

Why do we need a specific `nullptr` constant?

Null pointer constants in C are a feature that is defined somewhat orthogonally to the type system. They are based on the concept of “integer constant expressions” and may in fact have any integer type (even bool, enumerations, character constants or expressions such as x-x are possible) as long as the value can be determined at translation time and happens to be zero. On top of that ambiguity concerning integer types, it is even permitted to apply an explicit cast to void* and still obtain a null pointer constant.

The standard macro NULL inherits from these confusing definitions and has no standardized type and no standardized behavior in contexts that are different from simple conversion to a pointer type. For example, a use of NULL as an argument to a ... function is not guaranteed to work.

- If NULL has integer type but different alignment or size than void*, any access with va_arg that interprets such an argument could crash the program.
- If NULL has integer type and null pointers are not represented as all-bits zero, such a transferred integer cannot be reinterpreted as a pointer value that would be a null pointer.
- If NULL has integer type (and not void*), then even if that integer type, say long, has the correct size and alignment, an interpretation of the passed-in integer in the form
  ```
  char* a = va_arg(ap, char*);
  ```
  has undefined behavior. As an exception, va_arg allows the reinterpretation between void* and char*, for example, but not from integer type to pointer type.
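
A sentinel-terminated argument list illustrates the point. The sketch below is not taken from the proposal: count_strings, count_strings_demo and END_OF_STRINGS are hypothetical names. It assumes that a compiler reporting __STDC_VERSION__ of at least 202311L provides nullptr, and falls back to an explicitly typed null pointer otherwise.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

// With C23, nullptr can be passed directly; older compilers need an
// explicitly typed null pointer. A plain NULL or 0 is not a portable
// sentinel, because it may have integer type.
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
# define END_OF_STRINGS nullptr
#else
# define END_OF_STRINGS ((char*)0)
#endif

// Hypothetical example function: counts the char* arguments that
// precede the terminating null pointer.
static size_t count_strings(char* first, ...) {
  size_t n = 0;
  va_list ap;
  va_start(ap, first);
  for (char* s = first; s != NULL; s = va_arg(ap, char*)) {
    ++n;
  }
  va_end(ap);
  return n;
}

// Usage sketch.
static void count_strings_demo(void) {
  assert(count_strings("a", "b", "c", END_OF_STRINGS) == 3);
  assert(count_strings(END_OF_STRINGS) == 0);
}
```

Every argument that va_arg reads here is either a char* or, under the proposed rules, a nullptr_t that may be read as a char*.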

Also, it is not easy to detect if an argument to a function or even macro is a null pointer constant or only an arbitrary null pointer value. In C, compile time code distinction is usually done in the preprocessor or by _Generic. The preprocessor doesn’t work with NULL because it might not even be a preprocessor constant. _Generic is difficult to use because it is based on types and not values, although there are ways to abuse properties of conditional expressions, integer constant expressions, null pointer constants and _Generic to do so.
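
One such abuse can be sketched as follows. The macro relies on the rule that a conditional expression with a null pointer constant as one operand takes the type of the other operand, whereas an ordinary void* operand makes the result void*. The names npc_probe, IS_NULL_POINTER_CONSTANT and null_pointer_constant_demo are hypothetical, and the sketch only supports arguments that are integer constants of value zero or expressions of type void*.

```c
#include <assert.h>

// Hypothetical tag type, used only to obtain a pointer type that no
// user expression has by accident.
struct npc_probe;

// If X is a null pointer constant, the conditional expression has the
// type of its other operand, struct npc_probe*. If X is merely an
// expression of type void*, the conditional expression has type void*.
#define IS_NULL_POINTER_CONSTANT(X)             \
   _Generic((1 ? (X) : (struct npc_probe*)0),   \
         struct npc_probe*: 1,                  \
         void*:             0)

static void null_pointer_constant_demo(void) {
  void* p = 0;   // a null pointer value, but p is not a constant
  assert(IS_NULL_POINTER_CONSTANT(0));
  assert(IS_NULL_POINTER_CONSTANT((void*)0));
  assert(!IS_NULL_POINTER_CONSTANT(p));
}
```

The indirection needed for such a simple question shows why a dedicated type that _Generic can select on directly is attractive.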

Another reason to strengthen the definition of null pointer constants in C is the common confusion between a null pointer and a pointer that points to the zero address in the OS, as is suggested by using integer literals such as 0 to express null pointer constants. Also, the fact that on some architectures a null pointer is not necessarily represented with an all-zero bit-pattern always needs special attention when teaching C and is quite surprising for beginners. If these subtle distinctions were necessary for the expressivity of the language, that could perhaps be acceptable; but here they are clearly an arbitrary burden imposed on generations of teachers and students, one that is rooted only in history and has no raison d’être today. All other programming languages that have concepts similar to pointers in C do quite well without this ambiguity between numbers and pointers.

The idea of nullptr is to end this ambiguity and to provide a keyword with a value and a portable type that can be used anywhere where a null pointer constant is needed.

The nullptr feature presented in this paper has the following properties.

- It has a complete object type.
- It does not have scalar type, so it is forbidden in arithmetic.
- It converts to any pointer type.
- It converts to bool by always evaluating to false.
- In memory, nullptr is represented with the same bit-pattern as a null pointer constant of type void*.
- nullptr is permitted in all “Boolean” contexts such as && operators or if statements.
- nullptr is permitted as argument to ..., as long as the function interprets it as pointer to void or character type.
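
Several of these properties can be exercised directly. The sketch below is illustrative only; HAVE_NULLPTR and nullptr_properties_demo are hypothetical names, and the checks are compiled only when the compiler reports __STDC_VERSION__ of at least 202311L, which is assumed here to imply support for nullptr.

```c
#include <assert.h>
#include <stddef.h>

// nullptr needs a C23 compiler; the checks are skipped otherwise.
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
# define HAVE_NULLPTR 1
#else
# define HAVE_NULLPTR 0
#endif

static void nullptr_properties_demo(void) {
#if HAVE_NULLPTR
  int*   ip = nullptr;          // converts to an object pointer type
  void (*fp)(void) = nullptr;   // and to a function pointer type
  bool   b  = nullptr;          // converts to bool, always false

  assert(ip == NULL);
  assert(fp == NULL);
  assert(b == false);
  assert(!nullptr);             // permitted as operand of !
  assert(nullptr == 0);         // equal to any null pointer constant
  if (nullptr) {                // permitted as controlling expression
    assert(false);              // never reached
  }
  // nullptr + 1 would be a constraint violation: no arithmetic.
#endif
}
```

On a compiler without nullptr the function body is empty, so the sketch still translates but checks nothing.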

The aim is that this feature has exactly the same behavior as the corresponding feature in C++.

Why do we need a specific `nullptr_t` type different from `void*`?

The secondary feature proposed in this paper is the type nullptr_t, with the intent to allow better diagnostics for functions that possibly receive a null pointer argument and to potentially optimize the case where a null pointer constant is received.

Consider a function func that receives a pointer parameter that can either be valid or a null pointer to indicate a default choice.

```
// header "func.h"
void func_general(toto*);

// define a default action
// no parameter name, parameter is never read
inline void func_default(nullptr_t) {
  // ...
}

#define func(P)                     \
   _Generic((P),                    \
         nullptr_t: func_default,   \
         default:   func_general)(P)

// one translation unit
#include "func.h"
// emit an external definition
extern void func_default(nullptr_t);

// define the general action
void func_general(toto* p) {
  // p may still have value null
  if (!p) func_default(nullptr);    // may only be called with nullptr
  else {
    // ...
  }
}
```

Here, a function func_default is defined that receives a nullptr. The function needs no access to the parameter, since that parameter can only hold one specific value. A type-generic macro func then chooses this function or the general function func_general. The translation unit that defines func_general may then emit an external definition of func_default and also use it within the definition for the case that func_general receives a parameter value that is null without being recognized as such at translation time of the call.

```
#include "func.h"
...
   func(0);        // ok, but uses the general function and may issue a diagnostic
   func((void*)0); // ok, but uses the general function, no diagnostic
   func(NULL);     // ok, but uses the general function, diagnostic or not
   func((toto*)0); // ok, but uses the general function, no diagnostic
   func(nullptr);  // uses default action directly
```

The use of the macro with a null pointer constant of integer type then uses the general function and sets the parameter to null; implementations that choose to diagnose the use of null pointer constants of integer type may do so for this call.

In contrast to that, a call that uses nullptr as an argument directly resolves to func_default, may or may not inline the corresponding action, and will not trigger such a diagnostic.

The emission of a diagnostic can be forced by restricting the admissible types as shown in the definition of func_strict.

```
#define func_strict(P)              \
   _Generic((P),                    \
         nullptr_t: func_default,   \
         toto*:     func_general)(P)
...
   func_strict(0);        // invalid, int argument is not a valid choice, constraint violation
   func_strict((void*)0); // invalid, void* argument is not a valid choice, constraint violation
   func_strict(NULL);     // invalid, void* or integer argument is not a valid choice, constraint violation
   func_strict((toto*)0); // ok, but uses the general function, no diagnostic
   func_strict(nullptr);  // uses default action directly
```

Design choices

After WG14 refused a specification for a simple macro with value (void*)0, as well as a sophisticated version with an incomplete type and with a rewriting approach for many contexts, this new version tries a middle ground.

A null pointer constant in its own right

The principal property of nullptr is that it is a null pointer constant. But it is one in its own right, not deduced from a property of any other feature. From the existing text it then basically follows that it can be used everywhere where a pointer is to be initialized or assigned to a null pointer value.

It has a type that is different from all other null pointer constants, in particular the type is neither an integer nor a pointer type. So in any context where type plays a role, it cannot be confused with an expression with a type of any of these.

A complete object type with fixed representation

The type of nullptr is a complete object type that is neither an array nor a scalar type and has exactly one value, namely nullptr. For C, this directly disallows the use of the type (and thus nullptr) in most other expressions, in particular in arithmetic.

Because we want to be able to use this type also for parameters, as members of unions (for type punning), and as argument to ... functions, we have to prescribe a representation that makes it admissible to these sorts of contexts. The only possible choice for this is to have the same alignment, size and representation as void* and to force the representation of nullptr to the same bit-pattern as null pointers of type void*.

To enable the use for ... functions, we then just add another exception for va_arg, namely that the behavior is well-defined if an object of nullptr_t is re-interpreted as void* or char*, for example. Because of our choice for the representation, this is easily possible.
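
The representation requirements can be checked on a given implementation with a sketch such as the following. The function name nullptr_representation_matches is hypothetical, and the check assumes that a compiler reporting __STDC_VERSION__ of at least 202311L provides nullptr and nullptr_t. Since the wording only requires that an object representation of nullptr is a null pointer representation of void*, the byte-wise comparison is expected to succeed on common platforms rather than guaranteed everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Returns 1 if the representation requirements hold, and also 1
// (nothing to check) when the compiler predates C23.
static int nullptr_representation_matches(void) {
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
  static_assert(sizeof(nullptr_t)  == sizeof(char*),  "same size");
  static_assert(alignof(nullptr_t) == alignof(char*), "same alignment");

  nullptr_t n = nullptr;
  void*     p = nullptr;
  // Compare the stored bytes of both objects.
  return memcmp(&n, &p, sizeof p) == 0;
#else
  return 1;
#endif
}
```

This shared representation is what makes it possible for va_arg to read a nullptr_t argument as void* or char*.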

Enable `nullptr` in “Boolean” contexts

Pointers are often used in contexts that have a “Boolean” interpretation, such as if statements, ternary expressions or conversions to bool. In C++ this is also possible for nullptr, so we enable this feature explicitly for all contexts that have such a “Boolean” interpretation. Note that for C++ this is much easier, because there these contexts are all handled by implicit conversion to bool.

Here, for C, we have to do a little bit more work and have to define conversion to bool and equality comparison separately. For other contexts, the integration is then simply done by adding nullptr_t to the types that are permitted in addition to scalar types.

Impact

Since nullptr and nullptr_t are new features, there is no impact for existing code that does not use them.

Code that starts using nullptr_t for interfaces (either as function parameters or via _Generic) will not encounter direct incompatibilities with existing code, because the type didn’t exist before.

Using nullptr itself for the assignment or initialization of variables or as arguments to pointer parameters will work seamlessly; nullptr converts implicitly to any pointer type, much as NULL or any of the current null pointer constants. Replacing NULL with nullptr may also reveal misuse of NULL in a context where an integer is expected. This is intended and considered to be an improvement.

Using nullptr for calls into macros that implement type-generic interfaces may encounter incompatibilities. In particular, for interfaces that perform type inspection by means of _Generic the new type nullptr_t of the constant may not fit any of the choices. But, in general this means that the code was not robust when presented with null pointer constants of varying type (integer type, void*) before. In general these problems will result in constraint violations, and thereby give the opportunity to improve the code receiving the nullptr argument with respect to these aspects. This is consistent with the Charter, which states that if there are to be changes, they should strive to be diagnosed rather than perform silent changes in behavior.

Using nullptr for calls into functions with ... will improve situations that had been undefined before. In particular, nullptr can be used as a sentinel for a list of pointers to void or character type, which is not portable when using NULL.

C library implementors would have to add the type nullptr_t to their <stddef.h> header. This can be achieved similarly to the following, where 2023MML is the __STDC_VERSION__ number chosen for C23.

```
#if __STDC_VERSION__ >= 2023MML
typedef typeof(nullptr) nullptr_t; // C23 supports typeof and nullptr
# define __STDC_VERSION_STDDEF_H__ 2023MML
#endif
```

Prior art

The concept of presenting a null pointer constant as a keyword that is tightly integrated into the language, as proposed here, is present in most other programming languages that have the concept of pointers, for example Pascal, Lisp, Smalltalk, Ruby, Objective-C, Lua, Scala, or Go, often with other spellings such as nil, NIL, None, null or Null. The fact that C still expresses this concept through other language features is a rare exception in this picture; it is only a historic artefact and not a necessity.

The nullptr feature together with nullptr_t is present in C++ since C++11 and has extensive implementation and application experience in that framework. This feature is also given under a different name in the Plan 9 C compiler, named nil. It approximates some of the features provided below, but not all of them.

C users often shift between using literal 0 versus (void*)0 for a library-deployed, macro-based definition. There are various tradeoffs for doing this (discussed as part of the design decisions above) that can make this have undesirable behaviors and qualities. Recently, users have tried to move away from their own personal definitions for portability and correctness reasons.

Proposed wording

Changes are proposed against the wording in C23 draft n2731 to which the accepted changes concerning keywords have been added. Green and underlined text is new text.

Boolean type (6.3.1.2)

Add to the end of p1

When a nullptr_t value is converted to bool, the result is false.

Pointers (6.3.2.3)

Change p3

3 An integer constant expression with the value 0, or such an expression cast to type void*, or the predefined constant nullptr, ~~is called~~are a null pointer constant.⁶⁸⁾ If a null pointer constant or a value of type nullptr_t (which is necessarily the value nullptr) is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

Keywords (6.4.1)

Add nullptr to the lists in p1.

Predefined constants (6.4.4.5)

Add nullptr to the list of predefined constants and a new paragraph to the description

The keyword nullptr represents a null pointer constant. Details of its type are described in 7.19.x.

Equality operators (6.5.9)

Add two items to the list of constraints in p2

– both operands have type nullptr_t;

– one operand has type nullptr_t and the other is a null pointer constant;

Add to the end of p5

If both operands have type nullptr_t or one operand has type nullptr_t and the other is a null pointer constant, they compare equal.

Thereby, a comparison of a value of type nullptr_t to 0 (which, as for pointers, is seen as a null pointer constant) is always well defined.

Contexts that interpret an expression as Boolean

To be easily compatible with current uses of NULL, we add nullptr_t to all contexts that traditionally allow a pointer to be interpreted as a Boolean value. The particular result for using nullptr or an lvalue of type nullptr_t (that might not be a null pointer constant but just a null pointer) can then be deduced from the equality operators, much as this is done for pointer types.

Unary arithmetic operators (6.5.3.3)

1 The operand of the unary + or - operator shall have arithmetic type; of the ~ operator, integer type; of the ! operator, scalar or nullptr_t type.

Logical AND operator (6.5.13)

2 Each of the operands shall have scalar or nullptr_t type.

Logical OR operator (6.5.14)

2 Each of the operands shall have scalar or nullptr_t type.

Conditional operator (6.5.15)

2 The first operand shall have scalar or nullptr_t type.

3 One of the following shall hold for the second and third operands:^FNT1)

^FNT1) If a second or third operand of type nullptr_t is used that is not a null pointer constant, a constraint is violated.

The `if` statement (6.8.4.2)

1 The controlling expression of an if statement shall have scalar or nullptr_t type.

Iteration statements (6.8.5)

2 The controlling expression of an iteration statement shall have scalar or nullptr_t type.

The `nullptr_t` type (7.19.x)

Add to 7.19 p2

nullptr_t

which is the type of the nullptr predefined constant, see below;

And add a new clause 7.19.x to the <stddef.h> header

7.19.x The nullptr_t type

Description

1 The nullptr_t type is the type of the nullptr predefined constant. It has only a very limited use in contexts where this type is needed to distinguish nullptr from other expression types. It is an unqualified complete object type that is neither an atomic, scalar or array type and that has one value, nullptr. Default initialization of an object of this type is equivalent to an initialization by nullptr.

2 The size and alignment of nullptr_t is the same as for a pointer to character type. An object representation of the value nullptr is the same as the object representation of a null pointer value of type void*. An lvalue conversion of an object of type nullptr_t with such an object representation has the value nullptr; if the object representation is different, the behavior is undefined.^FNT0)

^FNT0) Thus, during the whole program execution an object of type nullptr_t evaluates to the assumed value nullptr.

3 NOTE Because of the restrictions on the type category, the use of values of this type in expressions is implicitly constrained in many ways throughout clause 6, in particular for arithmetic. Exempted from such constraints are uses, for example,

– as the operand of an alignas, sizeof or typeof operators,

– as the operand of an implicit or explicit conversion to a pointer type,

– as the assignment expression in an assignment or initialization of an object of type nullptr_t,

– as an argument to a parameter of type nullptr_t or in a variable argument list,

– as a void expression,

– as the operand of an implicit or explicit conversion to bool,

– as an operand of a _Generic primary expression,

– as an operand of the !, &&, || or conditional operators, or

– as the controlling expression of an if or iteration statement.

The `va_arg` macro (7.16.1.1)

Modify the end of p2

If type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases:

– both types are pointers to qualified or unqualified versions of compatible types;

– one type is a signed integer type, the other type is the corresponding unsigned integer type, and the value is representable in both types;

– one type is pointer to qualified or unqualified void and the other is a pointer to a qualified or unqualified character type~~.~~;

– the type of the next argument is nullptr_t and type is a pointer type that has the same representation and alignment requirements as a pointer to a character type.^FNT1)

^FNT1) Such types are in particular pointers to qualified or unqualified versions of void.

Note to the editors: Please, observe the typo corrected above. The readability of 7.16.1.1 could gain by renaming the macro parameter type to something like T.

Editorial changes

There are several other editorial changes that can be made in that context. We leave them to the discretion of the editors.

Usage of `NULL`

There are several usages of the macro NULL throughout the library clause of the form (char**)NULL which would probably better be replaced by nullptr without cast.

Forward references

For several of the places where this document proposes changes, forward references to 7.19.x, either directly in the text or as a separate paragraph at the end of the respective clause, could be needed.

Undefined behavior – Annex J

- Using a null pointer constant in the form of an integer expression as an argument to a ... function and then interpreting it as void* or char* is undefined behavior. This could be added to Annex J as an entry for va_arg (7.16.1.1).
- A specific entry for nullptr_t (7.19.x) could be made that stipulates that arbitrarily changing a nullptr_t object, or copying a non-null pointer value into it, and then reading that object has UB.

Feature test macro

This paper proposes a change to the <stddef.h> header, so this header now needs a test macro __STDC_VERSION_STDDEF_H__. During the transition to C23, this would help users to determine if the nullptr_t type is available for their current version of the C library:

```
#include <stddef.h>
#if __STDC_VERSION_STDDEF_H__ > 0
/* all is fine, we should also have nullptr */
#elif __STDC_VERSION__ > 202300L
typedef typeof(nullptr) nullptr_t; // C23 supports typeof and nullptr
#else
# error "nullptr_t is missing"
#endif
```

Questions to WG14

Does WG14 want to integrate the changes of N2978 into C23?

Acknowledgements

Many thanks to Joseph Myers for the very detailed review and feedback for earlier versions of this paper.