Document Number: N2208
Submitter: Martin Sebor
Submission Date: March 26, 2018
Subject: Library Functions And Compound Literals

Summary

A discussion of N2145 exposed a general problem with the interaction between function-like macros defined for standard library functions and passing compound literals as arguments to such functions. Specifically, in §7.1.4 Use of library functions, C states that:

Any function declared in a header may be additionally implemented as a function-like macro defined in the header, ….

Historically, implementations have made use of this technique to provide more efficient versions of "hot" library functions that avoid the overhead of a function call. The technique predates the introduction of the inline keyword and so is arguably less relevant today, but it is nevertheless still in widespread practice, at least for a subset of APIs. Common examples where this approach is still used are the character classification macros defined in the <ctype.h> header and some I/O functions declared in <stdio.h>. Below is an implementation of one of these macros from the GNU C library:

	# define isdigit(c)     __isctype((c), _ISdigit)
The definition of __isctype is not important for this discussion.

Because isdigit is a macro that takes a single argument, and because the preprocessor does not interpret higher-level constructs in the language, it cannot distinguish commas that separate macro arguments from those that separate initializers for members in a compound literal. As a result, isdigit cannot be invoked with a compound literal argument that consists of multiple initializers. For example, the following invocation of the macro is not valid and is rejected:

	struct Pair { int first, second; };
	extern struct Pair *p;
	
	isdigit ((struct Pair){ '1', 2 }.first);   // too many arguments to macro isdigit()
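
Calling code can sidestep the diagnostic by wrapping the whole argument in an extra pair of parentheses, which hides the comma from the preprocessor. The sketch below illustrates this with my_isdigit, a hypothetical stand-in macro (it is not the GNU C library definition), and first_is_digit, an illustrative helper name.

```c
#include <assert.h>

struct Pair { int first, second; };

/* Hypothetical one-parameter macro standing in for a library macro
   such as isdigit; not taken from any real implementation. */
#define my_isdigit(c) ((unsigned)(c) - '0' < 10u)

static int first_is_digit(void)
{
    /* my_isdigit((struct Pair){ '1', 2 }.first);
       would be rejected: the preprocessor sees two macro arguments. */

    /* The outer parentheses make the compound literal a single argument. */
    return my_isdigit(((struct Pair){ '1', 2 }.first));
}
```

The extra parentheses change nothing for the compiler proper; they only affect how the preprocessor splits the argument list.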

The example above is admittedly contrived and unlikely to have a significant impact in practice. However, more realistic use cases exist that are likely to have such impact. They include functions that take a pointer argument to which the address of a temporary object — a compound literal — is passed. For instance, AIX defines memcpy as a macro to force inlining:

	/*
	*   The following macro definitions cause the XLC compiler to inline
	*   these functions whenever possible.
	*/

	#ifndef __cplusplus
	#ifdef __STR__
	…
	#       define memcpy(__s1,__s2,__n) __memcpy(__s1,__s2,__n)
	…
As a result, the following call to memcpy is rejected as invalid by the XLC compiler:
	memcpy (p, &(struct Pair){1, 2}, sizeof *p);
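
Under the current rules of §7.1.4, a program can avoid such a macro by enclosing the function name in parentheses, so that the name is not followed by a left parenthesis and no macro expansion takes place. The following is a minimal sketch of that workaround; copy_literal is a hypothetical helper name introduced only for illustration.

```c
#include <assert.h>
#include <string.h>

struct Pair { int first, second; };

/* copy_literal is a hypothetical helper used only for this sketch. */
static struct Pair copy_literal(void)
{
    struct Pair dst = { 0, 0 };
    struct Pair *p = &dst;

    /* (memcpy) is not followed by '(', so a function-like macro named
       memcpy, if one exists, is not expanded; the commas in the compound
       literal are then parsed by the compiler rather than the preprocessor. */
    (memcpy)(p, &(struct Pair){ 1, 2 }, sizeof *p);
    return dst;
}
```

Placing #undef memcpy after the inclusion of <string.h> would have the same effect for the rest of the translation unit.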
Another example from the standard library involves the asctime function that takes a const struct tm* argument. When the function is implemented as a macro, code like the following is rejected:
	char *str = asctime (&(struct tm){
	  .tm_year = 2018 - 1900, .tm_mon = 2, .tm_mday = 26, …
	});

Beyond this original use case, using macros to define standard library APIs has gained increased relevance with the advent of generic programming in C, specifically with the introduction of the generic selection feature (_Generic) in C11. A number of library functions are specified to take arguments of multiple distinct types and made to act as "overloads" of the same name. Besides the atomic functions declared in <stdatomic.h> and discussed in N2145, other examples of such "overloaded" APIs include the type-generic math functions defined in <tgmath.h>. Although other techniques for achieving this effect are possible, all existing implementations make use of macros to provide these overloads. Consequently, neither of the calls in the examples below is valid:

	atomic_store_explicit (p, (struct Pair){ 1, 0 }, memory_order_relaxed);   // too many arguments to macro
or
	int e = exp ((struct Pair){ 123, 456 }.first);   // too many arguments to macro
The main difference between the two is that, unlike the math overloads, which are explicitly specified to be implemented using macros, the atomic APIs are specified as functions.
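
To illustrate how a macro-based overload could be written so that it tolerates compound literal arguments, the sketch below takes the value operand as a variadic macro parameter, which absorbs any commas it contains. This is an assumption made for illustration only and is not presented as what any implementation or N2145 does; generic_store, store_pair, store_int and demo_generic_store are all hypothetical names.

```c
#include <assert.h>

struct Pair { int first, second; };

/* Hypothetical per-type "overloads". */
static void store_pair(struct Pair *obj, struct Pair val) { *obj = val; }
static void store_int(int *obj, int val) { *obj = val; }

/* Hypothetical type-generic store. The value is received as __VA_ARGS__,
   so commas inside a compound literal do not create extra macro arguments.
   Trade-off: a call with too many real arguments is silently turned into
   a comma expression instead of being diagnosed. */
#define generic_store(obj, ...)                \
    _Generic((obj),                            \
        struct Pair *: store_pair,             \
        int *: store_int)((obj), (__VA_ARGS__))

static int demo_generic_store(void)
{
    struct Pair p = { 9, 9 };
    int i = 0;

    generic_store(&p, (struct Pair){ 1, 0 });  /* accepted despite the comma */
    generic_store(&i, 7);

    return p.first == 1 && p.second == 0 && i == 7;
}
```

The sketch requires C11 for _Generic and deliberately avoids atomic types to stay small.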

Proposed Resolution

It might be argued that because this problem is not new (it has existed since C99) and because it has not been reported sooner, it is not impeding the portability of C programs between implementations. This may, in fact, be the case for some APIs, such as isdigit in the contrived example above, but it is considerably less likely for memcpy. It is also possible that, despite having been in the language for well over a decade, compound literals aren't used widely enough to cause a significant problem here. However, we believe that the introduction of atomics makes this a pressing problem that, if not resolved, will have an adverse impact on the adoption of the atomic APIs for aggregate types. We believe this is so because the atomic APIs are the only mechanism to assign values to objects of atomic aggregate types (no initialization or ordinary assignment from non-atomic struct type is defined).

In light of the above, and to address the problem consistently for all library functions and their uses, we propose to remove the unqualified latitude for implementations to define any standard library functions as function-like macros and replace it with a few necessary exceptions to this rule. Outside the small set of explicitly specified exceptions, the permission to define library functions (also) as function-like macros is no longer necessary since inline functions offer a superior solution (commonly provided extensions such as GCC attribute always_inline completely obviate any benefits of using macros for this purpose, even in the absence of inlining as an optimization). The exceptions are limited to the existing type-generic math functions listed in <tgmath.h>, other existing library facilities explicitly specified as macros (e.g., the function-like macros defined in <setjmp.h> or in <stdarg.h>), and any other similar type-generic APIs yet to be introduced.
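
As a rough illustration of why an inline function removes the problem, the sketch below replaces a macro-based copy routine with an inline function; because the call is parsed by the compiler rather than the preprocessor, a compound literal argument is accepted as-is. The names FORCE_INLINE, copy_bytes and copy_from_literal are hypothetical, and the use of always_inline assumes a compiler that supports the GCC attribute syntax.

```c
#include <assert.h>
#include <stddef.h>

struct Pair { int first, second; };

/* Hypothetical helper macro: request forced inlining where the GCC
   attribute syntax is available, plain inline otherwise. */
#if defined(__GNUC__)
#  define FORCE_INLINE static inline __attribute__((always_inline))
#else
#  define FORCE_INLINE static inline
#endif

/* Hypothetical inline replacement for a function-like macro. */
FORCE_INLINE void *copy_bytes(void *restrict dst, const void *restrict src,
                              size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

static struct Pair copy_from_literal(void)
{
    struct Pair dst = { 0, 0 };

    /* No extra parentheses needed: this is an ordinary function call. */
    copy_bytes(&dst, &(struct Pair){ 1, 2 }, sizeof dst);
    return dst;
}
```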

To this end, we propose to make changes as indicated below. In §7.1.3 Reserved identifiers, paragraph 1, bullet 5, make the following change:

Each identifier with file scope listed in any of the following subclauses (including the future library directions) is reserved for use as a macro name and as an identifier with file scope in the same name space if any of its associated headers is included.

Furthermore, in §7.1.4 Use of library functions, paragraph 1, make the following changes:

Each of the following statements applies unless explicitly stated otherwise in the detailed descriptions that follow: If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined. If a function argument is described as being an array, the pointer actually passed to the function shall have a value such that all address computations and accesses to objects (that would be valid if the pointer did point to the first element of such an array) are in fact valid. Unless explicitly specified otherwise, no function declared in a library header may be defined as a macro. Any function declared in a header may be additionally implemented as a function-like macro defined in the header, so if a library function is declared explicitly when its header is included, one of the techniques shown below can be used to ensure the declaration is not affected by such a macro. Any macro definition of a function can be suppressed locally by enclosing the name of the function in parentheses, because the name is then not followed by the left parenthesis that indicates expansion of a macro function name. For the same syntactic reason, it is permitted to take the address of a library function even if it is also defined as a macro. 185) The use of #undef to remove any macro definition will also ensure that an actual function is referred to. Any invocation of a library function that is implemented as a macro shall expand to code that evaluates each of its arguments exactly once, fully protected by parentheses where necessary, so it is generally safe to use arbitrary expressions as arguments. 186) Likewise, those function-like macros described in the following subclauses may be invoked in an expression anywhere a function with a compatible return type could be called. 187)

and remove footnotes 185 through 187. Furthermore, in §7.25 Type-generic math <tgmath.h> modify footnote 313 by deleting the leading clause "Like other function-like macros in Standard libraries," so that it reads as indicated below:

313) Each type-generic macro can be suppressed to make available the corresponding ordinary function.

For a proposal to correct the problem for the atomic functions defined in <stdatomic.h> refer to N2145.