1. Effects of This Paper
|Status Quo||This Paper|
✅ Works as expected.
☠️ Undefined behavior
2. The Problem
... Two module-import-declarations import the same header unit if and only if their header-names identify the same header or source file. [ ... ] A declaration of a name with internal linkage is permitted within a header unit despite all declarations being implicitly exported. If such a declaration declares an entity that is odr-used outside the header unit, or by a template instantiation whose point of instantiation is outside the header unit, the program is ill-formed.
This requires a single header unit for each imported header and prohibits using internal linkage entities from that header unit.
The first problem with this is that requiring a single header unit in a program
for a given header file borders on unimplementable. Some compiler invocation
has to produce that header unit, but there's no reasonable way to decide which
one. For example, two different libraries developed by two unrelated groups use
a C library, libO. Both of them import O/functionality.h as a header unit; that
header defines a function containing a static local variable with a dynamic
initializer which observes how many times it was called. A third group of
developers comes along and tries to use both of these libraries in the same
program. How do you ensure that static local variable is only initialized once?
libO is a C library written before C++20 modules existed, so it doesn't expect
to build the header unit for O/functionality.h and can't be responsible. The two
other libraries don't even know about each other, so neither one can assume it
should build it. The third group doesn't even know this header exists, as it's
an internal implementation detail of the other two libraries.
The end result is there’s no sane way to implement the requirement that there’s a single header unit for each imported header or source file.
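To make the static local concern concrete, here is a minimal single-file sketch. It assumes (the paper does not show the header) that the function in O/functionality.h is `static inline`, so every importer that gets its own copy of the function also gets its own static local. The namespaces `lib_a` and `lib_b`, the counter `g_init_runs`, and `observe_init` are hypothetical stand-ins for two translation units and for the observing initializer.

```cpp
// Hypothetical instrumentation: counts how many times any copy of the
// static local's dynamic initializer has run. Not part of libO.
int g_init_runs = 0;

int observe_init() { return ++g_init_runs; }

// Stand-in for the first library's translation unit. The namespace-scope
// `static` gives this copy of the function internal linkage.
namespace lib_a {
static inline int functionality() {
  static int first_seen = observe_init();  // dynamic initializer, once per copy
  return first_seen;
}
}  // namespace lib_a

// Stand-in for the second library's translation unit: a separate copy,
// and therefore a separate static local.
namespace lib_b {
static inline int functionality() {
  static int first_seen = observe_init();  // runs again for this copy
  return first_seen;
}
}  // namespace lib_b
```

Calling `lib_a::functionality()` any number of times runs its initializer once, but the first call to `lib_b::functionality()` runs a second initializer, so `g_init_runs` ends up at 2.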
The second problem is that static inline functions are pretty common in
headers for two main reasons:
C's inline semantics differ from C++'s: a plain inline function in C requires a non-inline (external) definition in some translation unit.
They provide ABI isolation: every caller uses exactly the version the compiler saw, never a different one, so you can totally change how they work without fear of breaking ABI.
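A minimal sketch of the pattern described above, as it might appear in a C header that also compiles as C++. The function name `o_clamp` is hypothetical. Because it is `static inline`, no separate non-inline definition is needed anywhere, and each translation unit compiles its own private copy, which is what gives the ABI isolation.

```cpp
// Hypothetical excerpt of a C header such as O/functionality.h,
// written so it compiles as both C and C++.
//
// `static` gives internal linkage: each TU that includes the header gets its
// own copy, so no external definition is required (unlike a plain C `inline`
// function) and the body can change between releases without an ABI break.
static inline int o_clamp(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}
```

It is exactly this kind of declaration that header units currently forbid using from outside the header unit.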
A search on GitHub finds thousands of uses of static inline in C headers,
including projects such as Wayland, OpenSSL, Clang builtins, and Mono. The
pattern is also used pervasively on Apple platforms. Additionally, almost all
inline functions in libc++ on Apple platforms are given internal linkage for ABI
isolation reasons, even member functions (via an extension). Even if both of
these cases could be changed, fixing them alone isn't enough. There are still at
least tens of thousands of static inline functions in the wild, and it would be
unfortunate if the headers containing them could not be used as header units.
3. The Solution
Clang modules currently support internal linkage entities by simply copying them into whatever translation unit imports the module. This works quite well, as it gives the same semantics you would get by textually including the header the Clang module was built from. However, this may not be the best way to specify it.
Instead, this paper proposes making two changes. The first is to say that instead of there being a single header unit for each header or source file, there may be multiple, and it's unspecified which one is imported. The second is that you may name internal linkage entities imported from header units, with the same ODR implications as today.
The first item fixes the first problem above about not knowing who's responsible for building the header unit, as now every TU that imports the header can build its own. This also fixes the issue with external linkage entities, as you'll likely end up with multiple definitions of them, which would be an error. Note that specifying it this way does not require each importing TU to build its own header unit. A sufficiently advanced build system could still build a single one for the entire program.
The second item fixes the second problem above by removing the restriction on naming internal linkage entities. This can be implemented either by copying the definition into the importing TU or by linkage promotion, as the spec allows one or more instances of any internal linkage entity in this case.
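The two implementation strategies differ only in how many instances of an internal linkage entity end up in the program. A minimal sketch, using a hypothetical model of "which TU imported which header unit": copying yields one instance per importing TU, while linkage promotion yields one instance per header unit. The `Import` record, the function names, and the `__hu_` naming scheme are all hypothetical; real compilers use their own mangling.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: one record per import of the header by a TU.
struct Import {
  std::string importing_tu;  // e.g. "a.cpp"
  std::string header_unit;   // which header unit served that import, e.g. "hu1"
};

// Strategy 1: copy the definition into every importing TU.
// Each importing TU gets its own instance of the internal linkage entity.
std::size_t instances_with_copying(const std::vector<Import>& imports) {
  std::set<std::string> tus;
  for (const Import& i : imports) tus.insert(i.importing_tu);
  return tus.size();
}

// Strategy 2: linkage promotion. The entity is emitted once per header unit
// under a unique external name, so all importers of that unit share it.
std::size_t instances_with_promotion(const std::vector<Import>& imports) {
  std::set<std::string> units;
  for (const Import& i : imports) units.insert(i.header_unit);
  return units.size();
}

// Hypothetical promoted-name scheme, for illustration only.
std::string promoted_name(const std::string& header_unit,
                          const std::string& entity) {
  return "__hu_" + header_unit + "_" + entity;
}
```

With three importers `a.cpp` and `b.cpp` (served by `hu1`) and `c.cpp` (served by `hu2`), copying gives three instances and promotion gives two. Both are at least one, which is all the proposed wording requires.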
3.1. Undefined Behavior
This paper introduces two different types of UB. The first is that, by allowing multiple header units to exist for the same header or source file without disallowing certain external linkage entities in header units, it allows programs to violate the rule that "Every program shall contain exactly one definition of every non-inline function or variable …". While a violation of this rule does not require a diagnostic, in the majority of situations a user will get a multiple definition error. This could be made ill-formed instead, but it was felt to be simpler at this time to stay closer to header semantics.
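A minimal sketch of which header contents are exposed to this first kind of UB, using hypothetical libO function names. The assumption is that each header unit's object code carries the definitions it contains, so linking two header units built from the same header duplicates any non-inline definition.

```cpp
// Hypothetical contents of a header imported as a header unit.

// Non-inline definition with external linkage: if two header units built
// from this header are linked into one program, each likely carries a
// definition, violating "exactly one definition" (typically reported by the
// linker as a multiple definition error).
int libO_api_version() { return 3; }

// Inline definition: multiple identical definitions are permitted by the ODR,
// so duplicate header units are harmless for this one.
inline int libO_api_version_inline() { return 3; }
```

In a single translation unit, as here, both compile and behave identically; the difference only shows up once more than one header unit for the header reaches the linker.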
The second type of UB is subtler. The one definition rule
(ODR) allows for multiple definitions of inline variables and functions, but
they must be the "same". Any inline definition that references an internal
linkage entity is an ODR violation waiting to happen. However, static inline
functions are extremely common today, and a large portion of C headers would
not be usable as header units if such functions could not be referenced. As
header units exist purely to support existing code and code that will likely
never move to modules (like C code), not allowing this would significantly harm
modules adoption. The consequences are the same as using static inline
functions from a textual header today. We're not introducing any new UB, just
not stopping people from hitting it.
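A minimal sketch of the "ODR violation waiting to happen", with hypothetical names. Within one translation unit it is well-behaved; the problem appears only when several TUs (or header units) each contain a definition of `api`, because each definition then calls a different `helper` entity.

```cpp
// Hypothetical header contents.

// Internal linkage: every TU (or header unit) that sees this gets its own,
// distinct `helper` entity.
static int helper() { return 1; }

// Inline with external linkage: all definitions of `api` across the program
// must be the "same", including referring to the same entities. Since each
// definition refers to its own TU's `helper`, two TUs defining `api` already
// violate the ODR, even though the source text is identical.
inline int api() { return helper(); }
```

This is the same situation that arises today when such a header is textually included in more than one TU, which is why the paper treats it as existing rather than new UB.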
4. Ship Vehicle
This paper targets C++20, as many C headers are not importable without it.