Reply to: Hans-J. Boehm
Unfortunately, the C library, as defined in the first CD, does not address a number of library issues that arise from the introduction of threads. I believe that it is critical to address these issues, since little portable code can be written without addressing several of them. This is reflected in several national body comments.
This is our initial attempt to do so in the main library sections of the standard. I did not go through annex K carefully, but it may be fine as is.
As observed below, annex K potentially offers solutions to some of the problems identified here. Unfortunately, it does not do so consistently, and it is not required to be implemented on all platforms supporting threads. Thus it cannot be leveraged to address threads issues as is.
This proposal follows Posix whenever possible. Both due to time constraints, and because reflector discussions indicated controversy about the overall direction of the solution, I often do not include precise C standard wording in cases in which Posix already provides the necessary specifications. It should be fairly straightforward to derive precise wording from the Posix specification.
Although there are cases in which the Posix solutions are not clearly technically optimal, I feel that even in the controversial cases, specifically implicit locking for I/O operations, the Posix approach is sufficiently well-established practice, even on non-Posix systems, that any other solution would be inconsistent with current practice and is not viable at this stage. I discuss this in a bit more detail in the appropriate section below.
Michael Wong and Jim Thomas provided helpful comments on an earlier draft of this paper.
Add the sentence
as the second-to-the-last sentence of 7.1.4p5. Note that library routines implemented in C should naturally satisfy this constraint. Assembly language code can usually read additional data without making that visible to the user. That remains allowed by the "as if" rule. Writing additional data, even if the original values are rewritten, causes real bugs and needs to be prohibited.
The descriptions of the first three functions specify that calls may introduce data races, though they say nothing about when this may happen. They leave the programmer with no obvious viable alternatives to use in a multithreaded program. For example, a library writer has no way to safely invoke rand, since the library provides no convention for protecting such calls with a common lock. There is no way to preclude simultaneous rand calls by other libraries from other threads.
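As a rough illustration of the alternative Posix offers, the following sketch uses rand_r, which keeps its generator state in an object supplied by the caller rather than in hidden shared state. The helper name bounded_random is hypothetical and not part of any standard; the sketch assumes a Posix system on which rand_r is declared in <stdlib.h>.

```c
#define _POSIX_C_SOURCE 200809L  /* expose rand_r; must precede all includes */
#include <stdlib.h>

/* Hypothetical library-internal helper (not a standard function).
   Returns a pseudo-random value in [0, bound) using caller-owned state.
   bound must be nonzero. The modulo step is slightly biased; that is
   acceptable for a sketch but not for statistical work. */
unsigned bounded_random(unsigned *seed, unsigned bound)
{
    /* rand_r reads and updates only *seed. Two threads that each own a
       distinct seed object therefore do not race with one another, and
       neither interferes with rand calls made elsewhere. */
    return (unsigned)rand_r(seed) % bound;
}
```

A library can keep one seed object per thread, or per call site, and so needs no lock shared with other libraries.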
strerror may be a somewhat special case, both in that it is unclear to me whether strerror itself could be made thread-safe as strerror_l already is for Posix, and in that the optional annex K strerror_s already appears to provide the same functionality, but with a different argument order. A Google code search suggests that strerror is by far the most frequently used (about 500K uses), with strerror_r far behind it (about 2500 uses) and strerror_s far behind that (about 100 uses). This suggests that making strerror itself thread-safe would clearly be the best solution, if technically feasible. Here I assume it is not, so it makes sense to introduce strerror_r, since it is the most widely used thread-safe alternative.
strtok also has an annex K version, strtok_s, that may be intended to be thread-safe, and thus could possibly serve as a replacement for strtok_r. However, the description in K.18.104.22.168 is unclear. It talks about sequences of calls, in which the last argument must remain the same. The example implies that it should be possible to have multiple such sequences in progress at once, which should also make it possible to use it from multiple threads. But that wouldn't otherwise have been my reading of the normative text. A Google code search for strtok_s turns up few hits, and the top ones seem inconsistent with the annex K specification, in that they have only three arguments. The strtok_r function is far more established.
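To make the point about multiple sequences in progress concrete, here is a small sketch using the Posix strtok_r, whose last argument holds the state of one tokenization sequence. The function name count_fields and the record format are made up for the illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strtok_r */
#include <string.h>

/* Hypothetical example function (not a standard function).
   Counts the comma-separated fields in every ';'-separated record of
   text. The buffer is modified in place, as with strtok. */
size_t count_fields(char *text)
{
    size_t count = 0;
    char *rec_state;    /* state of the outer (record) sequence */
    char *field_state;  /* state of the inner (field) sequence  */

    for (char *rec = strtok_r(text, ";", &rec_state);
         rec != NULL;
         rec = strtok_r(NULL, ";", &rec_state)) {
        /* A second sequence runs while the first is still in progress.
           With strtok, the inner calls would overwrite the single hidden
           state used by the outer loop. */
        for (char *field = strtok_r(rec, ",", &field_state);
             field != NULL;
             field = strtok_r(NULL, ",", &field_state)) {
            count++;
        }
    }
    return count;
}
```

The same separation of state is what allows two threads to tokenize different strings concurrently without a data race.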
The asctime function has very similar issues. But since it is defined in terms of an implementation, no further text is needed to address data races.
Add Posix functions strerror_r, strtok_r, rand_r, and asctime_r.
In 22.214.171.124p6, replace
The strtok function is not required to avoid data races.
In 126.96.36.199p3, replace
The strerror function is not required to avoid data races.
In 188.8.131.52p3, replace
The rand function is not required to avoid data races.
Add a very similar paragraph after 184.108.40.206p2:
Possible strerror alternatives:
Possible strtok alternatives:
Possible asctime alternatives:
It needs to be clear that allocation and deallocation functions implicitly avoid data races on the underlying heap, and that a modification of a memory location p, followed by a deallocation of p, followed by a reallocation and access of the same memory location p in another thread, does not introduce a data race on p. At the same time, we need to ensure that allocating p in one thread, and deallocating it in another, without intervening synchronization, remains a data race. (Doing this requires memory_order_relaxed atomic operations, but is possible.) It is hard to interpret the current specification as satisfying all these constraints. The clarification must allow thread-local allocation caches and should require malloc calls to order memory accesses only where absolutely necessary. Requiring all of these calls to acquire and release a particular lock, for example, would be an overconstraint, since it would give malloc some fence-like properties, which may be expensive to enforce.
Note that getting this right is subtle, and has significant impact on static analysis of code that calls memory management functions.
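The case that must remain well-defined can be sketched as follows: memory is allocated and written in one thread, then read and deallocated in another, with synchronization in between. The sketch uses Posix threads as an assumption, since that is the established practice this paper follows; the function names are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

/* Runs in the new thread: allocates an int, writes it, and hands the
   pointer back through the thread's return value. */
static void *producer(void *arg)
{
    (void)arg;
    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = 42;
    }
    return p;
}

/* Hypothetical example function (not a standard function).
   Returns the value written by the producer thread, or -1 on failure. */
int consume_from_thread(void)
{
    pthread_t t;
    void *result = NULL;

    if (pthread_create(&t, NULL, producer, NULL) != 0) {
        return -1;
    }
    /* pthread_join is the intervening synchronization: the producer's
       malloc and store are ordered before the load and free below. */
    if (pthread_join(t, &result) != 0) {
        return -1;
    }
    int *p = result;
    if (p == NULL) {
        return -1;
    }
    int value = *p;
    free(p);  /* deallocated in a different thread than the one that allocated */
    return value;
}
```

If the pointer were instead passed through an ordinary shared variable with no join, lock, or atomic operation in between, the access and the free in the second thread would race with the first thread, and that is the case that should remain a data race.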
Insert the following after 7.22.3p1:
Current practice is that all currently specified stdio calls implicitly acquire a lock on the accessed stream, and that additional functions are provided to
I propose to follow this widely established precedent here. Mailing list discussions suggested that it is controversial. On the other hand, they also appeared to confirm that all major platforms currently follow our proposed path.
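For readers unfamiliar with the Posix convention, the following sketch shows how the explicit locking functions combine with the unlocked character functions. It assumes a Posix system providing flockfile, funlockfile, and putc_unlocked; the helper name put_line_atomically is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* expose flockfile and friends */
#include <stdio.h>

/* Hypothetical helper (not a standard function).
   Writes s followed by a newline as a single unit with respect to other
   threads that use the same stream through the locking stdio calls.
   Returns 0 on success, EOF on a write error. */
int put_line_atomically(FILE *stream, const char *s)
{
    int rc = 0;

    flockfile(stream);  /* take the stream lock once for the whole line */
    for (; *s != '\0'; s++) {
        /* The _unlocked variant skips the per-call implicit lock; that is
           safe here only because this thread already holds it. */
        if (putc_unlocked((unsigned char)*s, stream) == EOF) {
            rc = EOF;
            break;
        }
    }
    if (rc == 0 && putc_unlocked('\n', stream) == EOF) {
        rc = EOF;
    }
    funlockfile(stream);
    return rc;
}
```

The implicit per-call lock keeps individual calls such as putc safe without programmer effort, while the explicit lock lets a caller group several operations and avoid repeated locking overhead.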
This is a rough initial attempt at wording to follow Posix locking conventions.
Add Posix functions flockfile, funlockfile, getc_unlocked, putc_unlocked, getchar_unlocked, putchar_unlocked.
Insert the following paragraphs after 7.21.2p6: