constexpr coroutines
This paper is proposing making coroutines functional during constant evaluation. Even when most of use-cases for coroutines are based on I/O and event based, coroutines are still useful for compile-time based computation, eg. std::generator
.
Changes
- R1 → R2: Updated wording to elide allocator functionality in coroutine promise types. Added discussion about alternative implementation strategies. Added information about decision by various groups which seen this paper.
- R0 → R1: Changed wording mentioning lifetime of coroutines after expr.const 5.2
Timeline
EWG in Wroclaw
SF | F | N | A | SA |
---|---|---|---|---|
4 | 12 | 3 | 0 | 0 |
Result: Consensus
LEWG in Wroclaw
SF | F | N | A | SA |
---|---|---|---|---|
12 | 5 | 0 | 0 | 0 |
Result: Unanimous consent
CWG in Wroclaw
Multiple implementers expressed concerns about resources needed for the implementation of this feature, where the resources might be better spent implementing C++ features where market demand is higher.
EWG is requested to reconsider putting this feature into the standard.
Note: CWG did review of the wording and only question was about what to do with allocation / deallocation functions. Wording was updated to elide these calls during constant evaluation of a coroutine. Concern about resources was reflected by updating the paper to contain informations about alternative implementation strategies.
Based on anecdotal evidence people don't use coroutines for two reasons: one is missing standard library support (which is partially resolved with std::generator
) and second is mutual exclusivity with much more popular constant evaluated code. Having constant evaluation of couroutines will help alleviate this and I do expect seeing libraries implementing lazy coroutine based parsers.
Quote
Well, you just told me coroutines are the best way to solve some problems, so wouldn't I also want to use the Best Way at compile time? (quote from Jason Turner, co-author of "constexpr all the things" talk)
Simple example
When this paper is merged into the standard, users will be able to use std::generator
-like coroutines to generate or calculate data.
template <typename T> constexpr auto fib() -> std::generator<T> {
T a = 0;
T b = 1;
co_yield a;
do {
co_yield b;
auto tmp = b;
b += a;
a = tmp;
} while (true);
}
template <typename T, size_t N> constexpr auto calculate_fibonnaci() {
auto res = std::array<T, N>{};
std::ranges::copy(fib<T>() | std::views::take(N), res.begin());
return res;
}
constexpr auto cached_fibonnaci = calculate_fibonnaci<unsigned, 20>();
Evaluation of constexpr coroutines
Implementation must make sure to avoid stack exhaustion and store evaluation state and coroutine's local variables in a way to avoid it. By simply resuming coroutine and evaluating it on system stack (in case of AST walking implementations) will lead to it.
This is because coroutines can be interlieved and their state must be maintained to be resumed. The stack then contains mixed coroutines and normal code together. To avoid this situation the easiest way to model a coroutine is to use coroutine (or coroutine-like functionality) to store the state (represented with values on stack) somewhere else. Because AST walk is unbounded, obvious first choice is a stackfull coroutine (fiber, not a thread!).
Lifetime of coroutine is bounded to be only within constant evaluation similary as memory allocation. Any coroutine leaking outside boundaries of constant evaluation means whole constant evalution failed.
Implementation experience
Partially implemented in clang available on my github, implementation should be ready for its presentation at Wroclaw meeting, and also will be soon available on compiler explorer (thanks Matt!).
Most of functionality needed was already present in Clang, it was mostly about removing checks in the parser.
Another part was implementing the functionality in Clang's interpreter and there I needed to add fibers (stackfull coroutines) as the interpreter recursive walks over AST. Ability to save interpreter's stack content did minimize impact of the change to only resuming, suspending, and variable storage and life-time management.
At the end of evaluation the interpret needs to check objects holding fibers if there is still any coroutine not released, if there is it report similar error as when there is an unreleased memory allocation.
Hardest problem was implementing local "stack", as createLocal
function was designed around idea of having only one branch of evaluation. This I solved by providing context of currently evaluated coroutine in EvalInfo
and switching it on every suspension / resume of a coroutine.
Alternative implementation approaches
The implementation is using fibers (not threads!) to create a storage for execution of coroutines outside of main stack. Main requirement here is to make sure jumping between coroutines won't run out of the system stack.
Following subpart of paper shows alternative approaches which can model coroutines. All of them have same expressive ability and model coroutines well. These implementation has various advantages and costs. For every model of constant evaluation should be possible found good and usable match.
Byte-code interpreter
Clang implementation is starting to use a byte-code virtual machine based interpreter. In such implementation the implementation approach is identical for actually compiling coroutines to a native code. Only thing you need is ability to change current stack / or reference coroutine variables based on offset to coroutine frame.
Such interpret has many advantages over AST walking one mostly in speed and security. I was told initial measurements in the byte-code interpreter in clang are three orders of magnitude faster over recursive AST walking. Security is inherit to fact of not using system stack for the evaluation at all.
C++ coroutines
For compilers which are using C++20 code itself another implementation strategy is to not use the system stack for walking AST, but change all necessory functions into coroutines which would model ordinary functions with only one return point to its callee. But these functions can be suspended and evaluation can switch to evaluation of a constexpr coroutine. Because all coroutines will be allocated on heap, this also hardens implementation's constexpr evaluator.
AST transformation / split
Last feasible approach is to implement a transformation of coroutine function, into multiple trees, representing all possible paths which can be taken from each suspension point. Everytime such splitted coroutine is suspended, constexpr evaluator can unroll all its recursive functions. Implementation then needs to implement a storage for coroutine local variables which survive suspension. Hardest part is to implement special AST nodes to represent expressions split for suspension in middle.
Note: similar split is already done in compiler (for Clang at LLVM level, GCC just before leaving frontend, for Swift in its frontend)
Comparison
Implementation | Storage for | Area needed to be changed | |
---|---|---|---|
Variables | State | ||
Stackfull coroutines | overlay over local variables | fiber (alternative stack on a heap) | builtins to control coroutine flow, execution state to keep track of allocations |
Byte-code | coroutine frame | instruction pointer | control of instruction pointer |
Stackless coroutines | overlay over local variables | suspended coroutines | convert all AST walking functions into coroutines |
AST transformation | coroutine frame | splitted AST trees | add special AST nodes to glue expressions across suspensions |
Impact on existing code
None, this is a pure extension, it allows code to be constexpr which wasn't case before.
Intention for wording changes
Remove all obstacles blocking coroutines from being constant evaluatable. Make sure all coroutines are destroyed at end of constant evaluation.
Proposed wording changes
7.7 Constant expressions [expr.const]
- this ([expr.prim.this]), except
- in a constexpr function ([dcl.constexpr]) that is being evaluated as part of E or
- when appearing as the postfix-expression of an implicit or explicit class member access expression ([expr.ref]);
- a control flow that passes through
a declaration of a block variable ([basic.scope.block]) with
static ([basic.stc.static]) or
thread ([basic.stc.thread]) storage duration,
unless that variable is usable in constant expressions;
[Example 1: constexpr char test() { static const int x = 5; static constexpr char c[] = "Hello World"; return *(c + x); } static_assert(' ' == test()); — end example]
- a construction of a coroutine promise object ([dcl.fct.def.coroutine]), unless the coroutine promise object is destroyed within the evaluation of E;
- an invocation of a non-constexpr function;68
- an invocation of an undefined constexpr function;
- an invocation of an instantiated constexpr function that is not constexpr-suitable;
- an invocation of a virtual function ([class.virtual]) for an object whose dynamic type is constexpr-unknown;
- a call to an instance of std::allocator<T>::allocate ([allocator.members]), unless the allocated storage is deallocated within the evaluation of E;
- a call to an instance of std::allocator<T>::deallocate ([allocator.members]), unless it deallocates a region of storage allocated within the evaluation of E;
- an await-expression ([expr.await]);
- a yield-expression ([expr.yield]);
- a three-way comparison ([expr.spaceship]), relational ([expr.rel]), or equality ([expr.eq]) operator where the result is unspecified;
- a throw-expression ([expr.throw]);
9.2.6 The constexpr and consteval specifiers [dcl.constexpr]
- it is not a coroutine ([dcl.fct.def.coroutine]), and
- if the function it is a constructor or destructor whose class has its class does not have any virtual base classes.
- an invocation of a constexpr function can appear in a constant expression ([expr.const]) and
- copy elision is not performed in a constant expression ([class.copy.elision]).
9.5.4 Coroutine definitions [dcl.fct.def.coroutine]
- If the search finds any declarations, overload resolution is performed on a function call created by assembling an argument list.The first argument is the amount of space requested, and is a prvalue of type std::size_t.The lvalues are the successive arguments.If no viable function is found ([over.match.viable]), overload resolution is performed again on a function call created by passing just the amount of space required as a prvalue of type std::size_t.
17.12 Coroutines [support.coroutine]
17.12.1 General [support.coroutine.general]
17.12.2 Header <coroutine> synopsis [coroutine.syn]
17.12.3 Coroutine traits [coroutine.traits]
17.12.3.1 General [coroutine.traits.general]
17.12.3.2 Class template coroutine_traits [coroutine.traits.primary]
17.12.4 Class template coroutine_handle [coroutine.handle]
17.12.4.1 General [coroutine.handle.general]
17.12.4.2 Construct/reset [coroutine.handle.con]
constexpr coroutine_handle() noexcept;
constexpr coroutine_handle(nullptr_t) noexcept;
static constexpr coroutine_handle from_promise(Promise& p);
constexpr coroutine_handle& operator=(nullptr_t) noexcept;
17.12.4.3 Conversion [coroutine.handle.conv]
constexpr operator coroutine_handle<>() const noexcept;
17.12.4.4 Export/import [coroutine.handle.export.import]
constexpr void* address() const noexcept;
static constexpr coroutine_handle<> coroutine_handle<>::from_address(void* addr);
static constexpr coroutine_handle<Promise> coroutine_handle<Promise>::from_address(void* addr);
17.12.4.5 Observers [coroutine.handle.observers]
constexpr explicit operator bool() const noexcept;
constexpr bool done() const;
17.12.4.6 Resumption [coroutine.handle.resumption]
constexpr void operator()() const;
constexpr void resume() const;
constexpr void destroy() const;
17.12.4.7 Promise access [coroutine.handle.promise]
constexpr Promise& promise() const;
17.12.4.8 Comparison operators [coroutine.handle.compare]
constexpr bool operator==(coroutine_handle<> x, coroutine_handle<> y) noexcept;
constexpr strong_ordering operator<=>(coroutine_handle<> x, coroutine_handle<> y) noexcept;
17.12.4.9 Hash support [coroutine.handle.hash]
template<class P> struct hash<coroutine_handle<P>>;
17.12.5 No-op coroutines [coroutine.noop]
17.12.5.1 Class noop_coroutine_promise [coroutine.promise.noop]
struct noop_coroutine_promise {};
17.12.5.2 Class coroutine_handle<noop_coroutine_promise> [coroutine.handle.noop]
17.12.5.2.1 General [coroutine.handle.noop.general]
17.12.5.2.2 Conversion [coroutine.handle.noop.conv]
constexpr operator coroutine_handle<>() const noexcept;
17.12.5.2.3 Observers [coroutine.handle.noop.observers]
constexpr explicit operator bool() const noexcept;
constexpr bool done() const noexcept;
17.12.5.2.4 Resumption [coroutine.handle.noop.resumption]
constexpr void operator()() const noexcept;
constexpr void resume() const noexcept;
constexpr void destroy() const noexcept;
17.12.5.2.5 Promise access [coroutine.handle.noop.promise]
constexpr noop_coroutine_promise& promise() const noexcept;
17.12.5.2.6 Address [coroutine.handle.noop.address]
constexpr void* address() const noexcept;
17.12.5.3 Function noop_coroutine [coroutine.noop.coroutine]
constexpr noop_coroutine_handle noop_coroutine() noexcept;
17.12.6 Trivial awaitables [coroutine.trivial.awaitables]
Feature test macros
15.11 Predefined macro names [cpp.predefined]
__cpp_constexpr_coroutines 2024??L
17.3.2 Header <version> synopsis [version.syn]
#define __cpp_lib_constexpr_coroutines 2024??L // also in <coroutine>