Response to "Problems with reference

Response to "Problems with reference_closure"

ISO/IEC JTC1 SC22 WG21 N2839 = 09-0029 - 2009-03-03

Lawrence Crowl, crowl@google.com

This paper is a response to N2830 Problems with reference_closure. That paper is quoted in its entirety, with responses interleaved in the quotation. The sectioning derives from N2830.

Introduction

The lambda proposal as voted into the working draft requires that if a lambda captures all of its local variables by reference, the closure class must be derived from std::reference_closure. This feature was provided to allow the user to write functions that can take as arguments any suitable lambda.

The feature was provided to improve performance of non-template functions over a common form of closures.

Now that we have additional implementation experience, we believe that the requirement that lambdas derive from std::reference_closure is over-specification, does not provide additional optimization opportunities, can in fact constrain implementations from certain optimizations, and ends up providing two incompatible mechanisms for passing lambdas to other functions.

I show below that no optimization opportunities are lost, and that the mechanisms are compatible.

std::reference_closure vs. std::function

The working draft already provides a powerful, general-purpose mechanism that can be used to pass lambdas to functions: std::function. std::reference_closure provides a limited subset of the functionality of std::function and will result in the use of two incompatible mechanisms, complicating interfaces and reducing interoperability. ...

It is true that std::reference_closure has a subset of the functionality of std::function, but that subset is generally a reflection of the fact that the compiler alone creates non-null instances of std::reference_closure and that std::reference_closure needs no allocation and hence no move operations. The other common operations, particularly copy and call, are supported in both types.

Operationally, as parameters, the two types will mostly be used the same way. The intent is that one be able to overload a function taking std::function with a function taking std::reference_closure. For nearly all uses of std::function in this context, a generic implementation of those overloads is possible.

Furthermore, one can convert an std::reference_closure to an std::function or wrap an std::function in an std::reference_closure to obtain full interoperability. (The latter seems unlikely to be needed.)


std::reference_closure<int(int)> rc1 = [&](int i){ .... };
std::function<int(int)> f( rc1 );
std::reference_closure<int(int)> rc2 = [&](int i){ return f(i); };

So, the mechanisms are compatible.

... In his national body comments, Doug Gregor wrote:

std::reference_closure is a premature optimization that provides a limited subset of the functionality of std::function intended to improve performance in a narrow use case. However, the "parallel application performance" benchmark used to motivate the inclusion of std::reference_closure was flawed in several ways:

it failed to enable a common optimization in std::function (implemented by all vendors), exacting a large and unrealistic penalty for copying std::function instances, and

The benchmark was implemented using a stock GCC, which leads one to conclude that the optimization was not so common. However, we can anticipate that the optimization will be common by the time C++0x is supported by compilers.

Even given the optimization, there is still an order-of-magnitude overhead difference between std::reference_closure and std::function.

it failed to account for parallel scheduler overhead or realistically-sized work units, both of which would dominate the costs measured by the benchmark in any realistic application.

The "parallel application performance" benchmark wasn't such a benchmark; it was a "control abstraction penalty" benchmark. Such a benchmark is motivated by the higher demands that highly parallel applications make on control abstraction. In a highly parallel application, most closures will be executed in a serial context, and hence the serial performance of closure can have a significant effect on parallel application performance. So, saying the benchmark failed to account for parallel overhead is much like saying that function inlining is unnecessary because the overhead of system calls for I/O would dominate any performance gained from inlining.

The efficiency of C++ abstraction mechanisms is a key differentiator between C++ and other programming languages.

Over-constraining Implementations

Most lambdas that would be required to derive from std::reference_closure will not in fact ever be passed to a routine that takes a std::reference_closure. The requirement that such lambdas derive from std::reference_closure actually makes it more difficult to optimize some common cases. A reference-only lambda requires at most a single pointer to access the stack information, but std::reference_closure forces an implementation to use two pointers. That penalizes the very common case where a lambda is passed as a function object to a template, for example:
void f0x(std::vector<int>& v) {
   sort(v.begin(), v.end(),
        [](int x, int y) -> bool { return x > y; });
}

The current wording is unclear on whether or not the above lambda expression results in a reference closure. N2825 Unified Function Syntax provides a clarification that explicitly avoids reference closure in preference to a native function.

The requirement that the lambda derive from std::reference_closure turns what would be an empty class into something with two pointers. That could, conceivably, make the use of the lambda slower than today's function-object version:
void f98(std::vector<int>& v) {
    sort(v.begin(), v.end(), std::greater<int>());
}

So, in this example, given the change in N2825, the two functions would have identical performance.

However, that observation side steps the fundamental contention that a reference-only lambda expression has closure implementations that are smaller and cheaper as template parameters than std::reference_closure.

Suppose that the assertion is true. Why then is the (worst-case) factor of two difference in overhead here more important than the (demonstrated) factor of ten difference in overhead between std::function and std::reference_closure?

Without the std::reference_closure requirement, all of the uses of a given lambda are always within a given translation unit (including template instantiations from that translation unit). This means that a given implementation is free to implement lambdas in any way it chooses provided it follows the working paper definition using the "as if" rule. With std::reference_closure, however, the implementation details of the reference closure mechanism become part of the ABI. This means that implementations can't do better than what is specified in the ABI. ...

On the contrary, the implementations can do better. In particular, modern compilers (with appropriate options) often optimize function calls with constant arguments by first cloning the target function and then propogating the constant arguments through to their uses within the body of the function. In the case of std::reference_closure, the argument has one field that is constant and the other that is variable. Some compilers might just work as-is, others might require modification to split apart the two fields. The preconditions on such an optimization are exactly those of template instantiation. So, the template optimization proposed does not disappear with the std::reference_closure requirement, it simply shifts to another part of the compiler. Thus, std::reference_closure provide no inherent performance loss over an unconstrainted implementation.

The natural question at this point is "if the compiler can do it for std::reference_closure, why can't it do it for std::function?" The answer is that it can. However, the problem is harder for two reasons. First, the std::function type is defined by a replacable library whereas the std::reference_closure type is defined by the compiler, and compiler writers are more likely to optimize something they control than something that they do not control. (Hence, interestingly, the proposal above!) Second, std::function has a conditional implementation, which increases the complexity of propogating the information through the target function, thus making implementation more difficult and compile more slowly. Thus, we can expect compiler optimization of std::reference_closure well before that of std::function.

However, such compiler optimizations are not applicable across opaque binary interfaces. It is precisely for this purpose that std::reference_closure exists. In summary, std::reference_closure provides no performance loss relative to the unconstrained closure implementation, and provides a performance gain relative to std::function in binary interfaces.

... It also means that some implementation techniques (such as generating C code as the output of a C++ compiler) will probably not be able to generate code that is compatible other compilers.

The std::reference_closure specification places no requirement on the internals of the call operator. In particular, the only ABI constraints are on the representation of the reference closure (a function pointer and frame pointer in some order) and the mechanism of passing the frame pointer to the function (make it an implicit first parameter). The mechanisms used to access variables in the static scope from within he body of the call operator are entirely contained within a single object file, and thus not part of the ABI. In particular, a generate-to-C strategy consists of wrapping the appropriate variables in the outer scope into a struct and then passing a pointer to the struct as if it were a frame pointer. This implementation is precisely the one that I used in implementing reference closures twenty years ago. Full back-end compiler support can be implemented later in a fully binary-compatible way. As compiler interoperability problems go, this one is small.

Existing implementations

Of the four implementations that we know of, none has implemented this feature. This suggests that the feature presents implementation challenges and lacks compelling demand from users (or that vendors can satisfy that demand without using a feature like this).

Reference closures have decades of implementation experience. They are used in Pascal, Ada, and in GNU C. (Unfortunately, the latter has an unsafe implementation driven by a desire to be compatible with function pointers.) Many compilers provide direct support in the front-end–back-end interface for such constructs. Even when such support is not available, direct-to-C compilation strategies in the front end are available. Any lack of implementation in C++ is more a reflection of implementation priorities than of technical difficulties.

Working Paper Changes

Remove 5.1.1 [expr.prim.lambda] paragraph 12.
Remove 20.7.18 [func.referenceclosure] (and its subsections).

We have evidence of an order-of-magnitude overhead difference between std::reference_closure and std::function in binary interfaces. Avoiding that overhead is worth paying for.