Document #: | P3117R1 |
Date: | 2024-12-14 |
Project: | Programming Language C++ |
Audience: |
SG-9 LEWG |
Reply-to: |
Zach Laine <whatwasthataddress@gmail.com> Jonathan Müller <jonathanmueller.dev@gmail.com> Barry Revzin <barry.revzin@gmail.com> |
transform_view
.In [P2017R1], we made
some range adaptors conditionally borrowed. But we didn’t touch adaptors
that had callables - like
views::transform
. It turns out
to be very useful to have a borrowable version of
views::transform
. Indeed, P2728R6
even adds a dedicated new range adaptor
(views::project
) which is simply
a version of views::transform
that can be borrowed (because its callable must be a constant).
But rather than add a dedicated view for this specific case, which
requires a new name but really only helps
views::transform
, we can
generalize views::transform
to
address the use-case in a way that would also help all the other range
adaptors that take callables. At the very least, in
views::transform(r, f)
if
r
is borrowed and
f
is empty, an implementation
can simply put f
in the transform_view<R, F>::iterator
directly (rather than a
transform_view<R, F> *
)
which would allow it to be borrowed. The same could be said for other
range adaptors that take callables as well, which seems like a more
useful approach as well as not requiring new names for every
adaptor.
The main question then is what the criteria should be for when
transform_view<R, F>
should be a borrowed range (when
R
is):
is_empty_v<F>
(range-v3 already does this - not for conditionally borrowed, but just
to decide whether to store the callable by value in the
iterator)
sizeof(F) <= sizeof(void *)
and
is_trivially_copyable_v<F>
(this means that when transforming with a function pointer, the function
pointer itself can live in the iterator - which takes the same amount of
space as the parent pointer, except with one less indirection)
something else?
This question is a little simpler for
views::transform
(which only
needs to potentially store f
in
the adapted iterator) than it is for
views::filter
(which would need
not only the predicate but also the underlying sentinel, so this may not
be worthwhile). This would need to be carefully considered.
borrowed_range
?borrowed_range
is important
because it allows the standard range adaptors and views to know that a
particular view is a proper view – that is, a reference type, not an
expensive-to-copy type like a
std::vector<int>
. Certain
views refuse to work with
non-borrowed_range
ranges,
because the result might dangle. For instance, you cannot construct a
subrange
from a
non-borrowed_range
. In other
cases, an rvalue range might be wrapped in a
owning_view
to turn it into a
view, if it was not already a
borrowed_range
. So std::vector<int>({1,2,3,4,5,6,7,8,9}) | take(3)
would take the rvalue vector
,
wrap it in an owning_view
, and
the result is a
non-borrowed_range
that is
move-only.
Here is the part of 25.4.2
[range.range] that
describes borrowed_range
:
template<class T> concept borrowed_range = <T> && (is_lvalue_reference_v<T> || enable_borrowed_range<remove_cvref_t<T>>); range
4 Let
U
beremove_reference_t<T>
ifT
is an rvalue reference type, andT
otherwise. Given a variableu
of typeU
,T
modelsborrowed_range
only if the validity of iterators obtained fromu
is not tied to the lifetime of that variable.5 [Note 2: Since the validity of iterators is not tied to the lifetime of a variable whose type models borrowed_range, a function with a parameter of such a type can return iterators obtained from it without danger of dangling. ? end note]
To increase the interoperability of the templates in
std::ranges
and to avoid
unnecessary wrapping, we should make as many views
borrowed_range
s as is
reasonable. It is probably unreasonable to do so if there is a
significant cost involved. To make a view a
borrowed_range
means copying its
state out of the view itself, and into the iterator instead.
Note that copying state to the iterator is not sufficient to
guarantee borrowed-ness, though is it necessary. Most views take an
existing view V
and do something
with it, making a new view over the existing one. For instance,
transform_view<V, F>
creates a view by transforming the elements of
V
using the invocable
F
.
A particular view specialization is only borrowed if it satisfies the
borrowed_range
concept,
and V
does as well.
This is important, because if a view is not borrowed, no view that uses
it will be either. In
rng | V1 | V2 | ... | Vn
, if
any Vi
is not borrowed,
the entire resulting view is not.
Currently, there are several views that are never
borrowed_range
s that potentially
could be. They are:
transform_view
,
zip_transform_view
,
adjacent_transform_view
,
filter_view
,
take_while_view
,
chunk_by_view
,
join_view
,
join_with_view
,
split_view
,
lazy_split_view
, and
cartesian_product_view
.
A previous version of this paper attempted to address the
borrowability of several of those views. The consensus in SG-9 was that,
since this would be a n ABI break for all of these views, that approach
was too ambitious. This version of the paper radically scales that
effort back; the plan now is to make a minimal change to
transform_view
. If that change
can gain consensus in SG-9, we can proceed with more views, if
possible.
All these changes are in
std::ranges
.
template<class F>
constexpr bool tidy-func =
<F> && is_trivially_default_constructible_v<F> && is_trivially_destructible_v<F>; is_empty_v
tidy-func
evaluates
to true
when an invocable is a
candidate for being re-created on the fly in a view’s iterators.
transform_view
Change
transform_view::iterator
to
create a new object of the type of
transform_view
’s
F
template parameter. That is,
the iterator will not reach back through the
parent_
data member to get the
fun_
member out of the view.
Also, add this specialization:
template<class T, class F>
constexpr bool enable_borrowed_range<transform_view<T>> =
<T> && tidy-func<F>; enable_borrowed_range
This simple change means that, for any
transform_view
tv
for which
enable_borrowed_range
is
true
, it is safe to use
tv.begin()
/tv.end()
after the lifetime of tv
. Even
though tv.begin()
contains a
dangling pointer to tv
, it will
never dereference that pointer.
This still poses an ABI problem, though it is much less likely to
affect real code than the change in the R0 revision of this paper. The
layout and even the values of data members is unchanged in
transform_view
,
transform_view::iterator
, and
transform_view::sentinel
. So,
passing transform_view
s and even
their iterator/sentinel pairs across an ABI boundary should be safe in
almost all cases – whichever side of the ABI boundary is using the
objects should use them in the right way. There is no ODR issue with
this change.
However, it is possible to construct a memory-unsafe usage across an
ABI boundary. New code, seeing that is_borrowed_range<transform_view<...>>
is true
, might get rid of the
view, because it can. Then old code might try to use the dangling
parent_
pointer:
// old_tu.cpp
// Code built with C++20 semantics. Separately compiled, but with the types
// and function declarations available in a header.
auto transform_foos(std::vector<int> const & vec)
-> decltype(vec | std::views::transform(some_f));
using transformed_foos_t =
decltype(transform_foos(std::declval<std::vector<int> const &>()));
void print_transformed_range(transformed_foos_t::iterator f, transformed_foos_t::sentinel l) {
for (auto it = f; it != l; ++it) {
::cout << *it << "\n";
std}
}
// new_tu.cpp
// Code built with post-P3117 semantics.
void new_using_func() {
auto r = std::ranges::subrange(transform_foos(get_vec())); // leaves parent_ to dangle
// Uses F{} to get the predicate in the iterator. OK.
for (auto i: r) {
::cout << i << "\n";
std}
// Runs in old_tu.cpp. Tries to use parent_->pred_, but parent_ is
// dangling. UB.
(r.begin(), r.end());
print_transformed_range}
There are a number of unlikely conditions that must exist for the user to write this unsafe code.
It is difficult to name the types of
std
views and their iterators.
As you can see above, we had to make a separate function that creates
the transform_view
to make it
easier to get a typename for it.
The user must pass an iterator/sentinel pair to the old TU (as
above), or possibly put the iterator/sentinel pair into a
subrange
. Passing the entire
transform_view
to a function
means that it will outlive its iterators. Passing a
subrange
is possible, but
unlikely – it you wanted to pass the iterator/sentinel pair as a range,
you would probably just pass the
transform_view
.
The potential danger of this case is limited to the is_borrowed_range<Expr> == true
case, which is affected by the borrowability of all ranges and views in
the view expression Expr
. In
other words, the proposed changes to
transform_view
will make it
borrowable, but the danger above is only present if
transform_view
is actually used
in a fully-borrowed view expression.
This kind of failure is the bad kind – it is memory unsafe, and leaves no lexical trace of its unsafety. However, it is also exceedingly unlikely that the user will find themself in this situation.
One of the authors implemented the changes suggested above. The
implementation was done by taking the libstdc++ implementations, copying
them into a new header, in a new new namespace under
std::ranges
, and altering them.
The implementation can be found here.
The header is accompanied by a test file, and a small perf test. The
implementation was very straightforward.
TODO: No real change expected; run the perf tests to be sure.