Document #: | P3490R0 |
Date: | 2024-11-14 |
Project: | Programming Language C++ |
Audience: |
SG1, SG9 |
Reply-to: |
Alexey Kukanov <alexey.kukanov@intel.com> Ruslan Arutyunyan <ruslan.arutyunyan@intel.com> |
This paper elaborates on the question of using ranges as the output for parallel range algorithms that we proposed in [P3179R2], addressing the raised concerns. The paper neither proposes nor discusses in detail using ranges as the output for serial range algorithms.
In [P3179R2] we proposed to add overloads
taking an execution policy to the functions from [algorithms] defined in
the
std::ranges
namespace. We refer to these overloads as parallel range
algorithms. The proposed design in particular suggested that:
random_access_{iterator,range}
.The last two of these design decisions were approved at the joint SG1 and SG9 meeting in St. Louis. However, the idea of using ranges for the output sequences of parallel range algorithms did not gain sufficient support. The following major concerns were expressed during the meeting as well as in personal communication:
std::ranges::copy(policy, vec1, vec2)
.Consequently, in the [P3179R3] revision we switched back to a single iterator for representing an output sequence. However, with this paper we would like to provide new information and discuss the raised concerns one more time, seeking an opportunity to go forward with the original idea, which we believe will be a notable improvement.
As a recap, we would like to propose parallel algorithms taking a range as the output for the overloads that take ranges for input. Similarly, we propose adding a sentinel for the output where the input is passed as “iterator and sentinel”. See the Proposed API section for the examples.
We also propose output ranges to have boundaries set independently of the input ranges. An algorithm should stop as soon as the end is reached for the shortest range. The main motivation is to follow established practices of secure coding, which recommend or even require to always specify the size of the output in order to prevent out-of-range data modifications. We think this will not impose any practical limitation on which ranges can be used for the output of a parallel algorithm, as we could not find or invent an example of a random-access writeable range which would also be unbounded.
This range-as-the-output approach should not be confused
with the output_range
concept
already used with a few serial range algorithms.
output_range
would not be enough for
parallel range algorithms, which require random access ranges for the
output parameter, same as for the input one.
The benefits of this approach, comparing to taking a single iterator for the output, are:
sized_range
and
sized_sentinel_for
concepts will be
applied to the output sequences in the same way as [P3179R3] applies those to the input
sequences.copy_if
(and similar
algorithms with filtering semantics), where the output sequence
might be shorter than the input one. Knowing the expected size of the
output may open opportunities for more efficient implementations.It is worth noting that to various degrees these reasons are also applicable to serial range algorithms.
We think that in practice parallel algorithms mainly write the output
data into a container or storage with preallocated space, for efficiency
reasons. So, typically parallel algorithms receive std::begin(v)
or v.begin()
or v.data()
for output, where v
is an instance
of
std::vector
or
std::array
.
Allowing v
to be passed directly for
output in the same way as for input results in a slightly simpler code.
Here is an example compared to the approach in [P3179R3]:
P3179R3
|
This paper
|
---|---|
|
|
In addition, classes such as std::back_insert_iterator
or std::ostream_iterator
,
which have no meaningful range interpretation, already cannot be used
with C++17 parallel algorithms that require at least forward iterators.
Code using these types will, in any case, require modifications to
migrate to parallel algorithms.
All in all, we think for parallel algorithms taking ranges and sentinels for output makes more sense than only taking an iterator.
Of the 89 parallel range algorithms proposed in [P3179R3] (as counted by names, not
overloads), the changes we discuss here will affect only the following
17 algorithms: copy
,
copy_if
,
move
,
transform
,
replace_copy
,
replace_copy_if
,
remove_copy
,
remove_copy_if
,
unique_copy
,
reverse_copy
,
rotate_copy
,
partition_copy
,
merge
,
set_union
,
set_intersection
,
set_difference
,
set_symmetric_difference
.
The first concern we have heard about this approach is the mismatch between serial and parallel variations. That is, if serial range algorithms only take iterators for output and parallel range algorithms only take ranges, switching between those will always require code changes. That can be resolved by:
or both.
The option A would give some of the described benefits to serial range algorithms as well; one could argue that it would be a useful addition on its own. However it is obviously too late to pursue this option for C++26.
The option B does not seem to have benefits besides the aligned semantics, while it has the downside of some algorithm variations not being strengthened with the requirement for the output sequence to be bounded. Nevertheless, given that the option A is not considered for C++26, we prefer the option B, i.e. supporting both ranges and iterators as the output of parallel range algorithms, to the status quo of [P3179R3] that supports only iterators.
For the “iterator and sentinel” overloads we prefer to always require a sentinel for output, despite the mismatch with the corresponding serial overloads. We expect those overloads to be rarely used in general, and in many cases the C++17 parallel algorithms can be used instead. Adding a sentinel for output, on the other hand, preserves the existing approach of expressing a range as the “iterator and sentinel” pair in algorithm descriptions, as well as allows to specify how the end of the output sequence is computed for the option B.
The potential semantic ambiguity can be illustrated by the following code:
::vector<int> vec1;
std::vector<int> vec2;
std
// Some initialization
::ranges::copy(policy, vec1, vec2); std
for which there might be uncertainty what the behavior is when vec2.size() != vec1.size()
.
Some might expect that std::ranges::copy
resizes vec2
and makes it an exact
copy of vec1
.
However, the standard already has a precedent in serial variations of
std::ranges::uninitialized_copy
and std::ranges::uninitialized_move
,
which have the range-as-the-output semantics exactly as we propose:
Giving another semantics to std::ranges::copy
etc. would be inconsistent with these already existing algorithms.
Moreover, it would be generally inconsistent with the current
element-wise semantics of both iterator-based and range
algorithms. For example, when vec2.size()
is greater than vec1.size()
,
std::ranges::copy(vec1, vec2.begin())
does not modify the elements of vec2
beyond the size of vec1
, and so
should not std::ranges::copy(policy, vec1, vec2)
.
Alternatively, potentially ambiguous names can be modified with some
prefix, such as partial_
. It would
follow std::ranges::partial_sort_copy
,
another range-as-the-output algorithm that already uses the semantics we
propose. That is, we could keep copy
with iterator-as-the-output and introduce
partial_copy
with
range-as-the-output semantics, and similarly for other algorithms. We do
not recommend this however, as there is no serial algorithms with such
names, and also because it would create very sophisticated names:
partial_remove_copy_if
, really?
Speaking of precedents, it is worth noting that there are more
existing range-as-the-output algorithms -
fill
,
generate
, and
iota
. Their specifics is absence of
input sequences, so the output sequence needs a boundary. However,
extending this principle from algorithms with zero input sequences to
those with one or more seems appropriate.
Finally, let’s discuss if the proposal could interfere somehow with the anticipated improvements in how the serial range algorithms operate with output data.
First and foremost, the range-as-the-output approach will for now only apply to the new function overloads which first parameter is an execution policy. No modifications to the existing range algorithms are proposed, so there is no direct impact that would limit possible design decisions in the future.
But even though the functions for serial and parallel execution have independent constraints, there is more to consider. If/when the support for ranges as the output of serial algorithms is added later, will it be “backward compatible” with what this paper proposes? In other words, will the transition from a parallel algorithm back to the serial counterpart be as simple as removal of the execution policy?
void normalize_serial(range auto&& v) {
auto mx = max(v);
(v, views::repeat(mx), v, divides);
transform}
The answer, we believe, is “Yes”. The reasoning is that parallel range algorithms impose stricter requirements on their parameters, narrowing down a possible set of arguments. If a range is accepted by a parallel range algorithm, it is certainly accepted by a serial range algorithm, whether for input or for output. Putting it differently, it will be very strange if serial algorithms somehow exclude random access sized ranges, a subset of the much broader set of writeable ranges that could be used as the output.
Another question is whether other anticipated improvements on the output of serial range algorithms might be applicable for the parallel ones, and so should be considered in our design. We have found two WG21 papers that speak about modifications of the output for range algorithms. [P2550R0] proposes that, citing,
- All output algorithms now require
weak_output_iterator
- In most cases, that’s re-specifying
weakly_incrementable
andindirectly_writable
(no requirements change, just better name)- In some cases, that’s weakening the requirement on those algorithms that require
output_iterator
(all currently valid code is still valid)
If that paper is accepted, similar requirement modifications might be made for parallel range algorithms, both taking iterators and ranges as the output, and likewise it will not introduce breaking changes for what we propose in [P3179R3] and in this paper.
[P2760R1] discusses shortcomings of
output-only iterators, such as std::back_insert_iterator<C>
.
While there is no formal proposal yet, the design outlined there uses
two new customization points, std::ranges::put
and std::ranges::put_range
,
which would make implementations of classes like std::back_inserter
both simpler and more efficient. Types that customize these CPOs would
not represent a range, not even an iterator; however, the CPOs
themselves would work with output iterators and output ranges. We think
that:
std::back_inserter
,
are already not applicable and filtered out.To summarize, we do not see how our proposal could create any issues for future improvements related to the output of serial range algorithms, and vice versa.
Since we propose to have both an overload with iterator-as-the-output and an overload with range-as-the-output for each relevant algorithm, it appears that overloads with range-as-the-output could be added later. This is true, though there is a caveat.
If we don’t add the range-as-the-output overload in C++26 that likely means that we don’t add the output sentinel to the “iterator and sentinel” counterpart as well. Given the C++ committee’s concerns about the growing number of algorithm overloads, we are afraid that adding another one with the output sentinel later will be opposed.
That will have the following consequences:
Please look at the “iterator and sentinel” overload in the Proposed API to see what API we likely lose in the standard if the proposal is not accepted.
Below is an example of modifications to be made to the wording
proposed in [P3179R3]. The example adjusts the binary
transform
algorithm to use an output
sentinel and adds a new overload taking an output range. Underlines are
used to highlight the key differences introduced by this paper. Similar
modifications will be made to all the mentioned 17 algorithms if the
paper is supported.
Alternatively, we can use an exposition-only
range-or-iterator
concept
that combines the requirements for both a range and an iterator by
logical disjunction, as its name suggests. We did not explore which way
makes more sense; at glance, there seems to be little practical
difference for library implementers.
template<typename ExecutionPolicy,
<I1> S1,
random_access_iterator I1, sentinel_for<I2> S2,
random_access_iterator I2, sentinel_forsized_sentinel_for<O> SO,
random_access_iterator O,
copy_constructible F,class Proj1 = identity, class Proj2 = identity>
requires indirectly_writable<O,
<F&, projected<I1, Proj1>, projected<I2, Proj2>>>
indirect_result_t&& (sized_sentinel_for<S1, I1> || sized_sentinel_for<S2, I2>)
constexpr binary_transform_result<I1, I2, O>
(ExecutionPolicy&& policy, I1 first1, S1 last1, I2 first2, S2 last2, O result,
transformSO result_last,
F binary_op, Proj1 proj1 = {}, Proj2 proj2 = {});
template<typename ExecutionPolicy,
::random_access_range R1,
ranges::random_access_range R2,
rangesrandom_access_iterator O
,
copy_constructible F,class Proj1 = identity, class Proj2 = identity>
requires indirectly_writable<O,
<F&,
indirect_result_t<ranges::iterator_t<R1>, Proj1>,
projected<ranges::iterator_t<R2>, Proj2>>>
projected&& (sized_range<R1> || sized_range<R2>)
constexpr binary_transform_result<ranges::borrowed_iterator_t<R1>,
::borrowed_iterator_t<R2>,
ranges>
O(ExecutionPolicy&& policy, R1&& r1, R2&& r2, O result, F binary_op,
transform= {}, Proj2 proj2 = {}); Proj1 proj1
template<typename ExecutionPolicy,
ranges::random_access_range R1,
ranges::random_access_range R2,ranges::random_access_range OutR
,
copy_constructible F,
class Proj1 = identity, class Proj2 = identity>ranges::iterator_t<OutR>
,
requires indirectly_writable<
indirect_result_t<F&,
projected<ranges::iterator_t<R1>, Proj1>,
projected<ranges::iterator_t<R2>, Proj2>>>&& sized_range<OutR>
&& (sized_range<R1> || sized_range<R2>)
constexpr binary_transform_result<ranges::borrowed_iterator_t<R1>,
ranges::borrowed_iterator_t<R2>,ranges::borrowed_iterator_t<OutR>
>
OutR&&
result, F binary_op,
transform(ExecutionPolicy&& policy, R1&& r1, R2&& r2, Proj1 proj1 = {}, Proj2 proj2 = {});
The oneAPI DPC++ Library (oneDPL) developer guide covers parallel range algorithms we’ve implemented so far. The oneAPI specification provides formal signatures of these algorithms. The implementation supports execution policies for CPUs (semantically aligned with the C++ standard) and for SYCL devices, and it works with a subset of the standard C++ views.
We use the range-as-the-output approach where applicable: in
transform
,
copy
,
copy_if
, and
merge
. We don’t have
iterator-as-the-output overloads for these algorithms, as the motivation for it applies much
less to oneDPL. However, we don’t foresee any issues with adding such
overloads later, if necessary.
Poll: Continue work on P3179R2 for IS’26 with these notes:
SF
|
F
|
N
|
A
|
SA
|
---|---|---|---|---|
7 | 4 | 2 | 1 | 0 |