Supplements to C++ Latches

This paper adds two new independent concepts to "C++ Latches and Barriers" (N4204, previously N3998). Readers are assumed to be familiar with the terminology and concepts from that paper.

Problem

A latch cannot safely be destroyed until all co-ordinating threads have arrived at the synchronization point. This effectively means that a thread must wait on the latch before destroying it. Typical code may look like this.

void DoWork() {
latch completion_latch(NTASKS);
for (int i = 0; i < NTASKS; ++i) {
// add task...
}
// Must block this thread until work is done
completion_latch.wait();
}

Latches would be often be easier to use if they were automatically destroyed when the last waiting thread left the latch.

In addition, it is impossible, using a single latch, to invoke a function after all threads have arrived at the synchronization point, but before any threads are released. It is possible to write:

latch start_latch(NTASKS);
for (int i = 0; i < NTASKS; ++i) {
pool->add_task([&] {
PrepareWorker();
start_latch.arive_and_wait();
DoWork();
}));

}
start_latch.wait();
cout << "workers about to start\n"

}

But by the time the message is printed, the workers may already have started. To ensure that the message is printed before the workers start, a second latch would be required.

Solution

To avoid the thread that creates the latch having to block on wait(), we propose a modification to the latch class to allow self-deleting latches to be created.

To allow a completion function to run at the appropriate time, we propose a flex latch, similar to the flex barrier.

These proposals can be added independently. Note, however, that there may be benefits to combining them. See the section "Using Self-Deleting Flex Latches" below.

Self-Deleting Latches

A self-deleting latch destroys itself after the synchronization point has been reached. It is not necessary for a thread to wait on the latch before destroying it. Sample usage:

void DoWork(threadpool* pool) {
latch* completion_latch = latch::create_self_deleting(NTASKS);
for (int i = 0; i < NTASKS; ++i) {
pool->add_task([&] {
// perform work (part 1)
...

completion_latch->arrive_and_wait();

// perform work (part 2), knowing that

// all threads have performed part 1.
...

}));

}

// May return here - completion_latch will be destroyed
// when part 1 has finished

}

After all threads have exited the functions that caused the synchronization point to be reached, a self-deleting latch is guaranteed to have been destroyed, and its pointer should not be used.

Otherwise, a self-deleting latch behaves as a normally constructed latch, with the exception that the wait()/try_wait() functions may not be invoked on a self-deleting latch. (Because the latch will destroy itself when the last thread reaches the completion, it would only be possible to call wait() by synchronizing with the worker threads using the latch. If this is necessary, the latch can be created with a count N+1, and the co-ordinating thread can use arrive_and_wait(). In practice, a self-deleting latch is only useful when no thread is going to call wait(), so we do not feel that this restriction is significant.)

Note: Although it would be possible to implement the same functionality using std::shared_ptr<std::latch>, we believe that the self-deleting latch is cleaner, and has less overhead.

Header std::latch Synopsis

The synopsis is as in N4204, with the following amendments:

static latch* create_self_deleting( int C );

Requires: C shall be >= zero.

Effects: creates a new latch, initialized with a count of C, and returns a pointer to the new latch. [Note: If C is zero, the synchronization condition has been reached, and the returned pointer may not be used. End note]. The

Synchronization: None

~latch( );

Requires: This latch was not created with a call to create_self_deleting(). [Additional wording as per N4204.]

void arrive( );

Effects: Decrements the internal count by 1. If the count reaches 0 the synchronization condition is reached. The latch may have destroyed itself when this method returns if it was created with a call to create_self_deleting(). [Additional wording as per N4204. Similar wording for count_down() and arrive_and_wait().]

void wait( );

Requires: This latch was not created with a call to create_self_deleting().

Effects: [Additional wording as per N4204. Similar wording for try_wait().]

Flex Latch

A variant of latch that takes a completion function.

Header std::flex_latch Synopsis

Provides the Latch concept.

A flex latch behaves as a latch, but is constructed with a callable completion function that is invoked after all threads have arrived at the synchronization point, and before the synchronization condition is reached.

The flex barrier maintains an internal counter that is initialized when the latch is created. The synchronization condition is reached when the counter is decremented to 0. Threads may block at a synchronization point waiting for the condition to be reached. When the condition is reached, any such blocked threads will be released.

template <typename T>

flex_latch( int C, T F );

Requires: C shall be >= zero. F shall conform to the callable void() concept.

Effects: initializes the latch with a count of C, and a callable object that will be invoked after the synchronization condition is reached. [Note: If C is zero, the synchronization condition has been reached. End note]

Synchronization: None

~flex_latch( );

Requires: No threads are blocked at the synchronization point. May be called if threads have not yet returned from wait() or arrive_and_wait() provided that the condition has been reached. [Note: The destructor may not return until all threads have exited wait() or arrive_and_wait(). End note]

void arrive( );

Effects: Decrements the internal count by 1. If the count reaches 0 the synchronization condition is reached. Before any threads are released, the callable completion object registered in the constructor will be invoked. (The completion will be invoked in the context of one of the threads that invoked arrive() or arrive_and_wait().) If called more than once by a given thread the behaviour is undefined.

Throws: std::logic_error if the internal count would be decremented below 0.

Synchronization: Synchronizes with invocations of the completion function. Invocations of the completion function then synchronize with calls unblocked as a result of this call.

void arrive_and_wait( );

Effects: Decrements the internal count by 1. If the count reaches 0 the synchronization condition is reached. Before any threads are released, the callable completion object registered in the constructor will be invoked. (The completion will be invoked in the context of one of the threads that invoked arrive() or arrive_and_wait().) Otherwise blocks at the synchronization point until the synchronization condition is reached. If called more than once by a given thread the behaviour is undefined.

Throws: std::logic_error if the internal count would be decremented below 0.

Synchronization: Synchronizes with calls unblocked as a result of this call and try_wait calls on the same latch that return true as a result.

void count_down( int N );

Effects: Decrements the internal count by N. If the count reaches 0, the synchronization condition is reached. Before any threads are released, the callable completion object registered in the constructor will be invoked. (The completion will be invoked in the context of one of the threads that invoked arrive() or arrive_and_wait().) May be called by any thread. Does not block.

Throws: std::logic_error if the internal count would be decremented below 0.

Synchronization: Synchronizes with calls unblocked as a result of this call and try_wait calls on the same latch that return true as a result.

void wait( );

Effects: Blocks the calling thread at the synchronization point until the synchronization condition is reached. If the condition has already been reached, the thread does not block.

bool try_wait( );

Returns: Returns true if the synchronization condition has been is reached, and false otherwise. Does not block.

flex_latch(const flex_latch&) = delete;
flex_latch& operator=(const flex_latch&) = delete;

Memory Ordering

Calls to arrive, arrive_and_wait() or countdown() that triggered the completion synchronize with invocation of the completion function. The completion function synchronizes with all calls unblocked after it has run.

Using Self-Deleting Flex Latches

Consider the use-case where several threads perform a parallel operation. When all operations are complete, we wish to perform one final operation. With the current N4204 proposal, we must either have one additional thread that blocks on wait(), or we must designate one of the worker threads to perform the final operation. (This may be difficult if the threads are part of a threadpool, and we do not know which thread will run which task.)

A self-deleting Flex Latch can be given a completion function that will be invoked when all threads have completed their task and arrived at the synchronization condition. When the completion finishes, the worker threads will continue, and the latch will be destroyed.

void DoWork(threadpool* pool) {

flex_latch* completion_latch = flex_latch::create_self_deleting(
&OnCompletion, NTASKS);
for (int i = 0; i < NTASKS; ++i) {
pool->add_task([&] {
// perform work
…
completion_latch->count_down();

}));

}

// Return without blocking. OnCompletion will run when
// the threads have finished, and the latch will then be
// destroyed.

}