ISO/IEC JTC 1/SC 22/OWGV N 0155
Proposed Vulnerability Description on Concurrency
Contributed by Steve Michell
30 September 2008
6.Y Concurrency [CGW]
6.Y.1 Description of Application Vulnerability
Concurrency, or parallel programming, includes:
- Multiple programs executing on a single CPU or on multiple
CPUs,
- Multiple tasks (or threads) executing in the same process
on a single CPU or on multiple CPUs,
- Multiple tasks (or threads) executing in multiple processes
on a single CPU or on multiple CPUs, and
- A single-threaded task that handles interrupts or events asynchronously.
Concurrency presents a significant challenge to correct programming:
it offers a large number of ways for failures to occur, quite a few
known attack vectors, and many possible but as yet undiscovered
attack vectors.
Concurrency is a major challenge because the analysis of concurrent
programs is so difficult. In the general case there are exceedingly
large numbers of possible interleavings of the parallel portions
(on the order of 10**100 even for small programs). Static
analysis of the general case is infeasible because the complexity
is so high and because such analysis requires modelling
the complete program at once - an NP-hard problem. Dynamic analysis
of the general case is fruitless because a concurrent program
cannot be relied upon to behave in exactly the same way twice, even
with identical inputs.
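To illustrate the combinatorial growth (a sketch only; the task and
step counts are assumptions chosen for the example), the number of
distinct interleavings of n tasks that each execute m atomic steps is
(n*m)!/(m!)**n. The short program below (Java is used purely for
illustration) evaluates this expression for 10 tasks of 10 steps each
and prints a number on the order of 10**92.

    import java.math.BigInteger;

    // Illustrative sketch: counts the interleavings of n tasks that each
    // execute m atomic steps, i.e. (n*m)! / (m!)**n.  The values of n and
    // m are assumptions chosen only to show the scale of the problem.
    public class InterleavingCount {

        static BigInteger factorial(int k) {
            BigInteger f = BigInteger.ONE;
            for (int i = 2; i <= k; i++) {
                f = f.multiply(BigInteger.valueOf(i));
            }
            return f;
        }

        public static void main(String[] args) {
            int n = 10;   // number of tasks
            int m = 10;   // atomic steps per task
            BigInteger count = factorial(n * m).divide(factorial(m).pow(n));
            // Prints a 93-digit number, i.e. on the order of 10**92.
            System.out.println(count);
        }
    }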
The preceding discussion should not lead one to despair, because
there are ways of taming concurrency, and several paradigms
are known to be analyzable:
- Programs in which there is no dependency in storage, code base, or
time between the concurrent portions, and in which all portions
terminate before the final result is collected.
- CSP-compliant [Hoare] intertask communication and scheduling
with complete storage independence, used for one-shot calculations.
Such programs have been shown to be formally provable, as long
as it can be shown (right down to the hardware level) that the
locking mechanisms implementing the message channels are correct.
A small sketch in this style appears after this list.
- CSP-compliant intertask communication and scheduling with
complete storage independence for some limited non-terminating
programs (where formally provable using model checking), with
the same caveats as above. Note, however, that timing may not
be verifiable because CSP tasks block on messages, sending or
receiving, and not on time.
- Some non-terminating programs, such as Rate Monotonic systems,
where all task scheduling has a common base (usually wakeup time
as a function of urgency), intertask communication is non-blocking
using monitors or protected objects and is consistent with the
ceiling priority protocol, unscheduled events are forbidden, and
the lock management protocols for intertask communication have
been shown to be correct.
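The following fragment (a sketch only; Java and the names used are
assumptions made for this example) shows the flavour of the
CSP-compliant, storage-independent style for a one-shot calculation:
the two tasks share no variables and synchronise only at a rendezvous,
approximated here by a SynchronousQueue. The formal guarantees
discussed above still depend on the correctness of the underlying
locking, which a sketch like this cannot demonstrate.

    import java.util.concurrent.SynchronousQueue;

    // Sketch of CSP-style message passing for a one-shot calculation:
    // no shared storage, one rendezvous channel, both tasks terminate
    // before the result is used.
    public class RendezvousSketch {

        public static void main(String[] args) throws InterruptedException {
            SynchronousQueue<Integer> channel = new SynchronousQueue<>();

            Thread producer = new Thread(() -> {
                int partial = 0;
                for (int i = 1; i <= 100; i++) {
                    partial += i;            // purely local computation
                }
                try {
                    channel.put(partial);    // blocks until the receiver is ready
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            producer.start();
            int result = channel.take();     // rendezvous: receive the result
            producer.join();                 // producer has terminated
            System.out.println(result);      // prints 5050
        }
    }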
Research is ongoing into further paradigms, such as variations
of non-terminating programs on single CPUs using the Ravenscar
Tasking Profile.
The fact is, however, that we must work with concurrency in
more general settings, and most such settings do not satisfy the
constraints listed above. Most concurrent programs are cyclic or
non-terminating, handle events from an OS or interrupts from hardware,
work with multiple different levels of urgency (often called priority)
between components, have concurrency behaviours that are coupled to
the data being processed, and have nontrivial data couplings between
tasks.
Most real-time systems are effectively concurrent systems,
but must also account for time, mixed urgencies, and the consequences
of failure to complete necessary computations.
It is not known whether attacks based on knowledge of a system's
concurrency structure can be more sophisticated than denial-of-service
attacks based on deadlock or livelock.
For safety-related systems, the sheer number of possible failure
modes and unacceptable interactions creates almost intolerable
complexity. Many such systems cope by using exclusively
time-based cyclic schedules, eliminating interrupts, and making
extensive use of dedicated CPUs. Some use of very restrictive
concurrency paradigms, such as Ravenscar and similar microkernels,
has also been seen.
6.Y.2 Cross References
6.Y.3 Mechanism of Failure
The basic classifications of concurrency-related failures are:
- Data corruption from one task writing shared data that is
being accessed by another task.
- Data corruption from multiple tasks accessing system services
simultaneously.
- Deadlock, where some or all tasks become blocked waiting
for resource(s) that cannot be released because the task responsible
for releasing them is itself blocked.
- Priority inversion, where a high-urgency task waits for
execution resources while a lower-urgency task executes,
to the point that calculations are wrong, late, or missed altogether.
- Livelock, where one or more tasks consume all available computational
resources doing meaningless work (such as busy-waiting for a lock
that cannot be released) so that tasks capable of doing meaningful
work cannot execute.
- Missed time constraints, caused by processing overruns,
resulting in late and therefore incorrect results.
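The following fragment (a sketch only; Java, the counts, and the lock
names are assumptions made for this example) exhibits the first and
third failure classes above: an unprotected shared counter whose
updates can be lost, and a pair of tasks that acquire two locks in
opposite orders and can therefore deadlock. The exact behaviour
differs from run to run, which is precisely why dynamic testing is an
unreliable way to find such failures.

    // Sketch of two failure classes: lost updates to an unprotected shared
    // counter, and a lock-ordering deadlock.  Behaviour varies between runs.
    public class FailureSketch {

        static int counter = 0;                     // shared, unprotected
        static final Object lockA = new Object();
        static final Object lockB = new Object();

        public static void main(String[] args) throws InterruptedException {
            // Data corruption: counter++ is a read-modify-write sequence, so
            // concurrent increments can interleave and lose updates.
            Runnable bump = () -> {
                for (int i = 0; i < 100000; i++) {
                    counter++;                      // not atomic
                }
            };
            Thread t1 = new Thread(bump);
            Thread t2 = new Thread(bump);
            t1.start(); t2.start();
            t1.join();  t2.join();
            System.out.println(counter);            // usually less than 200000

            // Deadlock: the two tasks acquire the same locks in opposite orders.
            Thread d1 = new Thread(() -> {
                synchronized (lockA) { pause(); synchronized (lockB) { } }
            });
            Thread d2 = new Thread(() -> {
                synchronized (lockB) { pause(); synchronized (lockA) { } }
            });
            d1.start(); d2.start();                 // typically never completes
        }

        static void pause() {
            try { Thread.sleep(100); } catch (InterruptedException e) { }
        }
    }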
6.Y.4 Applicable Language Characteristics
- Languages that permit concurrency within the language itself, or
that support libraries (such as POSIX threads) which provide hooks
for concurrency control.
- Languages that interface with operating systems that deliver
event-driven behaviour to individual subprograms, such that multiple
subprograms could be executing concurrently in the same program, or
even in different programs where shared data is being manipulated.
6.Y.5 Avoiding the Vulnerability or Mitigating its Effects
- Use the simplest and highest level of concurrency abstraction
available as a starting point.
- Avoid low-level primitives such as semaphores, except to
build higher-level abstractions such as monitors, protected
objects, or CSP message channels.
- Keep all data either local to a single task or inside shared
objects that are protected from concurrent access (such as a monitor
or a message channel); a small sketch of the monitor approach follows
this list.
- Ensure that all protected objects account for priority and
provide safe access even when tasks of mixed priorities
access the data.
- Ensure that all possible cases of priority inversion are
identified and analysed, so that guaranteed bounds on the duration
of priority inversion can be developed.
- Expose the proposed system to extensive human review.
- Where the number of concurrent items is low and the concurrency-related
states can be extracted from the larger program and are also relatively
few, use model checkers (such as SPIN or UPPAAL) to verify safety,
liveness and timing properties. Model checkers such as SPIN cannot
handle time, but timed-automata tools such as UPPAAL can reason
about time.
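As an illustration of the advice above on protecting shared data (a
sketch only; Java and the class name are assumptions made for this
example), the following monitor makes concurrent increments safe by
allowing at most one task inside the object at a time. Replacing the
unprotected counter in the earlier failure sketch with such an object
makes the final total deterministic, although it does not by itself
address priority inversion or timing.

    // Sketch of a monitor (in Ada terms, a protected object): the shared
    // count is reachable only through synchronized operations, so at most
    // one task manipulates it at a time.
    public class ProtectedCounter {

        private int count = 0;

        public synchronized void increment() {
            count++;                        // executed under the object's lock
        }

        public synchronized int value() {
            return count;
        }
    }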
6.Y.6 Implications for Standardization
- Provide concurrency abstractions that support static analysis,
such as one of the models that are known to have safe properties.
6.Y.7 Bibliography
- Hoare, C.A.R., "Communicating Sequential Processes",
Prentice Hall, 1985
- Holzmann, G., "The SPIN Model Checker: Primer and
Reference Manual", Addison-Wesley Professional, 2003
- UPPAAL, available from www.uppaal.com
- Larsen, Pettersson, Wang, "Model Checking for Real-Time
Systems", Proceedings of the 10th International Conference
on Fundamentals of Computation Theory, 1995
- Ravenscar Tasking Profile, specified in ISO/IEC 8652:1995 Ada
with TC 1:2001 and AM 1:2007