Why Undefined Semantics for C++ Data Races?
Our proposed C++ memory model gives completely undefined semantics
to programs with data races. Many people have questioned whether this is
necessary, or whether we can provide some guarantees even in this case.
Here we give the arguments for completely undefined semantics.
Some arguments are already given in the introductory "Rationale" section
of committee paper
N1942. This is a more detailed exploration.
The basic arguments for undefined data race semantics in C++ are:
- Unlike Java or .NET, we can get away with it. We do not have to
worry about untrusted sandboxed code exploiting such "undefined behavior" as
a security hole. We also do not currently have to worry about preserving
type-safety in the presence of data races, since we don't have it anyway.
Other inherent "holes" in the type system, such as C-style arrays,
are far more likely to result in type-safety violations in practice.
- I believe undefined race semantics are usually the status quo. Pthreads
states "Applications shall ensure that
access to any memory location by more than one thread of control (threads or
processes) is restricted such that no thread of control can read or modify a
memory location while another thread of control may be modifying it." I've
seen no evidence that the original intent of Win32 threads was any different,
though recent Microsoft tools may enforce stronger implementation properties.
- I am unconvinced that we are doing the programmer a favor by
providing stronger semantics. Data races among ordinary variables
are a bug. If the programmer intended for there to be a data race,
we will provide atomic variables, which convey that
intent to both the compiler and any human readers of the code. They
also allow the programmer to be explicit about ordering assumptions, which
typically require careful consideration to ensure correctness. (A minimal
sketch of the intended usage follows this list.)
- By leaving data race semantics undefined, we allow implementations to
generate error diagnostics if a race is encountered. I believe this
would be highly desirable. Constructing such tools that can operate outside
a debug environment on current hardware is clearly challenging, but
may be feasible in some cases, and hardware support seems at least conceivable
here.
- Perhaps more importantly, by requiring programmers to avoid
unannotated races, we reduce false positives in more practical debug-time
race detection tools.
- It appears likely that we have to leave the semantics of data races
undefined in at least some cases. Consider the case in which
I declare a function-scope object P
with a vtable pointer, store a pointer to P in a global x,
another thread reads x, and then calls one of P's virtual member
functions. On many architectures (e.g. PowerPC, ARM), a memory
fence would be required as part of object construction in order to
preclude a wild branch as a result of the virtual function call.
There
appears to be a consensus that we do not want to incur the fence overhead
in such cases. This means that we need to allow wild branches in
response to some kinds of data races (the scenario is sketched after
this list). Allowing this in some
cases, but not allowing it in all cases, would undoubtedly complicate
the specification.
- Current compiler optimizations often assume that objects do not
change unless there is an intervening assignment through a potential alias.
Violating such a built-in assumption can cause very complicated
effects that will be very hard to explain to a programmer, or
to delimit in the standard.
And I believe this assumption is sufficiently ingrained in current
optimizers that it would be very difficult to effectively remove it.
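To make the point about atomic variables concrete, here is a minimal
sketch of the intended idiom. For concreteness it is written against the
std::atomic interface in the form it takes in C++11; treat the exact names
as illustrative rather than as part of this proposal's text:

#include <atomic>
#include <thread>

int data;                        // ordinary variable: any race on it is a bug
std::atomic<bool> ready(false);  // intentionally shared: intent is explicit

void producer() {
    data = 42;          // ordinary write
    ready.store(true);  // sequentially consistent by default, so the write
                        // to data is visible before the flag is
}

void consumer() {
    while (!ready.load())  // sequentially consistent load
        ;                  // spin until the producer publishes
    // data is guaranteed to be 42 here; there is no data race
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}

The atomic declaration tells both the compiler and the reader that ready is
deliberately accessed concurrently, and the default ordering makes the
publication of data explicit.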
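The vtable scenario from the list can be sketched directly as well. The
program below deliberately contains the race in question; all names are
illustrative:

#include <thread>

struct T {
    virtual void f() { /* ... */ }
};

T* x = 0;  // ordinary global pointer, accessed without synchronization

void writer() {
    static T p;  // function-scope object; construction stores p's
                 // vtable pointer
    x = &p;      // racy publication: no release fence separates the
                 // vtable-pointer store from this store
}

void reader() {
    T* q = x;    // racy read of x
    if (q)
        q->f();  // on PowerPC or ARM, the store of q's vtable pointer may
                 // not yet be visible, so this indirect call can branch
                 // wildly
}

int main() {
    std::thread t1(writer), t2(reader);
    t1.join();
    t2.join();
}

Avoiding the wild branch here would require a fence at the end of the
constructor, or before the store to x, which is exactly the overhead the
consensus view wants to avoid.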
As an example of the last phenomenon, consider the following
relatively simple code, which does not include any synchronization.
Assume x is a shared global and everything else is local:
unsigned i = x;
if (i < 2) {
  foo: ...
  switch (i) {
    case 0:
      ...;
      break;
    case 1:
      ...;
      break;
    default:
      ...;
  }
}
Assume that the code at label foo is fairly complex, forcing i
to be spilled, and that the switch statement is implemented with a branch
table.
(If you don't believe the latter is a reasonable assumption here, assume that
"2" is replaced by a larger number, and that there are more explicit cases in
the switch statement.)
The compiler now performs the following reasonable
(and common?) optimizations:
- It notices that i and x (in its view of the world)
contain the same value. Hence when i is spilled, there is no need
to store it; the value can just be reloaded from x.
- It notices that the switch expression is either 0 or 1, and hence
eliminates the bounds check on the branch table access and the branch to the
default case. (The sketch below shows the resulting code.)
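To see where this leads, here is a hand-written sketch of the code after
both transformations. It uses the labels-as-values extension of GCC and
Clang to spell out the branch table, and it mirrors plausible compiler
output rather than the actual output of any particular compiler:

unsigned x = 0;  // shared global; imagine another thread racily modifies it

void example() {
    unsigned i = x;
    if (i < 2) {
        // ... complex code at foo; rather than spilling i, the compiler
        // plans to reload its value from x ...
        static void* table[] = { &&case0, &&case1 };  // two entries only
        goto *table[x];  // i "reloaded" from x; no bounds check and no
                         // default branch, since i was known to be 0 or 1.
                         // If x has meanwhile changed to 5, table[5] is
                         // garbage and this is a wild branch.
    case0:
        /* ... */ goto done;
    case1:
        /* ... */ ;
    done: ;
    }
}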
Now consider the case in which, unbeknownst to the compiler, there is
actually a race on x, and its value changes to 5 during the execution
of the code labeled by foo. The results are:
- When i is reloaded for the evaluation of the switch
expression, it gets the value 5 instead of its original value.
- The branch table is accessed with an out-of-bounds index of 5,
resulting in a garbage branch target.
- We take a wild branch, say to code that happens to implement
rogue-o-matic (the canonical form of C++ "undefined behavior").
Although it appears difficult to get existing compilers to exhibit this
kind of behavior for a variety of reasons, I think compiler writers will
generally refuse to promise that it cannot happen. It can be the result
of very standard optimization techniques. As far as I know, precluding
this behavior with certainty would require significant optimizer redesign.
Even if we were to prohibit reloading
of a local from a global shared variable, we would still have to explain
similar behavior for the analogous program which tests x and then
uses x as the switch expression (sketched below). That version is slightly
easier to explain, but it still seems very confusing.
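For concreteness, that analogous fragment would look like this, with the
bodies elided as before:

unsigned x = 0;  // shared global; imagine another thread racily modifies it

void variant() {
    if (x < 2) {      // first racy read of x
        // ... code at foo ...
        switch (x) {  // second racy read: may observe a different value
                      // than the test above did
            case 0: /* ... */ break;
            case 1: /* ... */ break;
            default: /* ... */ break;
        }
    }
}

Both reads of x are now explicit in the source, which is what makes the
behavior slightly easier to explain; the outcome can be equally surprising.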
Some such redesign is necessary anyway
to avoid speculative writes. But I think that is usually limited
to a few transformations which can currently introduce such writes.
I suspect the redesign required here would be significantly
more pervasive. And I do not see how to justify the cost.
The one benefit of providing better-defined race semantics seems to
be somewhat easier debuggability of production code that encounters a
race. But I'm not sure
that outweighs the negative impact on race detection tools.