Why Undefined Semantics for C++ Data Races?

Our proposed C++ memory model gives completely undefined semantics to programs with data races. Many people have questioned whether this is necessary, or whether we can provide some guarantees in even this case. Here we give the arguments for completely undefined semantics.

Some arguments are already given in the introductory "Rationale" section of committee paper N1942. This is a more detailed exploration.

The basic argument developed here for undefined data race semantics in C++ is that ordinary compiler optimizations, when applied to a program containing a race, can transform it into one with completely arbitrary behavior.

As an example of this phenomenon, consider a relatively simple program fragment, which does not include any synchronization code. Assume x is a shared global and everything else is local:
unsigned i = x;

if (i < 2) {
    foo: ...
    switch (i) {
    case 0:
        ...;
        break;
    case 1:
        ...;
        break;
    default:
        ...;
    }
}
Assume the code at label foo is fairly complex, and forces i to be spilled and that the switch implementation uses a branch table. (If you don't believe the latter is a reasonable assumption here, assume that "2" is replaced by a larger number, and there are more explicit cases in the switch statement.)

The compiler now performs the following reasonable (and common?) optimizations:

  1. Since registers are scarce during the code at foo, and i and x appear to hold the same value, the compiler avoids spilling i to the stack and instead reloads it from x when it is needed for the switch expression.
  2. Since i is known to be less than 2 at the switch, the range check that would normally guard the branch table is omitted.

Now consider the case in which, unbeknownst to the compiler, there is actually a race on x, and its value changes to 5 during the execution of the code labelled by foo. The results are:
  1. When i is reloaded for the evaluation of the switch expression, it gets the value 5 instead of its original value.
  2. The branch table is accessed with an out-of-bounds index of 5, resulting in a garbage branch target.
  3. We take a wild branch, say to code that happens to implement rogue-o-matic (the canonical form of C++ "undefined behavior").
Although it appears difficult, for a variety of reasons, to get existing compilers to exhibit this kind of behavior, I think compiler writers will generally refuse to promise that it cannot happen: it can be the result of very standard optimization techniques. As far as I know, precluding this behavior with certainty would require a significant optimizer redesign.

Even if we were to prohibit reloading a local from a shared global variable, we would still have to explain similar behavior for the analogous program that tests x and then uses x itself as the switch expression. That version is slightly easier to explain, but it still seems very confusing.

Some such redesign is necessary anyway to avoid speculative writes. But I think that is usually limited to a few transformations which can currently introduce such writes. I suspect the redesign required here would be significantly more pervasive. And I do not see how to justify the cost.

The one benefit of providing better-defined race semantics seems to be somewhat easier debugging of production code that encounters a race. But I'm not sure that outweighs the negative impact on race detection tools.