Note that this is a typical scenario encountered by CORD_from_file_lazy in the cord package distributed with our garbage collector. Similar observations are likely to apply to any hash-table-like data structure that must support fast read accesses by multiple threads.
We will cache a fixed number of file sections. A read operation from an uncached part of the file will usually replace one of these file sections.
Our goal is to reduce the cost of a read from a cached section of the file to a few memory operations. By analogy to a direct-mapped CPU cache, one way to accomplish this is to make the section lengths fixed (corresponding to cache lines) and to use the low-order bits of the file position to index into the section. The next higher group of file position bits is used to index into a table T, each entry of which is a pointer to a section. Each section also records its full starting offset in the file, so that we can verify that the appropriate file section is actually cached.
In the garbage-collected version, this all works fine. A cache miss is handled by allocating a new section, reading the appropriate data from the file into it, and then atomically replacing the entry in T. This works correctly even in the presence of a concurrent access. The second access will see either the old or the new section; either provides accurate (though perhaps useless) information. Note that if the second access retrieves the old section, that section will not be deallocated until the second access completes, since it remains reachable. Thus read operations that hit the cache require no locking.
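The scheme described in the last two paragraphs can be sketched in a garbage-collected language. This is an illustrative Python model, not the actual cord code (which is in C); the section size, table size, and all names here are hypothetical. The key point is that in a garbage-collected setting the atomic replacement of a table entry is just a reference store, and the old section survives until its last reader drops it:

```python
SECTION_BITS = 12                  # hypothetical: 4096-byte sections
SECTION_SIZE = 1 << SECTION_BITS
TABLE_BITS = 4                     # hypothetical: 16-entry table
TABLE_SIZE = 1 << TABLE_BITS

class Section:
    def __init__(self, start, data):
        self.start = start         # full starting file offset, for validation
        self.data = data           # up to SECTION_SIZE bytes of file contents

def read_byte(T, f, pos):
    """Return the byte at file offset pos, caching by fixed-size section."""
    i = (pos >> SECTION_BITS) & (TABLE_SIZE - 1)   # table index from position bits
    start = pos - (pos % SECTION_SIZE)             # section's starting offset
    s = T[i]                                       # one reference load
    if s is not None and s.start == start:
        return s.data[pos % SECTION_SIZE]          # cache hit: no locking
    # Cache miss: allocate and fill a new section, then atomically
    # replace the table entry.  A concurrent reader sees either the old
    # or the new section; the old one is reclaimed by the collector only
    # after the last reader drops its reference.
    f.seek(start)
    new_s = Section(start, f.read(SECTION_SIZE))
    T[i] = new_s                                   # single reference store
    return new_s.data[pos % SECTION_SIZE]
```

The hit path is just an index computation, one load from T, and a validation check, matching the "few memory operations" goal above.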
In the explicitly managed version, a cache section being replaced must be deallocated. This requires that no concurrent read operations be in progress on the old section. There appears to be no way to ensure that without requiring every read operation, including cache hits, to acquire and release a lock. With a typical thread library, this is likely to increase the cost of a cache hit by a factor of between 2 and 10, possibly more. It is not hard to conceive of applications for which this is the dominant cost. The text editor supplied with the cord package is probably often in this category.
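For contrast, here is a sketch of the hit path in the explicitly managed version, again with hypothetical names and sizes. Even a hit must bracket the lookup with a lock acquire/release, because a concurrent replacement could otherwise free the section while it is being read:

```python
import threading

SECTION_BITS = 12                 # hypothetical: 4096-byte sections
SECTION_SIZE = 1 << SECTION_BITS
TABLE_BITS = 4                    # hypothetical: 16-entry table
TABLE_SIZE = 1 << TABLE_BITS

table_lock = threading.Lock()

def locked_cached_read(T, pos):
    """Hit path when sections are freed explicitly: every read, hit or
    miss, must hold the lock so a concurrent replacement cannot free the
    section out from under us."""
    with table_lock:              # acquire/release on every access
        s = T[(pos >> SECTION_BITS) & (TABLE_SIZE - 1)]
        if s is not None and s[0] == pos - (pos % SECTION_SIZE):
            return s[1][pos % SECTION_SIZE]   # s is a (start, data) pair
        return None               # miss: caller replaces and frees under the lock
```

The lock traffic on this path, absent in the garbage-collected version, is exactly the factor-of-2-to-10 overhead estimated above.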
(To be fair, the explicitly managed version might well allocate all sections statically, since it must lock on read accesses anyway. This saves some allocation and collection cost relative to the garbage-collected version. But that cost is likely to be dominated by I/O, and it affects only the cache miss time, not the cache hit time. The cache hit time will still be much lower in the garbage-collected version.)
This is a revised copy of a page written while the author was at Xerox PARC. The original is here.