Databases layout tables carefully on the disk because disk I/O is insanely slow relative to memory. Scientific programmers organize data structures (DTs) carefully to maximize cache utilization because memory is insanely slow relative to the cache. Since writing cache conscious (CC) DTs is hard, the folks in MIT’s Cilk group came up with cache oblivious DTs (CO). This post explains the idea, albeit verbosely. The basic idea is to pack data into arrays so it will all load into the cache. These guys show that CC still beats CO by a mile, but it’s soooo difficult to implement. Some dudes at IBM show that a garbage collector can improve performance on some benchmarks simply by improving memory locality. It seems to me that if the GC documentation told me that it will usually group objects in a particular order, then I can write my DT code to take advantage of that locality. The GC is constructing CO DTs for free. But since I don’t know how the GC groups things, I could inadvertently be jumping all over main memory and suffering zillions of cache misses. That ain’t cool.