Looking into the future, Dave Moon says: “The illusion of random access memory is becoming increasingly unconvincing on modern hardware. Although dereferencing a pointer takes only one instruction, when the target of the pointer is not cached in the CPU that instruction can take as long to execute as 1000 ordinary instructions executed at peak speed. [snip] But at the same time, the advantage of C++ and other conventional programming languages is being eroded in the same way. It is not unreasonable to predict that we will see widespread abandonment of the illusion of random access memory in the next two decades. The IBM Cell processor used in video games is the first crack in the dam.”
[ed: below is my nonsensical idea, neither Weinreb’s nor Moon’s. I’m just thinking out loud based on the quote above. Both disagreed in the comments.]
When RAM is so terribly expensive to access, the way to squeeze out additional performance is to get lots of work done with the data that’s already in the cache. A compiler needs to find and exploit fine-grained concurrency in a program, but conventional languages often get in the way. Most popular languages assume a single-threaded program, with multi-threading bolted on afterward as a library (threads, mutexes, semaphores, etc.). That gives you coarse-grained parallelism. Some libraries offer higher-level support, such as the Task Parallel Library; this still uses threads to exploit multiple cores, but does nothing to arrange computation on a single core to maximize use of the cache. I don’t know exactly how, but I think lazy languages can help here. Right now laziness is mostly used to avoid execution (e.g. infinite lists), but it could be used to radically reorder a program so that data is processed aggressively while it is still in the cache, as the sketch below suggests.
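A small-scale version of this reordering already exists: GHC’s list fusion. Because the lists are lazy and the pipeline is declarative, the compiler is free to rewrite three logical passes into one loop, so each element is produced, transformed, and consumed while it is still hot in cache. This is only a sketch of the general point, not anything Moon proposed, and whether fusion actually fires here depends on the GHC version and on compiling with -O:

```haskell
module Main where

import Data.List (foldl')

-- Three logically separate passes over the list...
pipeline :: [Int] -> Int
pipeline xs = foldl' (+) 0 (map (*2) (filter even xs))

-- ...which GHC's foldr/build fusion can (with -O) rewrite into a single
-- loop that never materializes the intermediate lists: each element is
-- generated, filtered, doubled, and accumulated while still in cache.
main :: IO ()
main = print (pipeline [1 .. 10000000 :: Int])
```

Fusion only reorders within one pipeline, of course; the speculation above is about doing this kind of thing globally, across a whole program.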