The great thing about the Internet is that CS luminaries might wander by your little blog and tell you you’re wrong. I thought there might be a way to exploit laziness to make more efficient use of the memory hierarchy on conventional machines. I was thinking about how the FFTW compiler gets great performance. At a high level, a planner figures out how best to execute many “codelets” (small chunks of optimized code) for an instance of a problem (FFTs, in this case) to achieve maximum performance. (SPIRAL does something similar.) One insight is to use a cache-oblivious memory model rather than a conventional RAM model. The planner breaks a large problem down into pieces that fit into the cache, then applies the appropriate codelets to each piece to yield a solution.
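For intuition, here’s a toy sketch of the cache-oblivious divide-and-conquer idea (this is not FFTW’s actual planner; the function name, the explicit `threshold`, and the trivial “codelet” are all made up for illustration): recursively split the problem until a chunk fits in cache, then run optimized base-case code on each chunk.

```haskell
-- Toy cache-oblivious divide-and-conquer (hypothetical names, not FFTW's API).
-- Split recursively until a chunk is at most `threshold` elements, then
-- apply the base-case "codelet" -- here just `sum` -- to the cache-sized chunk.
cacheObliviousSum :: Int -> [Double] -> Double
cacheObliviousSum threshold xs
  | length xs <= threshold = sum xs  -- "codelet": runs entirely in cache
  | otherwise =
      let (l, r) = splitAt (length xs `div` 2) xs
      in cacheObliviousSum threshold l + cacheObliviousSum threshold r
```

A real planner would pick the split points and codelets empirically per machine; the point here is only that the recursion never needs to know the cache size, it just bottoms out in cache-sized work.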
But what about laziness? In a lazy language, you get a graph of possible computations, and the compiler determines which node to evaluate next. Conventional lazy languages evaluate only those nodes that are, or soon will be, needed, i.e. lazily. Eager Haskell, on the other hand, is designed to evaluate most things immediately, suspending computation only when it runs out of resources (heap or stack space). We can change the resource constraint to be cache space instead. That is, compute only the subset of an array that fits in cache, then maybe suspend computation. What Haskell gives you is a clean operational semantics with much more flexibility in scheduling computations.
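The lazy-versus-eager distinction can be seen in miniature with the two standard left folds (this is ordinary GHC Haskell, not Eager Haskell, but it shows the same scheduling freedom): the lazy fold builds a graph of suspended thunks, while the strict fold forces each node as soon as it is produced.

```haskell
import Data.List (foldl')

-- Lazy fold: builds a chain of unevaluated thunks
-- ((0 + 1) + 2) + ... that is only forced when the result is demanded.
lazySum :: [Int] -> Int
lazySum = foldl (+) 0

-- Strict fold: forces each partial sum immediately, in the spirit of
-- Eager Haskell's evaluate-now-suspend-on-resource-pressure strategy.
eagerSum :: [Int] -> Int
eagerSum = foldl' (+) 0
```

Both compute the same answer; only the evaluation schedule differs. The proposal above amounts to letting cache occupancy, rather than heap or stack pressure, decide when to stop forcing and leave the rest of the graph suspended.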
Combine both ideas and you might get something useful. Write your program in a form of Eager Haskell (no infinite data structures, though) that uses optimized low-level functions (i.e. codelets). Take the lazy graph and use FFTW’s techniques to determine a good execution order. Compile and enjoy. If you look at what these guys did to get good performance out of the Cell processor [1, 2], you’d want some help from the compiler: a simple 60-line C program became a 1200-line optimized mess for the Cell. That’s not fun.