Need fine-grained concurrency primitives

Buried in this very long post about ObjectStore is a quote from Dave Moon:

Looking into the future, Dave Moon says: “The illusion of random access memory is becoming increasingly unconvincing on modern hardware. Although dereferencing a pointer takes only one instruction, when the target of the pointer is not cached in the CPU that instruction can take as long to execute as 1000 ordinary instructions executed at peak speed. [snip] But at the same time, the advantage of C++ and other conventional programming languages is being eroded in the same way. It is not unreasonable to predict that we will see widespread abandonment of the illusion of random access memory in the next two decades. The IBM Cell processor used in video games is the first crack in the dam.”

[ed: below is my nonsensical idea, not Weinreb’s nor Moon’s. I’m just thinking out loud based on the quote above. Both disagreed in the comments.]

When RAM is so terribly expensive to access, the way to squeeze out additional performance is to get lots of work done with the data that’s already in the cache. A compiler needs to find and exploit fine-grained concurrency in a program, but conventional languages often get in the way. Most popular languages are assumed to be single-threaded programs, where multi-threaded features are thrown in later as a library (threads, mutex, semaphores, etc). This gives you coarse-grained parallelism. Some libraries offer higher level support, such as the Task Parallel Library. This still uses threads to exploit multiple cores, but fails to arrange computation on a single core to maximize use of the cache. I don’t know how exactly, but I think lazy languages can help here. Right now laziness is used to avoid execution (e.g. infinite lists), but it could be used to radically reorder programs to process data aggressively while it is still in the cache.

Advertisements

4 comments

  1. Daniel Weinreb

    Actually, Moon didn’t say anything about fine-grain concurrency control. It is possible to take advantage of multi-core processors with deep caches by simply assigning disjoint main memory to each core and running threads in each core that don’t interact with the others using the sharing of memory; or, at least, share memory in limited and restricted ways, to keep the concurrency control simple.

    On the other hand, there has been a lot of theoretical work on CPU’s with built-in transactional capacity at the instruction set level, which would make concurrent programming simpler to think about and do correctly.

    I think that as the number of cores keeps going up, there will be a lot of experimentation with different approaches to programming to keep those cores busy. It should be very interesting to see how it all plays out.

  2. Moon

    Fine-grained concurrency will just advance the computation to the next cache miss faster. I fail to see how that would help! Seems like Amdahl’s law applies here.

    Aggressive reordering to avoid fetching the same data into the cache more than once makes a little more sense.

    But you are still thinking in terms of an asymptotically smart compiler somehow being able to preserve the illusion of random access memory. My point is that as that becomes increasingly untenable, we will instead be forced to change the model of computation in which the programmer thinks to a model that more closely corresponds to what the hardware can actually deliver. Then the compiler is no longer being asked to perform an impossible job. Unfortunately I do not have any great ideas to offer for how to do this. You could think about modelling memory as a set of streams of data rather than as a random-access array of data.

  3. Pinku

    Moon is right: I’m still thinking about squeezing extra speed from conventional RAM models. But conventional hardware is going to be around for quite a while, so it’s not in vain. I have no idea what next-gen hardware might do. My last rambling comment about lazy languages was motivated by my dim memory of the FFTW compiler. FWIW, I’ll explain in a full post.

  4. Pingback: Eager lazy languages « Handwaving

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s