Archive for August, 2009

Processing language for memcached

August 30, 2009

memcached is a hashtable with a simple API that allows for remote access. It is a major component of many large-scale web sites because it caches results from more expensive remote resources, i.e. DB access. The API consists of a several different storage methods (set, add, replace, append, prepend, cas), retrieval methods (get, gets), and a delete command (delete). Unfortunately, if you want to do something more complex than simple store/retrieve with memcached, then you have to make lots of expensive remote calls. For example, say I store a tree in the cache, where each node is “key, node key, node key”. To delete the entire tree, you have to recursively get the root node, delete it, then get both branches, delete them, etc. This is expensive.

A possible solution is to send Forth scripts to memcached to execute locally. By adding Factor, a fast implementation of Forth, as the front-end to memcached, clients can send more complex operations to execute locally. Forth is a good choice because the programs tend to be quite terse and easy to interpret quickly. For example, “key get [ “NOT_STORED” ] [ key value set ] if” implements the “add” command in memcached. Since that might be tedious, the user can load a script in Factor that implements that line as a new command: “: add ( key value — result ) “ + the previous code. Now clients can just send “key value add” as their command.

A better example is to delete a tree as described above. The Factor code would look something like this (I’m still learning the syntax). All the work would be done locally on the memcached node.

: delete-tree ( key -- )
     dup get each [ dup 0 = [ . ] [ delete-tree ] if ] delete

By adding an efficient processing language in front of memcached, people can write more powerful programs to utilize distributed caches. What would it look like to delete a tree that is scattered across 100 different nodes? As a matter of fact, LinkedIn (and Digg and Facebook) stores their entire social graph in-memory for very fast traversal. So people are building ad-hoc solutions for problems involving large distributed data structures stored in a cache. They put the logic in a separate program, but I think we can integrate the processing directly into the cache with Forth & memcached.

Concurrent Skip List issues

August 26, 2009

A commenter on my code sample for a concurrent skiplist correctly pointed out two errors in the program. I attempted to fix both, but further testing revealed another problem in Release mode (optimized, no debug). One of the thread would suddenly stop. After much screwing around, I discovered that the program was stuck in a spin wait: “while (!goAhead) ;”. I added a volatile declaration to that variable and then the program worked. It turns out the loop keeps reusing the value in the register, rather than reloading the variable when another thread says it’s OK to go.

I’ve put the code in Launchpad and removed it from the other post. You can download it like this using the Bazaar VC tool:

bzr branch lp:~suranap/+junk/concur.net

There’s still a problem with the program. When I run 2 threads, it is really quick. When I run 4 threads, there is so much contention that the “add” method has to retry a zillion times. Of course, my test case is probably extreme. I’ll compare the performance with Java’s implementation to see if it’s in the same league.

Scientific Evaluation of Programming Languages

August 13, 2009

A new workshop at OOPSLA is aimed at finding ways to evaluate the effectiveness of programming languages and tools. Unfortunately, the entire conference rests on a single problem: how to measure programmer productivity. Except for pointy-haired bosses, most software professionals believe productivity can not be accurately measured. This workshop will likely be yet another outlet for HCI research with awful methodologies.

Fluent Interfaces with ‘with’

August 12, 2009

An ad-hoc embedded domain specific language, aka Fluent Interface, is an API that reads like an English sentence. LINQ does an excellent job of implementing this style so you can write long queries in normal dot notation rather than the query syntax. Someone at Google posted a collection library for Java that does something similar to LINQ, but it’s not as readable as LINQ’s extension methods. Someone wrapped the code with a fluent interface (from google cache) to make it more readable. This requires that all the methods return an Iterables interface so you can do proper method chaining. It doesn’t fit the normal OO-style, and can be tedious to wrap existing classes into this style.

A minor bit of syntactic sugar that I once thought would be useful for C# and Java is the ‘with’ expression, which allows you to write code like this:

with person {
    .FirstName = "Bob";
    .LastName = "Smith";
    .MoveToDepartment("HR");
};

Within the body of the ‘with’, any illegal use of the dot notation (“.Member”) is assumed to use the object “person”. The syntax is just “with Expr Body”, and it returns the Expr when it’s done. With this syntactic sugar, you can now write normal OO code (stateful changes on the same object) that reads like a Fluent Interface. It turns out that this syntax also subsumes the object initializer syntax in C#. Compare these two lines:

Person value1 = new Person { Name = "Chris Smith", Age = 31, CanCode = false};
Person value2 = with new Person() { .Name = "Chris Smith"; .Age = 31; .CanCode = false; };

Line #2 is only a few characters longer, but the syntax is more general than the object initializer in line #1. I can insert arbitrary code in the body, not just initialization code. And I can use the ‘with’ expression to write fluent code on existing OO APIs as shown above. Basically, if C#’s object initializer is a good idea, then my ‘with’ expression is a better idea, IMHO.