Processing language for memcached

memcached is a hashtable with a simple API that allows for remote access. It is a major component of many large-scale web sites because it caches results from more expensive remote resources, i.e. DB access. The API consists of a several different storage methods (set, add, replace, append, prepend, cas), retrieval methods (get, gets), and a delete command (delete). Unfortunately, if you want to do something more complex than simple store/retrieve with memcached, then you have to make lots of expensive remote calls. For example, say I store a tree in the cache, where each node is “key, node key, node key”. To delete the entire tree, you have to recursively get the root node, delete it, then get both branches, delete them, etc. This is expensive.

A possible solution is to send Forth scripts to memcached to execute locally. By adding Factor, a fast implementation of Forth, as the front-end to memcached, clients can send more complex operations to execute locally. Forth is a good choice because the programs tend to be quite terse and easy to interpret quickly. For example, “key get [ “NOT_STORED” ] [ key value set ] if” implements the “add” command in memcached. Since that might be tedious, the user can load a script in Factor that implements that line as a new command: “: add ( key value — result ) “ + the previous code. Now clients can just send “key value add” as their command.

A better example is to delete a tree as described above. The Factor code would look something like this (I’m still learning the syntax). All the work would be done locally on the memcached node.

: delete-tree ( key -- )
     dup get each [ dup 0 = [ . ] [ delete-tree ] if ] delete

By adding an efficient processing language in front of memcached, people can write more powerful programs to utilize distributed caches. What would it look like to delete a tree that is scattered across 100 different nodes? As a matter of fact, LinkedIn (and Digg and Facebook) stores their entire social graph in-memory for very fast traversal. So people are building ad-hoc solutions for problems involving large distributed data structures stored in a cache. They put the logic in a separate program, but I think we can integrate the processing directly into the cache with Forth & memcached.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s