It been a while since I mentioned this idea, but here’s an implementation of a memcached client in Factor. It’s not completely robust, but it works well enough. Next up is a server that accepts Factor scripts to execute locally.
memcached client in Factor
October 15, 2009 by PinkuFixing Health Care
October 13, 2009 by PinkuThe hysterical screaming over health care has reached outlandish new lows. One thing I’ve learned is that policy debates are exactly like political debates: misinformation, exaggeration, outright lies, and bitter hatred for the opposition. This isn’t a good way to transform a $2 trillion chunk of our economy. For posterity, here’s an outline of how to “fix health care”.
How I quit reading about politics
September 11, 2009 by PinkuI used to read stacks of political magazines in the 90s. In 2004 The New Republic and The National Review formed a joint website called Opinion Duel, in which a writer from each magazine would debate a topic. All the debates quickly degenerated into flame wars, albeit with expensive Ivy League vocabularies. Both sides twisted facts, misrepresented their opponent’s arguments, made snide remarks, and were basically jerks. They made no attempt to actually understand the other side’s point-of-view. I finally understood that public policy debates are no better than political debates. Like trolls on Usenet, both are more interested in winning the argument than in finding the right answer.
SAP for animal shelters
September 2, 2009 by PinkuThis is my dog, Elvis. He’s a 2 year old Siberian Husky from a local animal rescue organization, which really consists of one frazzled woman with a dozen dogs packed in her house. I’ve had him for 5 months now and he’s turned out to be perfect: friendly, calm, gentle, clean, smart and
playful. And when we’re walking, he scares people so they move out of our way.
I am lucky to have gotten this dog. Because I’ve never had a dog and I live in an apartment, the shelters I contacted from PetFinder either turned me down immediately or, in most cases, failed to respond. I’ve since met many people who also ran into this resistance from shelters and opted to get a dog from a breeder instead. The only reason I got Elvis is because the woman who runs this shelter was distracted by personal problems and allowed her much more reasonable and level-headed friends handle the adoption. They said it was common for shelters (including this one) to turn down lots of good homes, hoping to find a perfect home for their dogs. The people who run shelters think with their hearts, not with their brains.
The problem is that the number of surplus dogs & cats vastly exceeds the total capacity of all shelters to house and feed them. Consequently, the local animal control office is forced to euthanize thousands of animals. If a shelter can find a home for a dog, it frees up a spot to save one more dog. When these shelters hang on to their dogs, they are condemning another to its death. Worse, if a person grows frustrated and turns to a breeder, it sends an economic signal to continue breeding more dogs. The goal is to place as many dogs in good homes as possible. Obvious, yet it isn’t happening.
I’d like to build a website to help manage animal shelters. Sort of a “SAP for shelters”. It should track their inventory of pets and streamline communication with customers. It should manage volunteers and foster homes. It should monitor food, medicine, and supplies. It should keep track of finances, donations and lots of other stuff. But first I need to learn more about how shelters operate. If Once I get going, I hope to make this all open-source and encourage other, smarter web developers to fix my mistakes. This won’t solve all the problems faced by animal shelters, but it will help smooth the path to those solutions.
Processing language for memcached
August 30, 2009 by Pinkumemcached is a hashtable with a simple API that allows for remote access. It is a major component of many large-scale web sites because it caches results from more expensive remote resources, i.e. DB access. The API consists of a several different storage methods (set, add, replace, append, prepend, cas), retrieval methods (get, gets), and a delete command (delete). Unfortunately, if you want to do something more complex than simple store/retrieve with memcached, then you have to make lots of expensive remote calls. For example, say I store a tree in the cache, where each node is “key, node key, node key”. To delete the entire tree, you have to recursively get the root node, delete it, then get both branches, delete them, etc. This is expensive.
A possible solution is to send Forth scripts to memcached to execute locally. By adding Factor, a fast implementation of Forth, as the front-end to memcached, clients can send more complex operations to execute locally. Forth is a good choice because the programs tend to be quite terse and easy to interpret quickly. For example, “key get [ “NOT_STORED” ] [ key value set ] if” implements the “add” command in memcached. Since that might be tedious, the user can load a script in Factor that implements that line as a new command: “: add ( key value — result ) “ + the previous code. Now clients can just send “key value add” as their command.
A better example is to delete a tree as described above. The Factor code would look something like this (I’m still learning the syntax). All the work would be done locally on the memcached node.
: delete-tree ( key -- )
dup get each [ dup 0 = [ . ] [ delete-tree ] if ] delete
By adding an efficient processing language in front of memcached, people can write more powerful programs to utilize distributed caches. What would it look like to delete a tree that is scattered across 100 different nodes? As a matter of fact, LinkedIn (and Digg and Facebook) stores their entire social graph in-memory for very fast traversal. So people are building ad-hoc solutions for problems involving large distributed data structures stored in a cache. They put the logic in a separate program, but I think we can integrate the processing directly into the cache with Forth & memcached.
Concurrent Skip List issues
August 26, 2009 by PinkuA commenter on my code sample for a concurrent skiplist correctly pointed out two errors in the program. I attempted to fix both, but further testing revealed another problem in Release mode (optimized, no debug). One of the thread would suddenly stop. After much screwing around, I discovered that the program was stuck in a spin wait: “while (!goAhead) ;”. I added a volatile declaration to that variable and then the program worked. It turns out the loop keeps reusing the value in the register, rather than reloading the variable when another thread says it’s OK to go.
I’ve put the code in Launchpad and removed it from the other post. You can download it like this using the Bazaar VC tool:
bzr branch lp:~suranap/+junk/concur.net
There’s still a problem with the program. When I run 2 threads, it is really quick. When I run 4 threads, there is so much contention that the “add” method has to retry a zillion times. Of course, my test case is probably extreme. I’ll compare the performance with Java’s implementation to see if it’s in the same league.
Scientific Evaluation of Programming Languages
August 13, 2009 by PinkuA new workshop at OOPSLA is aimed at finding ways to evaluate the effectiveness of programming languages and tools. Unfortunately, the entire conference rests on a single problem: how to measure programmer productivity. Except for pointy-haired bosses, most software professionals believe productivity can not be accurately measured. This workshop will likely be yet another outlet for HCI research with awful methodologies.
Fluent Interfaces with ‘with’
August 12, 2009 by PinkuAn ad-hoc embedded domain specific language, aka Fluent Interface, is an API that reads like an English sentence. LINQ does an excellent job of implementing this style so you can write long queries in normal dot notation rather than the query syntax. Someone at Google posted a collection library for Java that does something similar to LINQ, but it’s not as readable as LINQ’s extension methods. Someone wrapped the code with a fluent interface (from google cache) to make it more readable. This requires that all the methods return an Iterables interface so you can do proper method chaining. It doesn’t fit the normal OO-style, and can be tedious to wrap existing classes into this style.
A minor bit of syntactic sugar that I once thought would be useful for C# and Java is the ‘with’ expression, which allows you to write code like this:
with person {
.FirstName = "Bob";
.LastName = "Smith";
.MoveToDepartment("HR");
};
Within the body of the ‘with’, any illegal use of the dot notation (“.Member”) is assumed to use the object “person”. The syntax is just “with Expr Body”, and it returns the Expr when it’s done. With this syntactic sugar, you can now write normal OO code (stateful changes on the same object) that reads like a Fluent Interface. It turns out that this syntax also subsumes the object initializer syntax in C#. Compare these two lines:
Person value1 = new Person { Name = "Chris Smith", Age = 31, CanCode = false};
Person value2 = with new Person() { .Name = "Chris Smith"; .Age = 31; .CanCode = false; };
Line #2 is only a few characters longer, but the syntax is more general than the object initializer in line #1. I can insert arbitrary code in the body, not just initialization code. And I can use the ‘with’ expression to write fluent code on existing OO APIs as shown above. Basically, if C#’s object initializer is a good idea, then my ‘with’ expression is a better idea, IMHO.
Puzzling Python
July 24, 2009 by PinkuBecause many programmers I respect are fond of Python, I’m forcing myself to learn it (again). On my previous attempts, I ran into some annoyances and dropped it. This time, I’m trying to solve some Facebook puzzles by gluing together existing libraries. I tried the Peak Traffic puzzle, which is to enumerate the maximal cliques in a directed graph. Some searching led me to an excellent library for graph problems called NetworkX. Some tinkering with the library hit the first problem: on very large complete graphs, I kept blowing the stack. That’s because v0.99 uses (perfectly normal) recursion to solve the problem but hits a stack limit in Python. In v1.0, the devs were forced to implement their own stack to get around this annoying limit. The second issue is that this is a branch & bound optimization problem, which can easily make use of many cores. Instead, the program maxes out at 50% cpu. This would have been trivial to parallelize in .NET 4.0 or Fortress with parallel blocks.
So I thought maybe Jython might solve both problems. Unfortunately, Jython (v2.5) is 4X slower than CPython running the exact same script on a very large input file. Next, Jython also sets the recursion limit to 1000 stack frames, so I ran into the same issue. I know I can raise it manually, but that doesn’t help when I write a program and don’t know how far I’m going to recurse. Why set an arbitrary limit? Just throw the same runtime exception when the platform runs out of space like every other language on the planet. Finally, I imported jsr166, an excellent threadpool library for Java. But on a simple fibonacci test, the performance was so much slower than the equivalent Java program that it simply wasn’t worth considering. It did, however, make use of more threads, though it rarely went over 80%. Maybe there’s a lock somewhere that’s slowing things down. In summary, Jython sucks and CPython is acceptable.
Orbitz is screwing their customers
July 22, 2009 by Pinku[ed: An Orbitz rep says they don't have the capability to tweak prices in this way. It's probably a mistake, but others have seen this happen too.]
Someone told me a few months ago that Orbitz will detect and raise prices for some customers. It didn’t occur to me to check this until last night. I found a good deal on Orbitz for a hotel+car package, but looked around the web for better deals. After a few minutes, I reran the search on Orbitz and the same deal came up $200 more expensive. I fired up IE’s InPrivate browser and tried again. Now the original, cheaper price showed up. I reran the search in a previous browser and still got the $200 extra charge. Because I lack an MBA, I don’t understand how screwing your frequent customers is a good business tactic. I’ll be searching Orbitz and other travel sites using InPrivate or InFilter mode from now on because I don’t trust their prices anymore. If more people find out about this, people will lose trust in Orbitz and use it less frequently. I suggest people use a different browser or computer to verify that you aren’t getting screwed on prices, particularly package deals.