Lessons from Node.js

January 29, 2010 by Pinku

Node.js is a server-side Javascript library for writing scalable network servers. It’s primary purpose is to wrap an asynchronous library around synchronous low-level calls. Code written to block on IO wastes time waiting for a response.

var response = synchCall(parameters) ;
useResponse(response);

The only way to get work done with this code is to have another thread ready to run. Apparently, Apache handles this by creating a new thread per connection, which can be very expensive. Each thread takes 2MB (though this can be reduced, right?) and context switching between threads is relatively expensive. So it doesn’t appear to scale very well, although, this is the worst possible implementation and can be improved with threadpools and/or green threads. An alternative it to switch to asynchronous calls.

asynchCall(parameters,
           function(response) { useResponse(response); }
          );

The callback is stored somewhere, ready to execute whenever the asynchCall returns. This requires that every blocking call be rewritten in this asynch style. To me, this looks like continuation-passing style (CPS), which is a transformation performed automatically by many Scheme compilers. The callback technique here has to be used carefully, i.e. not recursively, to avoid blowing the stack. With continuations you could use this style and not use a single excess stack frame. It would just quickly jump directly into each callback.

It would be interesting to see if a Scheme runtime can be written such that all IO continuations are implicitly asynchronous. I think you’ll need a parallel let to start multiple events; otherwise, it’s still just a single-threaded program. Maybe the signature of these IO functions would something like “read (kontinuation, koncurrent)”. The first continuation is registered to execute when the read finally gets a character from IO. The second continuation is executed immediately as a separate “green thread” of execution. Maybe I can integrate the C wrapper library written for Node.js into a Scheme interpreter like Scheme48. Then I can use scsh, the Scheme shell, to write scripts for scalable servers. I’ll have to think about the semantics of that parallel let thingie though.

Apple iPad almost perfect

January 27, 2010 by Pinku

My instant reaction to the iPad was the same as most nerds: lame. It’s just a giant iTouch with optional 3G. But what really sold me, oddly, is the iPad Case. Everyone has to buy a case to protect this very expensive toy. The case transforms the iPad into a movie player (pic 1) and quasi-laptop (pic 2).

case_movie

 case_laptop

This laptop mode sucks, but it can be improved by adding a thin bluetooth keyboard to the case. See that flap that folds back to support the iPad in pic 1? Instead, let that flap come forward and embed a small fabric keyboard into it. Embed a pop-out kickstand behind the iPad so it stands up at a usable angle. Now you can type on an adequate keyboard and use the entire iPad screen. Even if the fabric keyboard isn’t great, it has to be better than typing directly on the screen. With a decent keyboard, the iPad provides most of the functionality of a laptop for most non-nerd users. With the very cheap pay-as-you-go 3G, I’d definitely buy it (if I weren’t pathologically frugal!).

Second Mover Advantage

January 26, 2010 by Pinku

This article (and this one) pokes a hole in the idea that there’s a sustainable first-mover advantage for businesses. That is, being the first search engine or social networking service did not give those innovators any advantage in the market; in fact, most of those first generation companies are gone (Excite & SixDegrees). This strikes me as obvious, yet VCs insist that being first to market is critically important. Has Microsoft ever built the first of anything? Is the iPhone the first ever smartphone? Even Amazon came several years after Book Stacks Unlimited. There are few successful tech companies that were the original pioneer in that market. Most were second (or third or fourth) movers. How many search engines came before Google? They couldn’t get much funding because that market was seen as a dead end.

The problem with being a first mover is (1) you have to create everything yourself and (2) you have to convince customers to take a chance on your crazy new product. These are especially difficult if you are a startup. The problem with (1) is it takes a lot of trial-and-error to come up with something. You may have to invent some tech to make it work. You may rely on tech or standards that ultimately limit what you can do. The issue with (2) is that you spend all your time and money convincing Innovators and Early Adopters, and then someone else jumps in right when the Early Majority is ready to buy. The second mover will save time and money on (1) because you’ve done all the work. And they free ride on your evangelism and enter the market after you’ve generated demand.

So the right tactic appears to be to jump into a market that the early adopters are excited about, but that isn’t implemented all that well and hasn’t hit the mainstream. I think Twitter is ripe for a second-mover takeover. They’ve validated the idea of micro-blogging, but there is still tons of innovations that can be layered on top. Twitter won’t be able to shift direction too quickly without upsetting their established user base. And their fragile infrastructure also makes it difficult to change. Now if only I had an idea…

NYTimes to erect paywall

January 25, 2010 by Pinku

The NYTimes has finally decided to move to a subscription model for their website, modeled on the Financial Times. This article does not mention an interesting nugget that was in the memo to the NYTimes staff: “There has also been much speculation in the media and elsewhere about whether The Times will join a consortium as part of the metered model implementation plan. At this stage, our plan is to introduce the metered model as a stand-alone product. At the same time, we continue to discuss alternatives with a broad range of prospective collaborators with regard to bundled offers and other aggregation opportunities.” As I suggested earlier, more sites should join a potential NYTimes consortium. For example, I can see The New Yorker, The Atlantic Monthly, and other magazines that appeal to that customer segment easily joining a consortium. That makes a subscription more appealing and kicks a few extra bucks to those magazines, too. Now all they need to do is implement my suggestion for a slimmer dead tree version, preferably a competitor to USA Today.

Bring on the Tobin Tax

January 14, 2010 by Pinku

Yesterday, a few Wall St. CEOs faced a wave of stupid questions from the Financial Crisis Inquiry Commission. Jamie Dimon, head of JPMorganChase, made a flippant but important comment: financial crises happen “every five to seven years.” There’s even a book out called This Time is Different: Eight Centuries of Financial Folly detailing the long history of booms and busts. Warren Buffett has said the boom/bust cycle will continue because fear and greed are hard-wired into human beings. Whatever rules the gov’t throws together today will fix the last problem (housing) but do nothing for the next problem. Also, these rules will be watered down by lobbyists, and further eroded over time as the public forgets and Congress is bribed by their Wall St. masters.

Booms & busts are inevitable. When a bubble pops, the gov’t will need to spend lots of money to prevent the economy from falling into a depression. The solution is to pass a Tobin Tax, a tiny tax on financial transactions. This tax can be designed to be progressive, ranging from 0.1% to 0.5%. If you buy $1M worth of stock, you pay $5K in taxes. At the lower tax rate, $10K in stock would cost $10 in taxes. It can be tweaked to catch most cheaters. Furthermore, the SEC should push most OTC trading to exchanges, which further increases the funds generated.

The money raised by this Tobin Tax should be reserved for the inevitable busts that require gov’t action. Just like a national disaster, the Fed would declare a financial disaster and use the funds to prevent a depression. During good years, the money will be used to pay off the last bubble and save money for the next. Along with authority to take over failing banks and smoothly enter bankruptcy, a financial disaster fund will cure the hangover after a bubble.

[Obama is on TV right now proposing an excise tax on banks to recoup $100B probably lost from TARP. It's fine, but the funny business will just move to hedge funds. Bankers are like cockroaches, you need to gas the whole place or they'll keep coming back.]

This Time is Different: Eight Centuries of Financial Folly

Product ratings should be high

January 11, 2010 by Pinku

I recently signed up for a Netflix trial account and rated around 300 movies. I noticed that I was giving out lots of 3 and 4 stars, a fair number of 5s, but very few 1s and 2s. It reminded me of this article, which says the average rating for products and movies, at lots of different sites, is around 4.3 out of 5. Here’s a post from Youtube showing that most ratings are 5s. Another from Yelp. The article asserts that people are too nice or, for some inexplicable reason, currying favor (from whom? why?). The implications of this are interesting for any collaborative filtering, or “crowdsourcing”, algorithms because if ratings are skewed high than you need to factor that into your recommendation system.

Since I am not a nice person nor am I sucking up to faceless corporate entities, why are my ratings generally favorable? The reason is that I’ve already filtered out the movies I know I’m going to hate by simply not ever watching them. I hate chick flicks, most goofball comedies, most horror movies, all musicals, many action movies and most epic films. Since I won’t watch any of these movies in the first place, I can’t give them the 1 star rating they deserve. The few times I really hate a movie is when I go simply because the movie is popular. I hate all the Harry Potter movies, but I watch them because everyone else seems to like them. (You are all idiots!) Same goes for books and products and restaurants. I don’t go to random restaurants hoping for a good meal. I read the reviews on Yelp and Chowhound and check their menus online before walking in the door. I buy electronic products after reading the reviews on Newegg and studying the specs. I read reviews and watch interviews with the authors before picking up a book. Most people use a similar suite of filters to pick a movie, book, or product; therefore, they will generally be pleased with what they get. A user’s average rating should be much higher than 3 stars (average). In fact, a user’s average rating is really a measure of his filtering scheme: the better his filters the more pleased he will be with his final choices.

Since the average ratings will be skewed too high, how can you get useful information from collaborative rating systems? For Netflix, they should more aggressively ask if you are even willing to see a particular movie (yes/no) and feed that info back into the recommendation system. The Netflix dataset does not contain this information.  Yelp could do the same: does this restaurant sound interesting (yes/no)? For Amazon, it might help to allow customers to save a few choices so they can do a side-by-side comparison. For example, I could select 5 40” LCD TVs that meet my requirements. Amazon could have a slick interface that helps me compare and eliminate them one-by-one, leaving me with my final choice which implicitly gets a high rating because I chose to buy it. Apparently, eBay has a good solution of primarily looking at the percentage of negative sellers’ ratings because those are angry customers. Too many angry customers and you are booted from the site. All these systems can be improved by determining which choices a customer discarded on the way to making a final, usually satisfying, decision.

SEDA on .NET

January 7, 2010 by Pinku

I ran across an architectural model called staged event-driven architecture (SEDA), which purports to make it easier to build dynamically scalable, high-performance systems. The idea is similar to message-passing systems: each stage receives a message on it’s queue, works on it and sends messages to other stages. The difference is that each stage is actually a thread pool rather than a single thread. The extra trick with SEDA, which I haven’t implemented yet, is that each stage monitors its own performance and tunes its execution behavior. If a stage gets backed up, it will reject attempts to put new work on its queue. The senders must be written to intelligently back off by waiting, rejecting new clients or sending easier work items.

This is fairly easy to write in .NET. The problem, however, is I want to use the Parallel library in .NET 4, including the new thread pool. There is only one thread pool in .NET, so I have to simulate the idea of multiple thread pools. I do this by manually restricting each stage from sending too much work into the thread pool. I don’t want a single stage to flood the thread pool with tons of work, thus starving the other stages of threads to make progress on their work. So I set a limit on how much work a single stage can submit concurrently. This limit can be adjusted by the stage as it monitors performance. I’ve attached a prototype of a Stage in C#.

Read the rest of this entry »

A few problems installing Windows 7

January 7, 2010 by Pinku

I installed Windows 7 64-bit on my desktop and was forced to overcome a few problems. It took a great deal of searching to find the solutions, so I’m detailing it here to help others. My hardware is an ASUS P5B motherboard with an Intel Core2 Duo 6420. The first lesson is never use the Windows tools to upgrade the BIOS. I did and it screwed up my BIOS. I had to send for a replacement motherboard from ASUS (they responded quickly and without any hassles).

The first problem was that I added 4GB RAM for a total of 6GB, but only 5GB showed up in Windows. The solution here is to turn on the “Memory Remap Feature” under Advanced->North Bridge Configuration in the BIOS. See here for an excellent walk through.

The second problem was my new 1TB external eSATA drive wouldn’t show up in Windows. It turns out you need to turn on AHCI in the BIOS. However, if you’ve already installed Windows 7 then you need to reinstall the OS with the feature turned on. Thankfully, this post explains how to do it without reinstalling the OS. In the BIOS, you need to turn on SATA in the BIOS under Main->IDE Configuration->Configure SATA As… and choose AHCI. Also, under Power->ACPI 2.0 Support I switched that to Enabled. Don’t know if that last bit helped.

The final problem was that the eSATA drive showed up perfectly when I booted; however, it would disappear when I resume from sleep mode. I fixed this by installing the latest JMicron RAID drivers from ASUS. Others suggested installing Intel’s Matrix Storage drivers, but the installer complained that I didn’t meet the software requirements, whatever that means.

Also, on my machine the eSATA interface is almost 2-3X faster than USB. Using HDTune to measure the transfer rate, eSATA was minimum of 40MB/sec, maximum of 90MB/sec and average of 72MB/sec. With USB it was 33MB/sec min, max and average. Therefore, if you care about speed then eSATA is more important than the quality of the disk itself.

Parallel Programming Patterns

December 18, 2009 by Pinku

Here’s a very good list of patterns for parallel programming and algorithms.

Spelling Corrector in Factor

December 15, 2009 by Pinku

A while ago, Peter Norvig posted a lucid explanation of how to write a spelling corrector. He did it in an impressive 21 lines of readable Python code. Here is my attempt at the same spelling corrector using the Factor programming language. It’s 19 lines if you don’t count the unusually verbose USING statement at the top.
Read the rest of this entry »