June 19, 2009 by Pinku
Software projects are late and over-budget, yet still fail to meet all requirements and are buggy. What is the problem with software engineering processes and tools that leads to such terrible results? Nothing. Most complex projects are late and over-budget, and still fail to meet all requirements and are buggy. Boeing is paying sizable penalties to customers because its latest plane is late and over-budget. Most defense contracts are late and over-budget. In fact, it’s rare to find a contractor that can fix your house without being late and over-budget. These problems are not unique to software; therefore, software isn’t the problem.
The problem is entirely people. The person managing the task is forced to guess when the task will be complete (when will you cure cancer?). The guess inevitably becomes a hard deadline. The task is ill-defined and vague (make the game more fun). The employees may not have the right skills to do the job effectively. Add to this the general dysfunction of any bureaucracy and there is very little hope of accomplishing anything on time. Shuffling people into different roles and slapping on new processes and tools will do nothing to solve the core problem: most people and organizations are not capable of delivering results efficiently.
Should you throw your hands up in despair? Yes. Most IT organizations are asking too much of their staff. It’s like asking a nurse to perform surgery. If management lowers their expectations, they can get good, consistent results from their IT staff. Otherwise, they are doomed to failure no matter what software engineering bandwagon they jump on. If companies with great talent and tools and experience (MS, Sun, IBM, Google, etc.) still have significant problems building software, how can anyone else do a good job? Stop trying and you’ll be much happier.
Posted in Business, Technology
April 23, 2009 by Pinku
Guido van Rossum, the benevolent dictator for life of the Python language, recently posted that he does not believe Python needs to support tail call optimization (TCO). However, he demonstrates his ignorance of TCO, as do many of these comments on Reddit. What is TCO? Whenever you make a function call, the runtime saves your state on the stack and creates a new frame for the next function. If you make a huge number of nested calls, you can run out of stack space. The obvious answer, therefore, is not to write programs that make lots of recursive calls. This is acceptable for every mainstream language, none of which guarantees TCO.
So why is TCO useful? It allows you to create complex recursive programs that behave like loops. The most common example is the least compelling:
    int sumList(Node n, int sum) {
        if (n != null)
            return sumList(n.next, sum + n.val);
        else
            return sum;
    }
For very large lists, the recursive calls to sumList will run out of stack space and raise an exception. Because the recursive call is the last thing the function does (a tail call), TCO can make it reuse the same stack frame; therefore, this recursion will work on very large lists without using any extra stack space. This example is lame because it can easily be rewritten to use a while loop. A better example is to use recursion for stream-based programming (just a sketch in C#-ish code):
    abstract class Plugin {
        Plugin nextStep;

        // Do this stage's work, then hand the result to the next stage as a tail call.
        public int Process(int input) {
            return nextStep.Process(DoWork(input));
        }

        public abstract int DoWork(int x);
    }

    // Glue the stages together, then close the circle back to the reader.
    stream = new ReadData(new Plugin1(new Plugin2(new Plugin3())));
    stream.endPoint = stream;
    stream.readFile(file);
The idea here is you can glue together any set of Plugins to build a pipeline that processes a stream of data in a big recursive circle of calls. Normally this would blow the stack, but TCO turns this into a loop that doesn’t use any extra stack space. You could rewrite this to avoid recursion, but you end up implementing a weak form of TCO yourself. When you don’t have to worry about it, you can write more complex code and trust the compiler to optimize it for you. As an analogy, you can write programs in languages that don’t have GC, but it’s annoying to manage memory manually.
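To make the "weak form of TCO yourself" point concrete, here is a minimal trampoline sketch in Java 8+ (Java because it has no TCO of its own). It is only an illustration: `Bounce`, `TrampolineDemo` and `range` are hypothetical names, `Node` is filled in to match the sumList example, and this is not how any particular compiler implements TCO. Each tail call is returned as data and a plain loop makes the calls one at a time, so the stack never grows.

```java
import java.util.function.Supplier;

/** One step of a trampolined computation: a final value or a deferred tail call. */
final class Bounce<T> {
    final T value;                  // set when the computation is finished
    final Supplier<Bounce<T>> next; // set when another "tail call" is pending

    private Bounce(T value, Supplier<Bounce<T>> next) {
        this.value = value;
        this.next = next;
    }

    static <T> Bounce<T> done(T value) { return new Bounce<>(value, null); }
    static <T> Bounce<T> call(Supplier<Bounce<T>> next) { return new Bounce<>(null, next); }

    /** Runs the pending calls in a loop, so the stack depth stays constant. */
    static <T> T run(Bounce<T> b) {
        while (b.next != null) {
            b = b.next.get();
        }
        return b.value;
    }
}

/** Minimal linked-list node, assumed to match the sumList example. */
final class Node {
    final int val;
    final Node next;
    Node(int val, Node next) { this.val = val; this.next = next; }
}

final class TrampolineDemo {
    // Same shape as sumList, but the tail call is returned instead of being made.
    static Bounce<Long> sumList(Node n, long sum) {
        if (n != null) {
            return Bounce.call(() -> sumList(n.next, sum + n.val));
        }
        return Bounce.done(sum);
    }

    // Builds the list 1 -> 2 -> ... -> count iteratively.
    static Node range(int count) {
        Node head = null;
        for (int i = count; i >= 1; i--) head = new Node(i, head);
        return head;
    }

    public static void main(String[] args) {
        // A million "recursive" calls without growing the stack.
        System.out.println(Bounce.run(sumList(range(1_000_000), 0L))); // prints 500000500000
    }
}
```

The cost is visible in the code: every function that wants to make a tail call has to be rewritten to return a `Bounce`, which is exactly the bookkeeping a compiler with TCO would do for you.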
Guido lists three points against the inclusion of TCO: (1) TCO doesn’t give you nice stack traces when things fail because it drops all the intermediate stack frames; (2) Once you add TCO, every implementation must support it; (3) He doesn’t “believe in recursion as the basis of all programming”. #1 & #2 are true, but #3 is an exaggeration. Recursion is a nice feature that makes programming easier in some respects, but not the basis of everything. Instead, I think #2 is a strong argument against TCO for Python because it would be inefficient in some implementations, and very few Python programmers would understand it anyway. I just hope the next scripting fad will have this feature.
Posted in Technology
April 3, 2009 by Pinku
Paul Buchheit was noting the improved performance of Java relative to C. He tried JavaScript and said, “Compiled JS is about 9x slower than C on this test. If CPU speed doubles every 18 months, then JS in 2007 performs like C in 2002.” That’s a very good point. Most of the software written in C/C++ on circa 2000 hardware could now be written with a scripting language. Since performance in 2000 was good enough to power everything around us, a scripting language should be good enough for most systems today.
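As a rough check of the quoted arithmetic (a back-of-the-envelope sketch that takes the 9x gap and the 18-month doubling period at face value), the time hardware needs to close a 9x gap is:

```latex
\[
t \;=\; 1.5\ \text{years} \times \log_2 9 \;\approx\; 1.5 \times 3.17 \;\approx\; 4.8\ \text{years}
\]
```

That is roughly five years, which is consistent with the 2002 versus 2007 comparison in the quote.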
Posted in Technology
March 27, 2009 by Pinku
There is finally a growing recognition that the industrialized production of meat is bad for people and the planet. When something has a significant negative externality (e.g. pollution), a Pigovian tax is used to add the social costs to that product’s price. That’s why many economists support a tax on gasoline. The negative externalities include production of greenhouse gases and undermining our security interests by sending too much money to unfriendly countries. For meat, there are several negative externalities.
- Grazing animals produce more greenhouse gases (belching and farting) than cars.
- Industrial farms seriously degrade water quality because of large amounts of feces.
- It’s an inefficient use of water and arable land, both of which are subsidized by the gov’t.
- Over-consumption of meat leads to poor health, which adds to our total health care costs.
Adding a tax and/or removing agricultural subsidies for meat production raises the price to match the real total cost of a double cheeseburger. Though it is unrealistic to expect people to suddenly become vegetarians (I’ve struggled for 10 years), it is trivial for people to reduce the amount of meat they eat. Americans are grotesquely obese because they eat too much of everything. Reducing total food consumption means less meat consumption, which leads to lower rates of obesity. Raising the cost of meat should hopefully push people to eat more fruits and vegetables. I think the biggest problem is America’s meat & potatoes food culture. People just don’t know what to do with vegetables. Incidentally, I asked Greg Mankiw, Bush’s economics advisor, about this and he agreed with a Pigovian tax on meat. (Isn’t the Internet great?) So you could likely get bourgeois support, but the overweight proletariat will riot to get cheap bacon. Mmmmmmm, bacon.
Posted in Economics, Society
March 19, 2009 by Pinku
Again from a presentation given by Jeff Dean, a brilliant Google engineer. These are the failure rates of hardware in a typical first year for a new cluster at Google:
- ~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)
- ~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
- ~1 rack-move (plenty of warning, ~500-1000 machines powered down, ~6 hours)
- ~1 network rewiring (rolling ~5% of machines down over 2-day span)
- ~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
- ~5 racks go wonky (40-80 machines see 50% packet loss)
- ~8 network maintenances (4 might cause ~30-minute random connectivity losses)
- ~12 router reloads (takes out DNS and external vips for a couple minutes)
- ~3 router failures (have to immediately pull traffic for an hour)
- ~dozens of minor 30 second blips for dns
- ~1000 individual machine failures
- ~thousands of hard drive failures
- slow disks, bad memory, misconfigured machines, flaky machines, etc.
My comment: distributed systems must deal with failure constantly and consistently. My intuition (which is usually wrong) is that Java/.NET exception handling isn’t that good because there’s no convenient way (AFAIK) to fix an exception and retry/resume the block of code like you can with Lisp conditions. Software transactional memory (STM) sort of does this, i.e. it doesn’t fix anything, but it keeps retrying the block until the transaction succeeds (the inputs haven’t changed before you update the result). I’m sure there’s a solution buried in the Hadoop code. I’ll dig it out and post it soon.
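For illustration only, here is one way the "fix and retry" idea can be approximated in plain Java. This is a sketch under stated assumptions, not the Hadoop approach mentioned above: `Retry`, `withRecovery` and `RetryDemo` are hypothetical names, and the replica scenario is invented for the example. Unlike Lisp conditions, it cannot resume at the point of failure; it can only rerun the whole block after the handler has had a chance to repair things.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

final class Retry {
    /**
     * Runs the block. On failure, asks the handler to repair the problem;
     * if the handler returns true, the whole block is run again,
     * up to maxAttempts runs in total. Otherwise the last exception is rethrown.
     */
    static <T> T withRecovery(Callable<T> block, Predicate<Exception> handler, int maxAttempts)
            throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return block.call();
            } catch (Exception e) {
                last = e;
                if (!handler.test(e)) break; // handler could not fix it: give up
            }
        }
        throw last;
    }
}

final class RetryDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical scenario: the first replica is down, so the handler
        // "fixes" the failure by pointing the block at the next replica.
        String[] replicas = {"replica-a", "replica-b"};
        int[] current = {0};

        String result = Retry.withRecovery(
            () -> {
                if (current[0] == 0) throw new IOException(replicas[0] + " unreachable");
                return "read from " + replicas[current[0]];
            },
            e -> {
                if (current[0] + 1 >= replicas.length) return false; // nothing left to try
                current[0]++;
                return true;
            },
            3);

        System.out.println(result); // prints "read from replica-b"
    }
}
```

The handler plays the role of a restart chosen ahead of time. The awkward part, as noted above, is that the block must be safe to run more than once.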
Posted in Technology
March 12, 2009 by Pinku
I rarely read the AEI WSJ op-ed page, but today there was an interesting op-ed against computerized medical records by a pair of doctors. The argument is that current implementations of electronic medical records have not improved care or costs; therefore, projections by RAND of large potential savings are dubious. This is probably true, but it is exactly the kind of short-sighted argument one typically faces with any new idea. The problem is that current implementations are poor copies of paper medical records. Without more intelligence in the software and wider infrastructure (i.e. connections with other doctors, hospitals and insurance companies), electronic records can’t improve anything. Read Uwe Reinhardt’s posts that explain some of the problems with medical care. The two big ones are (1) doctors frequently deliver incorrect or inefficient treatments and (2) different regions and hospitals charge wildly different amounts for the same treatments. A goal of electronic medical records is to reduce these problems by (1) reminding doctors what the typical treatments should be and (2) easily pinpointing hospitals/doctors that are overcharging for procedures. Another potential saving is a reduction in admin overhead, which is one significant reason health care in the US costs so much more than in comparable countries. Any guess at potential savings can be easily criticized because it is just a guess. But the history of computers’ impact on different industries makes me believe any estimate of future savings will fall far short of reality. We can’t get there unless we start today.
Posted in Business
March 3, 2009 by Pinku
Sometimes, when I fail to hit mute on my TV, I am blasted by the brainless rantings of CNBC’s Larry Kudlow, whose zeal for tax cuts rivals his earlier addiction to cocaine and alcohol. He believes that returning high-income tax rates to their levels during the ’90s boom will lead to the destruction of mankind. If rich people are so motivated by tax rates, then why do so many live in NYC and California? Even Kudlow lived in NYC until he was fired from his Wall St. job and went to a drug treatment center in ’95. NY has a fairly high state income tax that tops out at 6.85%, and NYC adds up to ~3.5%. There are also fairly high property and sales taxes. Why didn’t the rich leave NYC a long time ago? Financial companies can now move anywhere because the markets are mostly electronic. People could commute from upstate NY, CT or NJ to avoid some NYC tax. Some do, but many remain. More importantly, why did Kudlow live in NYC rather than commute from a lower tax area? The boat outside my house takes me to Wall St. in 10 minutes.
Kudlow’s addictive personality and meager education in economics made him susceptible to Arthur Laffer’s simplistic view of the world. And his pompous and pseudo-intellectual manner probably makes him popular on the right-wing cocktail party circuit. Indeed, his conversion to Catholicism is yet another example of his need to belong to a cult. My only hope is that someday Steve Liesman will strangle Kudlow live on-the-air.
Posted in Technology
March 2, 2009 by Pinku
Posted in Business
March 1, 2009 by Pinku
Here’s the total amount of US debt owned by foreign governments. They own about 25% of the total $10 trillion debt. Here’s a breakdown of ownership of all the debt (doc) over the last 10 years. Right now China owns a mere 7% of the total US national debt. In fact, investors are withdrawing their money from markets around the world and stuffing it into US Treasuries, which are perceived to be the safest place for money right now. From June ’07 to June ’08, foreigners bought ~$450B more and US mutual funds added ~$200B of US debt. Remember this the next time a politician stirs up nationalism and racism by complaining that China owns the US. Here’s a good source for more info on the debt.
Posted in Politics
March 1, 2009 by Pinku
Project Coin on OpenJDK is soliciting ideas for small language changes for Java 7. The ideas so far appear to suggest features from C#, which is a superior language IMHO. In principle, I hate the idea of baking in cosmetic changes to a language. Scheme and Lisp macros are a profoundly better solution to syntactic issues, but mainstream programmers will never grok them. So Java has no choice but to add more stuff. In that spirit, I think the switch statement is seriously underutilized in C-like languages. One idea is to add support for a limited degree of object pattern matching. Think of it as a painfully limited version of the case expression in many functional languages.
The switch in Java currently allows only a few integral primitive types (plus their wrappers and enums). Change it so that you can switch on any object: switch(objectType). Allow the case arms to support a boolean expression. To simplify things, allow access to object fields like this: “.field”. So the new switch would look like this:
    switch (myCustomer) {
        case (.name == "Bob"): stmts;
        case (.age > 21 && .income > 20000): stmts;
        default: stmts;
    }
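Since the syntax above is only a proposal, here is a sketch of what it would presumably mean in Java as it exists today. The `Customer` class, the `classify` method and the returned labels are made up for the example. It also assumes that the first matching arm wins with no fall-through, and that `.name == "Bob"` is meant as a value comparison (`equals`), not a reference comparison.

```java
/** Hypothetical customer type with the fields used in the proposed switch. */
final class Customer {
    final String name;
    final int age;
    final int income;

    Customer(String name, int age, int income) {
        this.name = name;
        this.age = age;
        this.income = income;
    }
}

final class CustomerSwitch {
    // Hand-written equivalent of the proposed switch: arms are tested in order.
    static String classify(Customer c) {
        if ("Bob".equals(c.name)) {                      // case (.name == "Bob")
            return "bob";
        } else if (c.age > 21 && c.income > 20000) {     // case (.age > 21 && .income > 20000)
            return "adult-with-income";
        } else {                                         // default
            return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(new Customer("Bob", 30, 50000)));   // prints "bob"
        System.out.println(classify(new Customer("Alice", 30, 50000))); // prints "adult-with-income"
        System.out.println(classify(new Customer("Carol", 18, 0)));     // prints "other"
    }
}
```

Seen this way, the proposal is mostly sugar over an if/else chain: the gain is the implicit subject (`.field`) and a single, uniform place to list the cases.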
A simpler idea is to stick with the regular switch statement, but support multiple switch variables. Right now, you can only do “switch (oneItem)”. Instead, you could allow “switch (oneItem, twoItem)” and more. Right now, the case arms allow “case constant:”. Instead, this can be extended to allow “case constant1, constant2” and more. Obviously, each constant is matched with the switch variable in the same position. If there are too many constants, that’s a syntax error. If there are fewer constants, then ignore the remaining variables. Here’s an example:
    switch (document, output) {
        case "pdf", "printer": print it out;
        case "pdf", "screen": use acrobat;
        case "jpg", "printer": turn on fancy colors and print;
        default: b/w print;
    }
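One way to approximate the multi-variable switch without new syntax is to join the variables into a single key. The sketch below does that with the string switch that Java 7 ended up adding; `MultiSwitch`, `route` and the "/" separator are arbitrary choices for the example, and the action names from the post are returned as strings in place of real work.

```java
final class MultiSwitch {
    // Joins both switch variables into one key and switches on that.
    // Assumes neither value contains the "/" separator.
    static String route(String document, String output) {
        switch (document + "/" + output) {
            case "pdf/printer": return "print it out";
            case "pdf/screen":  return "use acrobat";
            case "jpg/printer": return "turn on fancy colors and print";
            default:            return "b/w print";
        }
    }

    public static void main(String[] args) {
        System.out.println(route("pdf", "screen"));  // prints "use acrobat"
        System.out.println(route("doc", "printer")); // prints "b/w print"
    }
}
```

The workaround cannot express the "fewer constants, ignore the remaining variables" rule from the proposal, and it compares strings where the real feature could compare each variable in its own type.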
Finally, the absolute simplest extension to the existing switch statement is to allow case ranges. This is already implemented in Visual Basic .NET with its Select Case statement. You say “Case 1 To 10” to match that range. Lots of languages support this. This feature won’t change the world, but it does make switch a bit more useful.
Posted in Technology