Architecture lessons from Google Talk

An engineer working on Google Talk gave a talk describing the lessons learned from building a very large, scalable system (incidentally, implemented in Java).

  • Measure the right thing: the difficult part for IM is presence (who’s online now), not messaging.
  • Real-life load tests: when they added GTalk to Gmail and Orkut, they didn’t reveal it to users for a few weeks. Instead, they simulated IM connections to test against huge loads.
  • Dynamic resharding: Prepare to add/subtract machines from your data center and rebalance data across those machines.
  • Add abstractions to hide system complexity: make GTalk a “service”; hide all complexity from other systems like Gmail.
  • Understand the semantics of lower-level libraries: choose the right low-level library to match the characteristics of your application.
  • Protect against operational problems: everything breaks, so prepare for and recover from inevitable failures.
  • Any scalable system is a distributed system: must have fault tolerance; collect metrics; trace transactions; etc.
  • Software development strategies: binaries are backward compatible; features can be rolled out incrementally for experimentation; engineers work on production machines.
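
To make the dynamic resharding point more concrete, here is a minimal Python sketch of consistent hashing, one common way to rebalance data when machines are added or removed. This is an illustration only: the talk summary does not say which scheme Google Talk used, and the `HashRing` class, the machine names, and the `vnodes` parameter are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit position on the ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring (not Google Talk's actual scheme)."""

    def __init__(self, vnodes: int = 100):
        self.vnodes = vnodes   # virtual points per machine, to even out load
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> machine name

    def add(self, machine: str) -> None:
        """Place `vnodes` points for the machine on the ring."""
        for i in range(self.vnodes):
            point = _hash(f"{machine}#{i}")
            if point in self._owner:
                continue       # skip the (very unlikely) hash collision
            bisect.insort(self._points, point)
            self._owner[point] = machine

    def remove(self, machine: str) -> None:
        """Drop every point owned by the machine."""
        self._points = [p for p in self._points if self._owner[p] != machine]
        self._owner = {p: m for p, m in self._owner.items() if m != machine}

    def lookup(self, key: str) -> str:
        """Return the machine owning the first point clockwise from the key."""
        if not self._points:
            raise LookupError("ring has no machines")
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]
```

With a ring like this, adding a machine only moves the keys that the new machine takes over, and removing it hands those keys back to their previous owners. The rest of the data stays where it is, which is what keeps rebalancing cheap compared with a plain `hash(key) % machine_count` layout.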

One comment

  1. theanti9

    ive watched a bunch of their videos about distributed systems, they’re very interesting. in fact im using a lot of what i learned from that to try and do some experimenting/research with my own distributed system. I just have a bunch of spare computers that i set up but im working on software that will be dynamically scalable and hopefully include most of those properties that you listed. There is so much to consider within a distributed system, the programming required for it is crazy, but i love it.
