Architecture lessons from Google Talk

An engineer working on Google Talk gave a talk describing the lessons learned from building a very large, scalable system (incidentally, implemented in Java).

  • Measure the right thing: the difficult part for IM is presence (who’s online now), not messaging.
  • Real-life load tests: when they added GTalk to Gmail and Orkut, they didn’t reveal it to users for a few weeks. Instead, they simulated IM connections in the background to test against huge loads.
  • Dynamic resharding: Be prepared to add or remove machines in your data center and to rebalance data across those machines.
  • Add abstractions to hide system complexity: make GTalk a “service”; hide all complexity from other systems like Gmail.
  • Understand the semantics of lower-level libraries: Choose the low-level library that matches the characteristics of your application.
  • Protect against operational problems: Everything breaks, so prepare for inevitable failures and plan how to recover from them.
  • Any scalable system is a distributed system: must have fault tolerance; collect metrics; trace transactions; etc.
  • Software development strategies: binaries are backward compatible; features can be rolled out incrementally for experimentation; engineers work on production machines.
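
To make the "dynamic resharding" point a bit more concrete, here is a minimal sketch of one common way to do it: a consistent-hash ring. To be clear, the talk summary does not say how Google Talk actually rebalances data, so this is an illustration of the general idea and not their design. The `HashRing` class, the `shard-*` machine names, and the example user addresses are all made up for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit integer (MD5 is used for spreading keys, not for security)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; an assumption for illustration, not Google Talk's design."""

    def __init__(self, machines=(), replicas=100):
        self.replicas = replicas  # virtual points per machine, to smooth out the load
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> machine name
        for machine in machines:
            self.add(machine)

    def add(self, machine: str) -> None:
        """Place `replicas` virtual points for the machine on the ring."""
        for i in range(self.replicas):
            point = _hash(f"{machine}#{i}")
            if point in self._owner:  # skip the (very unlikely) hash collision
                continue
            self._owner[point] = machine
            bisect.insort(self._points, point)

    def remove(self, machine: str) -> None:
        """Take the machine's virtual points off the ring."""
        for i in range(self.replicas):
            point = _hash(f"{machine}#{i}")
            if self._owner.get(point) == machine:
                del self._owner[point]
                self._points.remove(point)

    def lookup(self, key: str) -> str:
        """Return the machine owning the first point clockwise from the key's hash."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


# Usage: add a fourth machine and see which users have to move.
ring = HashRing(["shard-a", "shard-b", "shard-c"])
users = [f"user{i}@example.com" for i in range(1000)]
before = {u: ring.lookup(u) for u in users}

ring.add("shard-d")
after = {u: ring.lookup(u) for u in users}
moved = [u for u in users if before[u] != after[u]]

ring.remove("shard-d")
restored = {u: ring.lookup(u) for u in users}
```

The property that matters here is that adding a machine only moves keys onto the new machine, and removing it puts them back where they were. Everything else stays put, so rebalancing touches a fraction of the data instead of reshuffling all of it, which is what a naive `hash(key) % number_of_machines` scheme would do.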