SaaS companies need to offer customers at least some flexibility in the type of data they can store in a hosted DB platform. For example, Force.com allows nearly complete customization. In addition, sometimes you want a more flexible schema so you can make quick changes in your program without bothering the DB with every change. This paper describes and compares several techniques:
- Private tables: Every customer gets their own custom tables in the DB. This is clearly not scalable. Consider 1000s of customers constantly tweaking their schema.
- Extension tables: Customers get custom tables that are joined with the base table. Only marginally better than private tables.
- Sparse columns: Allocate a table with lots of empty extra columns (force.com has 500 empty varchar columns). Then map custom fields to columns in your application. Most of these columns will be empty, so MSSQL has some optimizations for how this is stored.
- XML columns: All custom fields are stored as an XML blob in an extra column. The application is responsible for parsing it. FriendFeed used this technique.
- Pivot Tables: I think the paper is referring to column-oriented data stores in general. It’s not too different from 4, except all tables are stored in a single massive table.
If I want to easily support schema changes, I think I’d use a combination of 3 and 4. Basically, you have to write a schema interpreter that maps your types into storage. It’s not too bad… entity frameworks do it, Hibernate does it, Rails does it. AFAIK, none of them do 4, i.e. grab some fields stored within XML in a column. I’ll have to check on that.