The wall was quickly approaching. After only a few short years, several of our database tables had over a million rows, a handful had over 10 million, and a few had over 30 million. Our queries were taking longer and longer to execute, and our migrations were taking longer and longer to run. We even had to disable a few customer facing features because the database queries required to support them were too expensive to run, and were causing other issues in the application.
The nature of our business requires us to keep most if not all of this data around and easily accessible in order to provide the level of customer support that we strive for. But, it was becoming very clear that a single database to hold all of this information was not going to scale. Besides, it is common practice to have a separate, reporting database that frees the application database from having to handle these expensive data queries, so we knew that we’d have to segregate the data at some point.
Being a young company with limited resources, scaling up to some super-powered server, or running the leading commercial relational database was not an option. So, we started to look into other solutions. We tried offloading certain expensive queries onto the backup database. That helped a little, but the server hosting the backup database simply didn’t have enough juice to keep up with the load. We also considered rolling up key statistics into summary tables to save us from calculating those stats over and over. However, we realized that this was only solving part of the problem. The tables would still be huge, and summary tables would only replace some of the expensive queries.
In our case, we are planning to use CouchDB as an archive database that we can move old data to once a night. Once the data is moved to the CouchDB database, it would no longer be updated, and would only be used for calculating statistics in the application. Since we would only be moving data into the database once a day, we only need to rebuild the views once a day. Therefore, all queries could simply ask for (and get) stale data, even when the views were in the process of being rebuilt. Also, moving all of the old data out of the relational database would dramatically reduce the size of the specific tables, improving the performance of the queries that hit those tables.
I’m really looking forward to this partial migration to CouchDB. The ability to add new views to the database without affecting existing views gives us the flexibility we need to grow the TextMe application to provide better, more specific, and more relevant statistics. In marketing, statistics are king. Since TextMe is a mobile marketing tool, we want it to be able to provide all of the data that our customers are looking for, and more. I feel that by moving to CouchDB, we will not only be able to re-activate those features that we had to disable due to database performance, but also add more features and gather more statistics that would have otherwise been impossible with our previous infrastructure.
The migration to CouchDB was not always straight forward. We faced several challenges, and learned many lessons over the past month. All of those challenges will be addressed here.
In the coming posts, I plan to talk about:
- Structuring your CouchDB databases, and the documents within them.
- More details about CouchDB views.
- The application code necessary to talk to CouchDB.
- Migrating parts of an existing application from a relational database backed by ActiveRecord to CouchDB.
- How the CouchDB security model differs from a traditional relational database.