CouchDB: A Case Study

This is part 1 in a series of posts describing our investigation into CouchDB as a solution to several database-related performance issues facing the TextMe application.

Part 2: Databases and Documents >>

The wall was quickly approaching. After only a few short years, several of our database tables had over a million rows, a handful had over 10 million, and a few had over 30 million. Our queries were taking longer and longer to execute, and our migrations were taking longer and longer to run. We even had to disable a few customer-facing features because the database queries required to support them were too expensive to run and were causing other issues in the application.

The nature of our business requires us to keep most, if not all, of this data around and easily accessible in order to provide the level of customer support that we strive for. But it was becoming very clear that a single database holding all of this information was not going to scale. Besides, it is common practice to have a separate reporting database that frees the application database from having to handle these expensive queries, so we knew that we’d have to segregate the data at some point.

As a young company with limited resources, we could not scale up to some super-powered server or run the leading commercial relational database. So we started to look into other solutions. We tried offloading certain expensive queries onto the backup database. That helped a little, but the server hosting the backup database simply didn’t have enough juice to keep up with the load. We also considered rolling up key statistics into summary tables to save us from calculating those stats over and over. However, we realized that this would solve only part of the problem. The tables would still be huge, and summary tables would replace only some of the expensive queries.

It was about this time that my colleague Dave started looking into CouchDB as a possible solution to our issues. Up until this point, I had never heard of CouchDB. CouchDB is a document-oriented, schema-free database in the same non-relational family as Amazon’s SimpleDB and Google’s BigTable. It stores data as JSON documents and provides a powerful view engine that lets you write JavaScript code to select documents from the database and perform calculations on them. A RESTful HTTP/JSON API is used to access the database. The database boasts other features as well, such as robust replication with bi-directional conflict detection and resolution.
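To make the document model concrete, here is a minimal sketch of what one archived record might look like as a JSON document, along with the HTTP request that would store it. The database name `textme_archive`, the document fields, and the helper `buildPutRequest` are all hypothetical and chosen only for illustration; the sketch assumes a CouchDB server on its default port, 5984, and only builds the request rather than sending it.

```javascript
// Hypothetical archived record; the field names are illustrative only.
const messageDoc = {
  _id: "message-1001",
  type: "message",
  campaign: "spring-sale",
  sent_at: "2009-10-01T14:30:00Z",
  delivered: true
};

// Build (but do not send) the HTTP request that would store the document.
// PUT /{db}/{docid} creates a document with a caller-chosen id.
function buildPutRequest(baseUrl, dbName, doc) {
  return {
    method: "PUT",
    url: `${baseUrl}/${encodeURIComponent(dbName)}/${encodeURIComponent(doc._id)}`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc)
  };
}

const request = buildPutRequest("http://localhost:5984", "textme_archive", messageDoc);
console.log(request.method, request.url);
```

Because there is no schema, a later document could carry extra fields without any migration; the views decide which fields matter.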

The view engine is what piqued our interest. Views can be rebuilt whenever we determine it is necessary, and can be configured to return stale data. “Stale data? Why would I want stale data?” you may be asking yourself. Well, one big reason comes to mind: returning stale data is fast. When configured to return stale data, the database doesn’t have to calculate anything on the fly. It simply returns what it calculated the last time the view was built, making the query about as fast as the HTTP request required to get the data. The CouchDB view engine is also very powerful. CouchDB views use a map/reduce approach: the map function selects documents from the database, and the optional reduce function performs aggregate calculations on the selected data. JavaScript is the default language for the map and reduce functions. However, this is extensible, and there is support out there for writing views in several other languages.
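As a rough illustration of the map/reduce idea, the sketch below counts messages per campaign. The document fields and the `runView` harness are hypothetical: `runView` is a tiny in-memory stand-in written for this example, not CouchDB's implementation. In a real design document the map function takes only `doc` and calls a global `emit` supplied by the view server; here `emit` is passed in as a parameter so the code can run on its own.

```javascript
// Map: called once per document; emits a (key, value) row for each match.
function mapMessages(doc, emit) {
  if (doc.type === "message") {
    emit(doc.campaign, 1);
  }
}

// Reduce: sums the values emitted for one key. Summing is also correct
// when rereduce is true and the values are partial sums.
function reduceCount(keys, values, rereduce) {
  return values.reduce((total, n) => total + n, 0);
}

// In-memory stand-in for the view engine (illustration only):
// run map over every document, group rows by key, then reduce each group.
function runView(docs, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const rows = [];
  for (const [key, values] of groups) {
    rows.push({ key, value: reduceFn(null, values, false) });
  }
  // Return rows ordered by key.
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const docs = [
  { _id: "1", type: "message", campaign: "spring-sale" },
  { _id: "2", type: "message", campaign: "spring-sale" },
  { _id: "3", type: "message", campaign: "fall-promo" },
  { _id: "4", type: "subscriber", phone: "555-0100" }
];

const rows = runView(docs, mapMessages, reduceCount);
console.log(rows);
```

The subscriber document is skipped by the map function, so only the three message documents contribute to the counts.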

In our case, we are planning to use CouchDB as an archive database that we can move old data to once a night. Once the data is moved to the CouchDB database, it would no longer be updated, and would only be used for calculating statistics in the application. Since we would only be moving data into the database once a day, we only need to rebuild the views once a day. Therefore, all queries could simply ask for (and get) stale data, even when the views were in the process of being rebuilt. Also, moving all of the old data out of the relational database would dramatically reduce the size of the specific tables, improving the performance of the queries that hit those tables.
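A query that asks for stale data might be built as in the sketch below. It assumes the `stale=ok` query parameter that CouchDB releases of this era accept on view requests; newer releases may document a different parameter for the same purpose, so check the documentation for your version. The database, design document, and view names are hypothetical, as is the helper `staleViewUrl`.

```javascript
// Build the URL for querying a view without triggering an index update.
// /{db}/_design/{ddoc}/_view/{view} is CouchDB's view endpoint; the names
// passed in below are made up for this example.
function staleViewUrl(baseUrl, dbName, designDoc, viewName) {
  const path =
    `/${encodeURIComponent(dbName)}` +
    `/_design/${encodeURIComponent(designDoc)}` +
    `/_view/${encodeURIComponent(viewName)}`;
  const url = new URL(path, baseUrl);
  url.searchParams.set("stale", "ok");   // accept the index as last built
  url.searchParams.set("group", "true"); // one reduced row per key
  return url.toString();
}

const statsUrl = staleViewUrl(
  "http://localhost:5984",
  "textme_archive",
  "stats",
  "messages_by_campaign"
);
console.log(statsUrl);
```

With a nightly load, the only request that needs to omit `stale=ok` is the one that triggers the rebuild after the new data arrives.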

I’m really looking forward to this partial migration to CouchDB. The ability to add new views to the database without affecting existing views gives us the flexibility we need to grow the TextMe application to provide better, more specific, and more relevant statistics. In marketing, statistics are king. Since TextMe is a mobile marketing tool, we want it to be able to provide all of the data that our customers are looking for, and more. I feel that by moving to CouchDB, we will not only be able to re-activate those features that we had to disable due to database performance, but also add more features and gather more statistics that would have otherwise been impossible with our previous infrastructure.

The migration to CouchDB was not always straightforward. We faced several challenges and learned many lessons over the past month. All of those challenges will be addressed in this series.

In the coming posts, I plan to talk about:

  • Structuring your CouchDB databases, and the documents within them.
  • More details about CouchDB views.
  • The application code necessary to talk to CouchDB.
  • Migrating parts of an existing application from a relational database backed by ActiveRecord to CouchDB.
  • How the CouchDB security model differs from a traditional relational database.

Stay tuned!

    14 thoughts on “CouchDB: A Case Study”

    1. Thanks for writing up your experiences with CouchDB in production. As always if you have any questions or run up against something that doesn’t seem quite right, we’d love to help. The mailing lists and IRC channel are a great vehicle or contact me or another committer directly.

      Cheers,
      Chris

    2. @J Chris Anderson
      Thanks Chris. Thanks to you and the other CouchDB committers for giving the community a wonderful free and open source alternative to the traditional relational database.
