CouchDB Plugins for Scout

Back in December I whipped up a series of CouchDB plugins for the Scout monitoring service. The plugins allow you to track all sorts of metrics for CouchDB, including (but not limited to):

  • Mean reads / second
  • Mean writes / second
  • Mean requests / second for DELETE, GET, HEAD, POST, and PUT requests
  • Mean view requests / second
  • Mean bulk HTTP requests / second
  • Counts for various HTTP response codes

In addition, there is a plugin for individual CouchDB databases and individual couchdb-lucene indexes. The database plugin will report:

  • Database size
  • Number of documents
  • Number of deleted documents
  • Number of update operations

The couchdb-lucene plugin will report:

  • Size of the index
  • Number of documents indexed
  • Number of deleted documents
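Under the hood, metrics like these come straight out of CouchDB's database-information response (a GET on the database URL). Here's a minimal sketch in Ruby of how a plugin might pull those fields out; it uses a canned sample response rather than a live server, and the field names shown (doc_count, doc_del_count, disk_size, update_seq) are CouchDB's standard database-info fields:

```ruby
require 'json'

# A canned sample of what CouchDB returns for GET /<dbname>.
sample = <<-JSON
{"db_name":"textme","doc_count":1200,"doc_del_count":34,
 "update_seq":5678,"disk_size":4096000}
JSON

info = JSON.parse(sample)

# Map CouchDB's database-info fields onto the metrics the plugin reports.
metrics = {
  :database_size     => info['disk_size'],
  :documents         => info['doc_count'],
  :deleted_documents => info['doc_del_count'],
  :update_operations => info['update_seq']
}

puts metrics.inspect
```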

The kind folks over at Scout have just released two new, official plugins based on the ones I created. The CouchDB Overall plugin combines some of the more important CouchDB metrics into a single plugin, and the CouchDB Database plugin reports the same set of stats as the database plugin listed above.

The original plugins, along with more information about them, can be found here. I hope you find them useful.

Thanks to Doug Barth for some help on the plugins, and Derek over at Scout for putting together the official plugins.

Slides From My Intro To CouchDB Talk

Thanks to everybody who showed up at Monday’s ChicagoDB meeting for the great discussion on MapReduce and my talk on CouchDB. Slides from my talk can be found on Slideshare, and the files/commands that were used for the demo can be found on github. As usual, please don’t hesitate to email me with any questions or comments.

See everybody next month!

Speaking About CouchDB at Upcoming ChicagoDB Meeting

I’m going to be speaking about CouchDB at the next ChicagoDB meeting, which will be held on August 16th, 2010. I’m currently putting together some slides that will (I hope) provide a good introduction to CouchDB and its features. I also plan on doing a live demo at the end, so everybody can see CouchDB in action.

Information about the meeting can be found here. I hope to see you there!

“CouchDB: A Case Study” Posts Now Available As A Whitepaper From Couchio

In the spring of 2009, we were starting to run into some performance issues with the Interactive Mediums application (formerly known as TextMe). On the advice of a contractor and friend, we began looking into CouchDB as a potential solution to these problems. As with most young projects, documentation was a bit scarce. The official CouchDB website and the CouchDB wiki had some good information, but after reading what was available we still had many questions. Should I create a new database for each type of document I have? How many views should I store in a design document? What are the advantages and disadvantages of views sharing a design document? How do I even begin migrating my relational database backed application to CouchDB?

So I started taking notes, documenting everything I could regarding what we learned about CouchDB, the design decisions we made for our application (and their respective trade-offs), and the migration of our application code to use CouchDB. I organized those notes, and posted them on this blog as a case study, hoping it would help others looking into CouchDB.

I received a lot of positive feedback from the posts, making me feel like I had in fact filled that need, at least to some degree. Even better, earlier this year I was contacted by Couchio about combining the series of posts into a white paper that would be posted on their site. This would put the case study in front of a larger audience, potentially helping even more people. I was thrilled.

Today, that white paper was released as the “Epic Interactive Mediums Whitepaper” (love the Epic :)). You can get it here. Many thanks to the kind folks over at Couchio for putting this together. I hope people will find it helpful.

CouchDB: The Last Mile

This is the 6th and final post in a series that describes our investigation into CouchDB as a solution to several database related performance issues facing the TextMe application.

<< Part 5: Application Changes

Addressing the remaining issues

We were almost there. After modifying the code to talk to CouchDB, TextMe was successfully pulling data from CouchDB in our development environments. There were just a few remaining issues that needed to be addressed before we could deploy CouchDB to production.

Reducing the view sizes on disk

As I mentioned in a previous post, the amount of disk space consumed by the views was a big problem. If we didn’t do something, we were sure to run out of disk space when migrating our 30 million row messages table to CouchDB.

We determined that it was not what we were emitting from our map functions that was killing us, but how many times we were emitting it. Each of the views emitted a key/value pair for every document in the database. At 30 million documents and 8 views, that ends up being a crap load (240 million) of key/value pairs.

My colleagues Dave and Jerry took a detailed look at the problem, and came up with a solution. They determined that there was simply no need to be emitting data for each document in the database. While this would give us views that could report statistics by the second, our application only supported presenting statistics by the minute. Even if we were able to support statistics at this level of detail, we doubted our customers would even need it. It was simply not worth the disk space.

So, Dave and Jerry modified the import job described in the previous post to roll up several key statistics by the minute as it was building the documents. When the job finishes processing all of the documents for that minute, it creates a summary document containing all of the rolled up statistics, and adds it to the database. Then, they changed the map functions to only consider these summary documents.
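A rough sketch of the roll-up idea (not the actual import job; the message fields and statistic names here are illustrative): group the incoming records by minute, then build one summary document per minute for the map functions to consume.

```ruby
# Each hash stands in for a row pulled out of the MySQL messages table.
messages = [
  { 'sent_at' => '2009-05-01 12:00:05', 'status' => 'delivered' },
  { 'sent_at' => '2009-05-01 12:00:42', 'status' => 'delivered' },
  { 'sent_at' => '2009-05-01 12:00:59', 'status' => 'failed'    },
  { 'sent_at' => '2009-05-01 12:01:10', 'status' => 'delivered' }
]

# Roll the statistics up by minute instead of letting the views emit
# one key/value pair per message. The first 16 characters of the
# timestamp ("YYYY-MM-DD HH:MM") identify the minute.
summaries = messages.group_by { |m| m['sent_at'][0, 16] }.map do |minute, msgs|
  {
    'type'      => 'summary',   # the map functions only consider these docs
    'minute'    => minute,
    'sent'      => msgs.size,
    'delivered' => msgs.count { |m| m['status'] == 'delivered' },
    'failed'    => msgs.count { |m| m['status'] == 'failed' }
  }
end

puts summaries.length   # one summary document per minute of traffic
```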

This solution was able to dramatically reduce the sizes of the views on disk, while still supporting the current application functionality. Since we are still persisting all of the original documents to CouchDB, it is possible to add a new statistic to the summary documents should we ever need to.

Oh, and we also picked up two new terabyte database servers, just in case :)

Paginating records in CouchDB

Like many Rails applications, we were using the popular will_paginate gem to paginate results from the database. Given the size of our data sets, pagination was an absolute necessity to keep from using up every last bit of memory.

CouchRest has a Pager class that paginates over view results, but it is in the CouchRest Core part of the library and doesn’t integrate too well with the object model part of the library. It simply returns the view results as an array of hashes. We were hoping to see a solution that would give us back an array of the corresponding ExtendedDocument objects. We were also trying to keep our application from having to know about CouchDB outside of the classes described in the previous post. Having completely different pagination strategies for the two databases would make that more difficult.

So, I decided to write some new pagination code that supported the will_paginate interface and integrated a little better with the object model part of CouchRest. I had a quick solution that same day which fetched view results and handed back an array of the corresponding ExtendedDocument objects. I then spent some time over the next two weeks modifying the code to integrate a little better with CouchRest and add support for CouchRest views, which we weren’t using.
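A simplified sketch of the approach (not the code that eventually shipped in CouchRest): fetch one page of view rows, then wrap the results in an object that responds to the handful of methods will_paginate's view helpers expect.

```ruby
# A minimal page object exposing the will_paginate-style interface.
class ViewPage
  attr_reader :current_page, :per_page, :total_entries

  def initialize(rows, page, per_page, total)
    @rows, @current_page, @per_page, @total_entries = rows, page, per_page, total
  end

  def total_pages
    (total_entries.to_f / per_page).ceil
  end

  def each(&block); @rows.each(&block); end
  def size; @rows.size; end
end

# Stand-in for a CouchDB view request; real code would hit the view
# with ?skip=N&limit=M and read total_rows from the response.
def paginate(all_rows, page, per_page)
  offset = (page - 1) * per_page
  rows   = all_rows[offset, per_page] || []
  ViewPage.new(rows, page, per_page, all_rows.size)
end

entries = (1..120).to_a
page    = paginate(entries, 2, 50)
puts "page #{page.current_page} of #{page.total_pages}, #{page.size} rows"
```

One caveat worth noting: skip-based paging gets slow on large CouchDB views, so production code typically pages with startkey/startkey_docid instead.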

With the new code in place, we can now paginate over a set of contest entries without having to know what database they are coming from.

  # class name illustrative:
  @entries = ContestEntry.paginate(:page => 1, :per_page => 50)

This pagination code eventually made it into CouchRest.

Going live

With the remaining issues addressed, it was time to start the production migration. One at a time, we manually started the jobs to move the data from MySQL to CouchDB. When one job completed, we would start the next. As I mentioned before, building the views is very resource intensive. We didn’t want to completely bog down the production machine we were using to do the migration by running multiple jobs at once.

Moving the archived data from MySQL to CouchDB and building all of the views took about a week (a day for this table, a couple of days for that table, etc). Overall, it was a fairly smooth process.

For the initial import, we did not purge any of the data from MySQL. Since we needed to wait until our CouchDB databases were fully populated with all views built before we could start using them, the application needed to continue working with the data in MySQL while the migration was in progress. In anticipation of the eventual switch from MySQL to CouchDB, I added a flag in the application configuration that told the application if it should pull archived data from CouchDB. Once all of the data had been imported and all of the views had been built, we flipped the switch.
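The switch itself can be as simple as a single configuration flag consulted wherever archived data is fetched. A minimal sketch (constant and class names are illustrative, not the application's actual code):

```ruby
# Application configuration; flipped to true once the CouchDB
# databases were fully populated and all views were built.
ARCHIVE_CONFIG = { :use_couchdb_for_archives => true }

class ArchivedMessages
  # Decide which datastore serves archived data.
  def self.source
    ARCHIVE_CONFIG[:use_couchdb_for_archives] ? :couchdb : :mysql
  end

  def self.find_for(account_id)
    case source
    when :couchdb then "couchdb view request for account #{account_id}"
    when :mysql   then "legacy mysql query for account #{account_id}"
    end
  end
end

puts ArchivedMessages.source
```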

With the pouring of a celebratory beer, we watched as our application began pulling data from CouchDB in production. It was time to relax :)

The results

I really wish we had taken the time to record how long our troublesome pages were taking to load before the move to CouchDB. Sadly, we did not. All I can say is that pages that used to occasionally time out were now loading in a few seconds. Since the migration, we have also implemented a few new features that would simply not have been possible without CouchDB due to database performance issues.

The database performance issues we set out to address seem to be a thing of the past. If new ones pop up, I’m confident that we could once again utilize CouchDB to address them.

What’s next

This project was focused on addressing database related performance issues that we were facing in production. With these issues out of the way, and our CouchDB infrastructure built out and proven, we will soon be building even more reporting capabilities that would have simply killed our old database. TextMe customers will soon be able to view their data in more ways than they could have imagined.

I am also working on a project that takes advantage of CouchDB’s schema-less nature to let our customers store and utilize data they collect from their customers. Such a feature, which essentially lets customers define their own schema, would have been a challenge to implement in a relational database. With CouchDB, it’s just a document.
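To illustrate the point (with made-up field names), two "customer data" documents with completely different customer-defined fields can live side by side in the same database, with no migration needed to add a new field:

```ruby
# Two documents sharing a database; only the bookkeeping fields overlap.
doc_a = { '_id' => 'entry-1', 'type' => 'customer_data',
          'favorite_color' => 'green', 'shoe_size' => 9 }
doc_b = { '_id' => 'entry-2', 'type' => 'customer_data',
          'birthday' => '1980-04-01', 'newsletter' => true }

# Each document simply carries whatever keys the customer's form defined.
shared = doc_a.keys & doc_b.keys
puts shared.inspect
```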

Thoughts about this project, and CouchDB

I learned a ton while working on this project. While I was vaguely familiar with NoSQL databases before this project, I have only recently become aware of all of the alternatives that are available. With the enormous amount of data companies are beginning to collect and process, I’m sure that CouchDB and its NoSQL friends will soon become a common component in the operational environments of most companies.

The CouchDB community has been great. The CouchDB and CouchRest mailing lists are extremely active, and have been very helpful. The committers on both of these projects are active, and always eager to help. I’d specifically like to call out Jan Lehnardt and Chris Anderson from the CouchDB project. Jan has commented on a few of these posts, encouraging me to keep writing. He also suggested a more efficient implementation of the CouchRest pagination code I wrote, which I quickly implemented. Chris left a comment on the first post in this series thanking me for writing about CouchDB, and offering his assistance if I needed it. I actually took Chris up on that offer when we were running into issues regarding the sizes of the views on disk. He was quick to reply, offering several suggestions. I’d like to thank Jan and Chris for their support and encouragement.

NoSQL databases are here to stay, and CouchDB is truly unique in this area. The way it handles views, and its support for replication/synchronization set it apart from the others. There are already several large projects, like Ubuntu One, that are relying on CouchDB to deliver what nobody else can. Because of this, I’m sure CouchDB has a very bright future ahead of it.