Jun 19, 2009
John Wood

Paginating Records in CouchDB via CouchRest

Update: This change has been incorporated into CouchRest version 0.30.

When I began looking into replacing some of TextMe’s large MySQL tables with CouchDB databases, one of the first things I noticed was that pagination support was not quite there in CouchRest. I say “not quite there” because CouchRest could already fetch data from the database in paginated chunks, but that support didn’t fit well with the way the rest of the library interacts with CouchDB views. A helper class had to be used to fetch the data, and the data came back as a hash instead of an instance of the appropriate class.

Pagination is a must for us, because these tables in particular are very large. That’s one of the main reasons why we’re moving them to CouchDB in the first place. Loading all of the data into memory at once would be troublesome to say the least.

CouchRest is still a very young library, currently on version 0.29. Despite its age, it is already fully featured and off to a great start, so I saw this as an opportunity to contribute to something we have already benefited from greatly.

With a little inspiration from Rails, I decided to implement a proxy that would be created when a view was called to fetch data. The proxy would defer getting data from the database until that data was actually needed. I then implemented will_paginate-style paginate and paginated_each methods on the proxy object. If either of these methods is called, only a chunk of data is fetched from the database, and that data is returned as an array of instances of the appropriate class. If any other method is called on the proxy, the proxy fetches all of the data from the view and forwards the call to the “real” array.
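To make the proxy idea concrete, here is a minimal Ruby sketch of the same behavior. It is not the CouchRest code: LazyViewProxy, its fetcher block, and the translation of page numbers into skip/limit values are assumptions made for illustration. The fetcher block stands in for a CouchDB view query, so the sketch runs without a database.

```ruby
# Hypothetical sketch of a lazy, will_paginate-style collection proxy.
# LazyViewProxy and its fetcher block are illustrative, not CouchRest's API.
class LazyViewProxy
  # The block stands in for a view query: it receives skip: and limit:
  # and returns an Array of rows (limit: nil means "return every row").
  def initialize(&fetcher)
    @fetcher = fetcher
    @all = nil
  end

  # Fetch just one page of rows (assumed mapping: page -> skip/limit).
  def paginate(page: 1, per_page: 30)
    @fetcher.call(skip: (page - 1) * per_page, limit: per_page)
  end

  # Yield every row, fetching one page at a time; stop after a short page.
  def paginated_each(per_page: 30)
    page = 1
    loop do
      rows = paginate(page: page, per_page: per_page)
      rows.each { |row| yield row }
      break if rows.size < per_page
      page += 1
    end
  end

  # Any other Array method loads all rows once and forwards the call.
  # (Methods Object already defines, such as inspect, are not forwarded.)
  def method_missing(name, *args, &block)
    return super unless Array.method_defined?(name)
    load_all.public_send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    Array.method_defined?(name) || super
  end

  private

  # Full fetch happens at most once; later calls reuse the cached Array.
  def load_all
    @all ||= @fetcher.call(skip: 0, limit: nil)
  end
end

# Usage with an in-memory stand-in for the view results.
data = (1..7).to_a
proxy = LazyViewProxy.new { |skip:, limit:| limit ? (data[skip, limit] || []) : data.dup }
proxy.paginate(:page => 3, :per_page => 3) # => [7]
```

Calling `size` or `map` on the proxy is what triggers the full fetch; `paginate` and `paginated_each` never do, which mirrors the split described above.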

I decided to go with will_paginate-style methods because the will_paginate gem is by far the most popular pagination solution for Rails, and we use it extensively in TextMe. Implementing the same methods means we can keep using our existing pagination code, and that code doesn’t have to know whether it is dealing with a collection of ActiveRecord objects or a collection of CouchRest ExtendedDocument objects.

The new code also adds class methods that let you paginate over instances of the class without needing an instance of the proxy or a view defined in your CouchRest ExtendedDocument class.

Here are some examples, pulled from the CouchRest tests:

Paginating using instance methods:

articles = Article.by_date :key => Date.today
articles.paginate(:page => 1, :per_page => 3).size.should == 3

articles = Article.by_date :key => Date.today
articles.paginated_each(:per_page => 3) do |a|
  a.should_not be_nil
end

Paginating via class methods:

articles = Article.paginate(:design_doc => 'Article',
  :view_name => 'by_date', :per_page => 3, :descending => true,
  :key => Date.today, :include_docs => true)
articles.size.should == 3

options = { :design_doc => 'Article', :view_name => 'by_date',
  :per_page => 3, :page => 1, :descending => true,
  :key => Date.today, :include_docs => true }
Article.paginated_each(options) do |a|
  a.should_not be_nil
end

Currently, the forked version of CouchRest containing this feature can be found on GitHub, at http://github.com/jwood/couchrest/tree/master. I’ve submitted a request to have this pulled into the main CouchRest repository.

Hopefully this will be helpful to others.

Jun 15, 2009
John Wood

CouchDB: A Case Study

This is part 1 in a series of posts that describe our investigation into CouchDB as a solution to several database-related performance issues facing the TextMe application.

Part 2: Databases and Documents >>

The wall was quickly approaching. After only a few short years, several of our database tables had over a million rows, a handful had over 10 million, and a few had over 30 million. Our queries and our migrations were taking longer and longer to run. We even had to disable a few customer-facing features because the queries required to support them were too expensive and were causing other issues in the application.

The nature of our business requires us to keep most, if not all, of this data around and easily accessible in order to provide the level of customer support we strive for. But it was becoming very clear that a single database holding all of this information was not going to scale. Besides, it is common practice to have a separate reporting database that frees the application database from handling these expensive queries, so we knew we would have to segregate the data at some point.

As a young company with limited resources, we could not afford to scale up to a super-powered server or run a leading commercial relational database, so we started looking into other solutions. We tried offloading certain expensive queries onto the backup database. That helped a little, but the server hosting the backup database simply didn’t have enough juice to keep up with the load. We also considered rolling up key statistics into summary tables so we wouldn’t have to calculate those stats over and over. However, we realized that this would solve only part of the problem: the tables would still be huge, and summary tables would replace only some of the expensive queries.

It was about this time that my colleague Dave started looking into CouchDB as a possible solution to our issues. Up until this point, I had never heard of CouchDB. CouchDB is a document-oriented, schema-free database similar to Amazon’s SimpleDB and Google’s BigTable. It stores data as JSON documents and provides a powerful view engine that lets you write JavaScript code to select documents from the database and perform calculations on them. The database is accessed through a RESTful HTTP/JSON API. It boasts other features as well, such as robust replication and bi-directional conflict detection and resolution.
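Because the API is plain HTTP and JSON, storing a document amounts to a PUT of a JSON body to /database/document-id. The Ruby sketch below only builds such a request with the standard library; the database name `textme_archive`, the document id, and the document fields are hypothetical, and 5984 is assumed as CouchDB’s default port in the commented-out send step.

```ruby
require "json"
require "net/http"

# Hypothetical archived message; field names are illustrative only.
doc = { "type" => "message", "body" => "Hello", "sent_on" => "2009-06-15" }

# PUT /<database>/<document id> with the document as a JSON body.
request = Net::HTTP::Put.new("/textme_archive/message-1")
request["Content-Type"] = "application/json"
request.body = JSON.generate(doc)

# Sending it to a local CouchDB would look like this (not executed here):
# Net::HTTP.start("localhost", 5984) { |http| http.request(request) }
```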

The view engine is what piqued our interest. Views can be rebuilt whenever we determine it is necessary, and they can be queried for stale data. “Stale data? Why would I want stale data?” you may be asking. Well, one big reason comes to mind: returning stale data is fast. When asked for stale data, the database doesn’t have to calculate anything on the fly. It simply returns what it calculated the last time the view was built, making the query about as fast as the HTTP request required to get the data. The CouchDB view engine is also very powerful. CouchDB views use a map/reduce approach: a map function selects documents from the database, and an optional reduce function performs aggregate calculations on that data. JavaScript is the default language for map and reduce functions, but this is extensible, and there is support for writing views in several other languages.
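The map/reduce split is easier to see with a small model. The Ruby below is not CouchDB code (real views are usually JavaScript, and CouchDB’s reduce also receives a rereduce flag that this model ignores); it only mirrors the idea of map emitting key/value pairs and reduce combining the values for each key. The documents are hypothetical sample data.

```ruby
# Hypothetical documents, as they might sit in an archive database.
docs = [
  { "type" => "message", "sent_on" => "2009-06-14" },
  { "type" => "message", "sent_on" => "2009-06-15" },
  { "type" => "message", "sent_on" => "2009-06-15" },
  { "type" => "subscriber", "joined_on" => "2009-06-15" }
]

# map: runs once per document and emits zero or more [key, value] pairs.
# Roughly the JavaScript view:
#   function(doc) { if (doc.type == "message") emit(doc.sent_on, 1); }
map = lambda do |doc|
  doc["type"] == "message" ? [[doc["sent_on"], 1]] : []
end

# reduce: combines all values emitted for one key into a single value.
reduce = lambda { |_key, values| values.sum }

# The view index: every emitted row, sorted by key.
rows = docs.flat_map(&map).sort_by(&:first)

# Grouping rows by key and reducing each group gives messages per day.
grouped = rows.group_by(&:first).map do |key, pairs|
  [key, reduce.call(key, pairs.map(&:last))]
end.to_h
```

Here `grouped` maps each day to its message count, the kind of statistic the archive views are meant to serve.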

In our case, we are planning to use CouchDB as an archive database that we move old data into once a night. Once the data is moved to CouchDB, it would no longer be updated and would be used only for calculating statistics in the application. Since we would only be moving data into the database once a day, we would only need to rebuild the views once a day. Therefore, all queries could simply ask for (and get) stale data, even while the views were being rebuilt. Moving all of the old data out of the relational database would also dramatically reduce the size of those tables, improving the performance of the queries that hit them.
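Since every archive query can accept stale results, the view request itself can say so. A minimal sketch, assuming CouchDB’s `stale=ok` view query parameter and the `/db/_design/name/_view/name` URL layout of CouchDB 0.9 and later; the database, design document, and view names are hypothetical. Newer CouchDB releases deprecate `stale` in favor of separate `stable` and `update` parameters, so check the documentation for your version.

```ruby
require "json"
require "uri"

# Build a GET path for a view query that accepts stale (pre-computed) results.
# db, design, and view names passed in below are hypothetical placeholders.
def stale_view_path(db, design, view, params = {})
  query = { "stale" => "ok" }
  params.each do |name, value|
    # CouchDB expects key, startkey, and endkey as JSON-encoded values.
    json_param = %w[key startkey endkey].include?(name.to_s)
    query[name.to_s] = json_param ? JSON.generate(value) : value.to_s
  end
  "/#{db}/_design/#{design}/_view/#{view}?#{URI.encode_www_form(query)}"
end

path = stale_view_path("textme_archive", "stats", "messages_by_day", key: "2009-06-15")
puts path
# prints /textme_archive/_design/stats/_view/messages_by_day?stale=ok&key=%222009-06-15%22
```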

I’m really looking forward to this partial migration to CouchDB. The ability to add new views to the database without affecting existing views gives us the flexibility we need to grow the TextMe application to provide better, more specific, and more relevant statistics. In marketing, statistics are king. Since TextMe is a mobile marketing tool, we want it to be able to provide all of the data that our customers are looking for, and more. I feel that by moving to CouchDB, we will not only be able to reactivate the features we had to disable due to database performance, but also add more features and gather more statistics that would otherwise have been impossible with our previous infrastructure.

The migration to CouchDB was not always straightforward. We faced several challenges and learned many lessons over the past month, and I’ll address all of those challenges in this series.

In the coming posts, I plan to talk about:

  • Structuring your CouchDB databases, and the documents within them.
  • More details about CouchDB views.
  • The application code necessary to talk to CouchDB.
  • Migrating parts of an existing application from a relational database backed by ActiveRecord to CouchDB.
  • How the CouchDB security model differs from a traditional relational database.

Stay tuned!
