Paginating Records in CouchDB via CouchRest

Update: This change has been incorporated into CouchRest version 0.30

When I began looking into replacing some of TextMe’s large MySQL tables with CouchDB databases, one of the things I noticed right away was that pagination support was not quite there in CouchRest. I say “not quite there” because CouchRest does have the ability to fetch data from the database in paginated chunks, but the current support didn’t fit well with the way the rest of the library interacts with CouchDB views. A helper class had to be used to fetch the data, and the data came back as a hash instead of an instance of the appropriate class.

Pagination is a must for us, because these tables in particular are very large. That’s one of the main reasons why we’re moving them to CouchDB in the first place. Loading all of the data into memory at once would be troublesome to say the least.

CouchRest is still a very young library, currently on version 0.29. However, despite its age, it is already fully featured and off to a great start. So, I saw this as an opportunity to contribute to something that we have already greatly benefited from.

With a little inspiration from Rails, I decided to implement a proxy that would be created when a view was called to fetch data. The proxy would defer getting data from the database until that data was actually needed. I then implemented will_paginate style paginate and paginated_each methods on the proxy object. If either of these methods is called, only a chunk of data will be fetched from the database, and that data will be returned as an array of instances of the appropriate class. If any other method is called on the proxy, the proxy will fetch all of the data from the view, and forward the call on to the “real” array.
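
To make the idea concrete, here is a minimal sketch of such a proxy. It is not the code that went into CouchRest: the class name LazyViewProxy, the fetcher lambda, and the in-memory DATA array are all assumptions made for illustration, and the page is computed with plain limit and skip options.

```ruby
# Hypothetical stand-in for a view proxy; not CouchRest's actual class.
class LazyViewProxy
  attr_reader :fetch_count

  # fetcher: a callable taking a query hash and returning an array of rows.
  def initialize(fetcher)
    @fetcher = fetcher
    @fetch_count = 0
    @all = nil
  end

  # Fetch a single page of rows and nothing more.
  def paginate(options = {})
    page     = options.fetch(:page, 1)
    per_page = options.fetch(:per_page, 30)
    fetch(:limit => per_page, :skip => per_page * (page - 1))
  end

  # Yield every row, fetching one page at a time.
  def paginated_each(options = {})
    per_page = options.fetch(:per_page, 30)
    page = 1
    loop do
      rows = paginate(:page => page, :per_page => per_page)
      rows.each { |row| yield row }
      break if rows.size < per_page
      page += 1
    end
  end

  # Any other Array method loads the full result set once, then forwards.
  def method_missing(name, *args, &block)
    return super unless Array.method_defined?(name)
    @all ||= fetch({})
    @all.send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    Array.method_defined?(name) || super
  end

  private

  def fetch(query)
    @fetch_count += 1
    @fetcher.call(query)
  end
end

# In-memory "view" used only for this demonstration.
DATA = (1..10).to_a
fetcher = lambda do |query|
  skip  = query.fetch(:skip, 0)
  limit = query.fetch(:limit, DATA.size)
  DATA[skip, limit] || []
end

proxy = LazyViewProxy.new(fetcher)
page_two = proxy.paginate(:page => 2, :per_page => 3) # fetches one page only
total = proxy.size                                    # triggers the full fetch
```

Nothing is fetched when the proxy is created. Calling paginate issues one small query, while size (or any other Array method that Object does not already define) falls through to method_missing and loads everything once.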

I decided to go with will_paginate style methods because the will_paginate gem is by far the most popular pagination solution for Rails. We use it extensively in TextMe. So, implementing the same methods would ensure that we could continue to use our existing pagination code, and the code wouldn’t have to know if it was dealing with a collection of ActiveRecord objects or a collection of CouchRest ExtendedDocument objects.

The new code also adds some methods to the class itself that let you paginate over instances of the class without having an instance of the proxy, or a view in your CouchRest ExtendedDocument object.

Here are some examples, pulled from the CouchRest tests:

Paginating using instance methods:

articles = Article.by_date :key => Date.today
articles.paginate(:page => 1, :per_page => 3).size.should == 3

articles = Article.by_date :key => Date.today
articles.paginated_each(:per_page => 3) do |a|
  a.should_not be_nil
end

Paginating via class methods:

articles = Article.paginate(:design_doc => 'Article', 
  :view_name => 'by_date', :per_page => 3, :descending => true, 
  :key => Date.today, :include_docs => true)
articles.size.should == 3

options = { :design_doc => 'Article', :view_name => 'by_date',
  :per_page => 3, :page => 1, :descending => true, 
  :key => Date.today, :include_docs => true }
Article.paginated_each(options) do |a|
  a.should_not be_nil
end 

Currently, the forked version of CouchRest containing this feature can be found on GitHub, at http://github.com/jwood/couchrest/tree/master. I’ve submitted a request to have this pulled into the main CouchRest repository.

Hopefully this will be helpful to others.


    4 thoughts on “Paginating Records in CouchDB via CouchRest”

    1. Hey John,

      { :limit => per_page, :skip => per_page * (page - 1) }

      if I see correctly, you are using count and skip to paginate. This will not work with large tables in an efficient manner. You need to paginate over the view index. http://wiki.apache.org/couchdb/How_to_page_through_results explains how to do it properly. The upshot is that skip will need to scan through all the result rows that it skips. If you use the startkey parameter, it can use the b-tree structure that underlies the view result and get to the “page” *way* more efficiently.

      Cheers
      Jan

    2. One caveat of the startkey approach: it would be much harder to do Digg-style pagination, where you present links to pages 2,3,4,5 etc on page 1.

      For example, if you wanted to put a link to page 5 on page 1, you’d need to know the lastkey on page 4 — you’d have to run a query to determine that.

      Really, with the startkey approach, page numbers go away — you’d have pages that were “numbered” by the document id that began the page, and the number of possible pages would be the number of documents in the total result set.

    3. I was thinking about this last night, I think I will be able to support both approaches.

      I believe paginated_each (the class and instance methods) could be converted to use the approach that @Jan suggests, as you’re just churning through the results, fetching them in batches. This is where the inefficiency @Jan points out would be really troublesome.

      As for the paginate methods, we may be stuck with the current implementation for the class method. However, we may be able to tweak the instance method to hold on to the last key it fetched as an instance variable in the proxy. Then, if somebody fetches the next page in order, we can pick up from where we left off, using that key.

      I’m going to play around with it and see where I can take advantage of the more efficient approach.
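
The batch approach discussed in the comments above could be sketched roughly as follows. This is an illustration under stated assumptions, not the CouchRest implementation: ROWS is an in-memory stand-in for a view index, and query_view is a hypothetical helper that mimics the startkey, startkey_docid, and limit query options. Each batch asks for one extra row, and that extra row supplies the starting key for the next batch.

```ruby
# In-memory stand-in for a view index, sorted by [key, id].
# Two documents share each key, so the document id is needed to break ties.
ROWS = (1..10).map do |i|
  { :key => "2009-05-%02d" % ((i + 1) / 2), :id => "doc-%02d" % i }
end

# Hypothetical helper mimicking startkey / startkey_docid / limit.
# It filters linearly here; a real view index would seek through its b-tree.
def query_view(options = {})
  rows = ROWS
  if options[:startkey]
    start = [options[:startkey], options[:startkey_docid].to_s]
    rows = rows.select { |r| ([r[:key], r[:id]] <=> start) >= 0 }
  end
  rows.first(options.fetch(:limit, rows.size))
end

# Yield every row in batches of per_page without ever using skip.
def paginated_each(per_page)
  options = { :limit => per_page + 1 }
  loop do
    rows = query_view(options)
    rows.first(per_page).each { |row| yield row }
    break if rows.size <= per_page # no extra row, so this was the last batch
    next_row = rows.last           # the extra row starts the next batch
    options = { :limit => per_page + 1,
                :startkey => next_row[:key],
                :startkey_docid => next_row[:id] }
  end
end

ids = []
paginated_each(3) { |row| ids << row[:id] }
```

Because each batch starts from a known key rather than a row offset, the cost of reaching a batch does not grow with the number of rows that came before it. The trade-off, as noted in the second comment, is that there is no cheap way to jump straight to an arbitrary page number.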
