Grouped pagination in ActiveRecord

Sometimes, designers will mock-up an extremely usable page that they’ve thought about and tested. Once they’re done and they bring it to you to implement, you look at it and think, “how on Earth can I do that efficiently?” One such case happened at work recently.

We were working on a “command center” of sorts where we show our customers a list of upgrades in various states. The design called for what looked like three different lists of records with each table consisting of records in a state. Since a customer can have potentially dozens of these records, we knew we wanted to paginate the page. The wrinkle in setup was that the pagination should consider the full list of records as the dataset, not each list individually. This caused us to wonder how we can efficiently implement grouped pagination in ActiveRecord.

Naive implementation

When presented with a mock, it’s sometimes difficult to fully understand the intended behavior. For example, take the following abstract mock. The blue represents “draft” posts, the green are “unpublished” posts, and the red are “published” posts1.

Three groups of records, grouped by status
Three groups of records, grouped by status

Upon first inspection, my pair and I saw three lists of records, each grouped by a headline that corresponds to a status. Arguably, the natural understanding of the pagination behavior on this mock is to paginate each set individually. When we translated that to controller logic, the naïve implementation looked something like this:

class PostsController < ApplicationController
  def index
    @drafts = Post.draft.page(params[:dp])
    @unpublished = Post.unpublished.page(params[:up])
    @published = Post.published.page(params[:pp])
  end
end

We questioned whether we understood the mock correctly since that seemed like it would be a challenging user experience. To make sure we understood it, we did our due diligence and reached out to the designer. It turned out that we should paginate the whole dataset, not each piece individually.

One dataset

We took the mock back for further review with our newfound understanding. A few minutes later, we sketched the following diagram:

The intended pagination scheme for the three groups
The intended pagination scheme for the three groups

By treating the list as a single dataset, it was clear that each page can have multiple groups on it. However, it wasn’t necessarily true that each page has every group on it. Some might have all three, depending on the size of the query, but others might have only two of the statuses, or one!

Complicating things even further, we knew that the order was primarily dependent on the status of the post and we should sort each set of posts (grouped by status) by title. Clearly, we should run a single query and paginate off of that, like the following mock:

The resulting set of paged records
The resulting set of paged records, as shown on each page. Notice the complications due to the ordering and grouping.

We were initially stumped. How could we do this efficiently?

First attempts

Our first attempt to do this was in the database using the GROUP BY clause in SQL. GROUP BY is great for when you need to aggregate information for groups of records. However, we needed to treat the set as a single source so that we could paginate it. That’s not possible using GROUP BY.

After that, we tried to step back from doing anything fancy and use an ORDER BY clause. However, we thought back to the requirements: we must show the draft posts first, the unpublished posts second, and the published posts third. Since those three terms alphabetically sort in the order, “draft”, “published”, “unpublished,” we knew it wasn’t a straightforward query.

Grouped pagination

We thought a little harder about the problem. If we could somehow change the “status” field to a number, we could easily order the posts by that. With this thought, we reached for the CASE conditional expression to transform the status field into something we could use. Transforming the post status and ordering them by that calculated field using a SQL fragment looks the following:

class Post < ApplicationRecord
  def self.order_by_status
    clause =
      Arel.sql <<~SQL
        CASE "posts"."status"
        WHEN 'draft' THEN 0
        WHEN 'unpublished' THEN 1
        WHEN 'published' THEN 2
        ELSE 3
        END
      SQL

    order(clause)
  end
end

I prefer to use Arel nodes for this purpose to give me better error messages if something breaks, so we refactored to this form:

class Post < ApplicationRecord
  def self.order_by_status
    order(
        Arel::Nodes::Case
          .new(arel_table[:status])
          .when('draft').then(0)
          .when('unpublished').then(1)
          .when('published').then(2)
          .else(3)
    )
  end
end

Great! We had our posts grouped and ordered appropriately with a call to .order(:title). We could then plug it into the controller for viewing2:

class PostsController < ApplicationController
  def index
    @drafts = posts.select(&:draft?)
    @unpublished = posts.select(&:unpublished?)
    @published = posts.select(&:published?)
  end

  private

  def posts
    @posts ||=
      Post
      .order_by_status
      .order(:title)
      .page(params[:page])
  end
end

This single query3 was able to do what we needed it to do: grouped pagination. It gave us a unified dataset to paginate over, grouped by status, in the correct order for each group. The confusing logic was easily isolated behind the Post.order_by_status method and we test it in isolation to make sure future Rails upgrades don’t break the syntax or logic.

  1. In the interest of not complicating this post with the intimate details of our domain, I decided to show the example using blog posts as the domain of understanding. All you need to know about that is that posts have a “title” and we store statuses as strings. ↩︎
  2. In reality, we didn’t stop here. We extracted the result into a model using `SimpleDelegator`. This let us remove the grouping logic from the controller and the view. However, that’s not germane to this post, so I’m ignoring it. ↩︎
  3. The use of `select` might have caught your eye. Since we needed to show the records on the page, it’s not a performance concern to select from the relation. Also, using `#where` has two problems. First, it removes the pagination. Second, it runs three queries instead of one. ↩︎

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.