Memoization in Ruby using three different patterns

  • about 7 minutes

Memoization is the process of performing an expensive calculation — a lookup from a database, a long-running function, etc. — and then caching its value to prevent the expensive action from running again. In computer science jargon, it trades space complexity for time complexity; that is, instead of releasing the calculated value for the garbage collector, it stores the value, thus increasing the space requirements of the object, to save on the time it takes to calculate the value more than once.

There are myriad ways to perform memoization in Ruby. The most common means is to use the ||= operator to conditionally set an instance variable upon first running the method. In most cases, this works well. However, there are certain cases where it does not do what you expect it to. This article discusses the semantics and gotchas of memoization in Ruby and shares different patterns you can use to ensure the correct behavior happens every time you choose to memoize a method.

Hello, ||=, my old friend

One of the first things you see when source-diving or working in Ruby on a team is the use of the ||= operator to save method results as instance variables. Using a simple Rails controller as an example, this pattern looks like the following:

class ApplicationController < ActionController::Base
  def current_user
    @_current_user ||= User.find(session[:user_id])
  end
end

Once you have seen this pattern and understood it at the surface level, it’s easy to apply. The algorithm looks something like this:

  1. Create a method that performs some expensive calculation.
  2. When you see that you use the method more than once, apply @<method_name> ||= on the left-hand side of the method’s body.
  3. Move on to your next task.

But what is really happening here? Let’s break down the semantics of the operator. Were we to expand that line, it would look like the following1:

class ApplicationController < ActionController::Base
  def current_user
    @_current_user || @_current_user = User.find_by(id: session[:user_id])
  end
end

After seeing the expanded form written out, we can step through its semantics. Due to operator precedence and the everything-is-an-expression nature of Ruby (e.g. there is an implicit return on the last statement of a block or method), there are two steps to this algorithm:

  1. If @_current_user is truthy, return the value from the method.
  2. Otherwise, set @_current_user to the value returned by User.find_by(id: session[:user_id) and return the value from the method.

Simple enough, right? Use the value if it’s there, otherwise calculate it. That accomplishes our goal for much of the problem space, but writing out the algorithm in English helps to illustrate that it doesn’t fully memoize the answer for all types of work that we might do.

When is ||= not enough?

Think about the English algorithm again. Do you see the problem? To illustrate the point, it will help to veer into Boolean algebra for a moment.

In traditional Boolean algebra, there are two values: true and false. Computers fully run on Boolean algebra as all binary coding and logic run on this branch of algebra. That’s all well and good for the electrons working their way through the silicon of your computer processor, but we are programmers, not electrical engineers.

Computer programming is the practice of telling a computer what to do. However, it’s not only that. It is also the practice of codifying your understanding of a problem such that other humans can tell what the computer will do. Because the designers of the languages that we program mean for your code to be understandable by both machine and human2, we often take shortcuts to reduce the amount of mental clutter in our programs. One such shortcut is the introduction of the concepts of truthiness and falsiness, with their itinerant adjectives truthy and falsy.

When we say something is truthy, we mean that when we pass the value to an if statement, the program will take the success branch rather than the failure one. The opposite is true of falsy values. In Ruby, everything is truthy except for an instance of NilClass (i.e. the value nil) or an instance of FalseClass (i.e. the value false). This is one of the reasons why Rails introduced the #present? and #blank? methods and why you see them littered throughout the code base of Rails applications.

Now that we know what we mean when we say truthy, the problem becomes clear: in our method, when we set @_current_user to something falsy, the method will run the calculation again … and again and again, if you call it repeatedly. With that knowledge, how can we ameliorate this problem?

A simple amelioration

The semantics of variables in programming languages is a topic that is hotly debated in language design circles. When should a variable be considered in scope? When is it defined? Do you hoist the definition to the top of the scope or not consider it until the line that declares the variable? Exciting stuff, for those who like that type of thing.

In Ruby, we define a variable — whether it’s a class, instance, or local variable — by putting its name on the left-hand side of an assignment operator3 for the first time. Once we define a variable, we gain access to the keyword-cum-method #defined? for checking whether we defined it previously … even if it is falsy.

Now, with knowledge of #defined?, we can design a new memoization strategy that helps protect our nilable User.find_by method:

class ApplicationController < ActionController::Base
  def current_user
    return @_current_user if defined? @_current_user

    @_current_user = User.find_by(id: session[:user_id])
  end
end

The first time you call the method, the instance variable @_current_user will not be defined so the method body runs, setting @_current_user to the User or nil. The next time you call it, @_current_user is defined so it returns via the guard clause. Peachy!

A gotcha in Rails controllers

Since Rails controllers share their instance variables and methods with the view context, you must exercise your discipline with helper methods and instance variables. Don’t access your instance variables from your view; declare them as helper methods in your controllers. This is the reason why I prefix my instance variables with an underscore in my Rails examples. It helps to communicate that we should not access the instance variable and instead use a defined helper method.

The danger is that when you access the instance variable from the view, there is a chance that you might inadvertently define it. If by accident, you define the instance variable backing your #current_user method in the view, you might break the method for every other caller.

As such, I recommend instituting a policy in your application of not accessing instance variables in views, whether they are partials or not.

Memoizing methods with parameters

Sometimes, we write methods that take parameters and perform expensive actions based on the parameters sent to the method. In those cases, we often still want to make the same trade-off in space for time that we do for singular variables. How can we cache the results of a parameterized method?

Writing out our meaning in English can, again, help:

  1. If we have called the method #expensive with the argument "foo" before, return the value that we cached last time.
  2. Otherwise, calculate the value and cache the result.

Single-parameter methods are the easiest here because the argument is easy to visualize as a cache key. When you want to cache something, you want a fast lookup (ideally O(1)) from a list of values. Expressed in that manner, is there a data structure that comes to mind?

Consider the following:

def expensive(arg)
  return @_expensive_memo[arg] if defined? @_expensive_memo

  @_expensive_memo ||= Hash.new do |memo, key|
    memo[key] = do_expensive(arg)
  end

  @_expensive_memo[arg]
end

Hashes have O(1) lookup cost, so they make a natural fit for the cache storage. Once we make that decision, we can leap to this implementation. Let’s break it down:

  1. The guard clause runs fetches from the cache and uses the fall-through when the value isn’t there, as long as the cache was previously defined.
  2. The storage takes the form of a Hash with a #default_proc. The block takes two arguments: the first is the hash itself and the second is the key that you’re trying to access. By setting the value of the key within the storage hash inside the #default_proc, we ensure that a call calculates the value and caches it.
  3. The call to @_expensive_memo[arg] ensures that we get the calculated value from the first time we run the #expensive method instead of the cache store.

This pattern works as long as arg has a stable #hash method. This is true for built-ins like String and the Numeric classes, ActiveRecord objects, and Structs, but is not true by default for plain old Ruby objects that you implement yourself, so keep that in mind if you need to pass rich objects to your memoized method.

Extending for multiple arguments

What if you have multiple arguments that you need to pass to the method? You can pass an Array of objects as a Hash key, so it’s a small tweak to get that to work:

def expensive(arg1, arg2)
  @_expensive_memo ||= Hash.new do |memo, (inner_arg1, inner_arg2)|
    memo[[inner_arg1, inner_arg2]] = do_expensive(inner_arg1, inner_arg2)
  end

  @_expensive_memo[[arg1, arg2]]
end

In this example, we are destructuring the cache key into its constituent parts and then recombining them back into an array when setting the key. You could also use the * or ** operators if you choose to break out the calculation part into its own method; it depends on how communicative you want the signature of the memoized method to be. That could look like the following:

def expensive(*args)
  @_expensive_memo ||= Hash.new do |memo, inner_args|
    memo[inner_args] = do_expensive(*inner_args)
  end

  @_expensive_memo[args]
end

Alternatives to memoization

The advantage of memoization is that it allows your objects to be lazy and not initialize their state until you need it. But sometimes you have state that you will always need to initialize, even though it’s expensive. In those cases, consider setting the state as part of the initializer and using a simple attr_reader to read the state. This is Ruby, after all; if there’s one way to do it, there are a dozen other ways as well.

Conclusion

When you reach for memoization in Ruby, don’t blindly fall into the habit of @my_var ||= "something expensive". First, think about whether your calculation can return a falsy result. Particularly when you’re fetching single models from ActiveRecord or calculating a Boolean predicate, you have to consider the values returned by the method. If there’s the possibility of a falsy result, then adjust your method to use the check for definition. And lastly, when you need to memoize multiple values from one method, reach for a Hash to store those results.

Memoization can be a powerful tool in your toolbelt, but like all tools, pay attention to overuse. Also, consider whether the lazy initialization buys you anything in the context of what you’re currently working on. Perhaps it would be better to instantiate the state up front instead.

Has erroneous memoization ever bit you? What’s the worst example of a mistake you’ve made using memoization? Do you have any particularly great wins from applying the pattern?


  1. Using RubyVM::InstructionSequence.compile().disasm shows that this isn’t quite correct … but it is close enough to illustrate the point. ↩︎

  2. We mean languages to be understandable with some exceptions. Malbolge comes to mind. ↩︎

  3. Or a variety of other ways, including instance_variable_set, but that is beyond the scope of this article. ↩︎