The absurdity of Ruby’s Hash default value


The default value behavior of Ruby's Hash often leaves people confused. The hallmark of a good API is that it is obvious what will happen in each way you use it, and this is one of the few places I can think of where Ruby's standard library behaves in a surprising and confusing way. When you use a value object, like a Numeric, as the default value, everything works as you would expect.

However, most objects in Ruby are not value objects. They are mutable, which leads to seemingly strange behavior when you accidentally mutate the shared default. Add Hash extensions like Hashie to the mix, and you can muddle your understanding of Hash and end up in a confusing spot.

Oh, Hashie!

I’ve been one of two core maintainers for Hashie since 2014. Hashie, if you don’t know, is a library of Hash extensions and subclasses that afford you different behavior for convenience. Enough Ruby gems use it as a transitive dependency that we can accidentally break many projects when we ship a bug.

Because Hashie is so pervasive, many developers use our extensions often enough that Hashie's behavior replaces the built-in behavior of Hash in their mental model. As an example, let's look at the behavior of Mash, our core feature and the source of much of our maintenance burden.

require "hashie"

skipper = Hashie::Mash.new(
  name: "Skipper",
  race: "adélie penguin",
  temperament: "volatile"
)
skipper.name         #=> "Skipper"
skipper.race         #=> "adélie penguin"
skipper.temperament  #=> "volatile"

Mash has what we call a "merge initializer" that merges a given hash into itself. You can add this behavior to a normal Hash with the MergeInitializer extension. At first glance this feels right, but it directly contradicts the standard library. Why?
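A quick comparison shows the tension. This is a plain-Ruby sketch: `MergingHash` is a hypothetical stand-in for merge-initializer behavior, not Hashie's actual implementation. Passing the same hash to `Hash.new` does something entirely different.

```ruby
# Hash.new treats its positional argument as the default value,
# so nothing is merged in and every missing key returns that Hash.
plain = Hash.new({ name: "Skipper" })
plain.size    #=> 0
plain[:name]  #=> {name: "Skipper"}
plain[:race]  #=> {name: "Skipper"}

# Hypothetical subclass imitating a merge initializer (not Hashie's code).
class MergingHash < Hash
  def initialize(source = {})
    super()         # initialize as an empty Hash with no default value
    update(source)  # copy the given pairs into this hash
  end
end

merged = MergingHash.new({ name: "Skipper" })
merged.size    #=> 1
merged[:name]  #=> "Skipper"
merged[:race]  #=> nil
```

Someone used to the merging form can easily misread `Hash.new(some_hash)` as "a hash pre-filled with these pairs" when it is really "an empty hash whose default is this object".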

Hash default value example

Let’s look at the default value example from Ruby’s documentation. In it, we see this:

h = Hash.new("Go Fish")
h["a"] = 100
h["b"] = 200
h["a"]           #=> 100
h["c"]           #=> "Go Fish"
# The following alters the single default object
h["c"].upcase!   #=> "GO FISH"
h["d"]           #=> "GO FISH"

The default value, in this case, is the string "Go Fish". When you access a key that hasn't been set, you get back this default value. But the default value is stored once inside the Hash instance; since it can be any arbitrary object, it isn't recreated on each access, and you get the same object every time. That leads to the confusing behavior where, if you call a mutating method like String#upcase! on the value returned for an unset key, you accidentally mutate the default value itself.
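To make the sharing visible, this short sketch uses only core Hash methods (`key?`, `size`, `default`) and `Object#equal?`, which checks object identity. It assumes string literals are not frozen (the default unless you enable `frozen_string_literal`).

```ruby
h = Hash.new("Go Fish")
h["a"] = 100

# Reading a missing key returns the default but does not store it.
h["c"]         #=> "Go Fish"
h.key?("c")    #=> false
h.size         #=> 1

# Every missing key hands back the very same String object...
h["c"].equal?(h["d"])  #=> true

# ...so mutating it through one key changes the default for all of them.
h["c"].upcase!
h.default      #=> "GO FISH"
h["zzz"]       #=> "GO FISH"
```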

With this example in mind, let’s look at a case where this behavior makes perfect sense.

page_views = Hash.new(0)
page_views["/my-blog-post"] += 1
page_views["/my-blog-post"] += 1
page_views["/my-blog-post"] += 1
page_views  #=> {"/my-blog-post" => 3}

Because the integer zero is a value object, we don't have the same worry. Value objects represent a particular value and are immutable: when you need to change one, you get back a new object with the change instead of updating its state. Here, += never touches the default; it computes the current value plus one and assigns that new Integer to the key.
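Spelled out, the counter is a read followed by a write. This sketch expands one `+=` by hand and then checks that the default is untouched.

```ruby
page_views = Hash.new(0)

# `page_views["/my-blog-post"] += 1` expands to a read followed by a write:
page_views["/my-blog-post"] = page_views["/my-blog-post"] + 1

# Integer#+ returns a new Integer that gets stored under the key;
# the shared default 0 is never modified (Integers are frozen).
page_views.default          #=> 0
page_views.default.frozen?  #=> true
page_views["/another-post"] #=> 0
page_views                  #=> {"/my-blog-post" => 1}
```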

An easy one-liner to remember is: only use value objects as a Hash default value.
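The classic way to break this rule is an Array default. In the sketch below, the keys and tag strings are made-up sample data; it contrasts mutating the default with `<<` against reassigning with `+=`.

```ruby
# Pitfall: a mutable Array as the default value.
tags = Hash.new([])
tags["skipper"] << "leader"  # appends to the shared default; stores nothing
tags["private"] << "rookie"

tags               #=> {}
tags["kowalski"]   #=> ["leader", "rookie"]

# Reassigning with += builds a new Array instead of mutating the default,
# so it behaves like the Integer counter above.
safe = Hash.new([])
safe["skipper"] += ["leader"]
safe               #=> {"skipper" => ["leader"]}
safe.default       #=> []
```

The second form works only because `+` on an Array returns a new object; any mutating method (`<<`, `push`, `concat`) on a value read from `safe` would hit the same trap as `tags`.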

Default proc to the rescue

In the cases where you want to have a mutable value as the default, you can rely on the alternative interface for defaults: default procs. Let’s look at an example.

h = Hash.new { |hash, key| hash[key] = "Go Fish" }
h["a"] = 100
h["b"] = 200
h["a"]           #=> 100
h["c"]           #=> "Go Fish"
h["c"].upcase!   #=> "GO FISH"
h["d"]           #=> "Go Fish"
h
#=> {"a" => 100, "b" => 200, "c" => "GO FISH", "d" => "Go Fish"}

When it comes to mutable objects, the default proc behaves the way we intuitively expect a default value to behave. When we mutate a value that came from the default, it is confusing for later accesses of other keys to return the modified result. I think the default proc's behavior is the expected one in all cases; when you use a Numeric as a default, the Hash default value only works by coincidence.
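The same pattern fixes the Array case. This sketch groups sample words by first letter, then shows a detail that is easy to miss: the proc only stores a value if it assigns `hash[key]` itself.

```ruby
# Each missing key gets its own new Array, stored on first access.
by_letter = Hash.new { |hash, key| hash[key] = [] }
%w[apple avocado banana].each { |word| by_letter[word[0]] << word }
by_letter  #=> {"a" => ["apple", "avocado"], "b" => ["banana"]}

# Without the assignment, the block still returns a fresh object per
# lookup, but nothing is kept in the hash.
lookup_only = Hash.new { |_hash, _key| [] }
lookup_only["x"] << 1
lookup_only       #=> {}
lookup_only["x"]  #=> []
```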

An alternative interface?

If we could go back in time, it would be nice to make the default value a callable, that is, an object that responds to #call. This behavior goes back so far that Ruby lacked many of the niceties we have now, but in modern Ruby the interface could look like this:

# Hypothetical: today, Hash.new would store the lambda itself as the default value
h = Hash.new(-> { "Go Fish" })
h["a"] = 100
h["b"] = 200
h["a"]           #=> 100
h["c"]           #=> "Go Fish"
h["c"].upcase!   #=> "GO FISH"
h["d"]           #=> "Go Fish"
h
#=> {"a" => 100, "b" => 200, "c" => "GO FISH", "d" => "Go Fish"}

Using this simple interface would ease some of the misunderstanding around Hash default values. Sadly, the transition wouldn't be easy, since it would break backward compatibility. I don't think we'll ever be able to make this change, so we will have to rely on Hash#default_proc instead.
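Something close to the proposed interface can still be layered on top of a default proc. The sketch below assumes a hypothetical `CallableDefaultHash` subclass, which is not part of Ruby or Hashie; it only illustrates that the behavior is expressible today.

```ruby
# Hypothetical subclass approximating the proposed interface; not part of
# Ruby's standard library or Hashie.
class CallableDefaultHash < Hash
  def initialize(callable)
    # Build a default proc that calls the callable once per missing key
    # and stores the result, so each key gets its own object.
    super() { |hash, key| hash[key] = callable.call }
  end
end

h = CallableDefaultHash.new(-> { "Go Fish" })
h["a"] = 100
h["b"] = 200
h["c"].upcase!   #=> "GO FISH"
h["d"]           #=> "Go Fish"
h
#=> {"a" => 100, "b" => 200, "c" => "GO FISH", "d" => "Go Fish"}
```

Because the sketch only wraps a default proc, it leaves Hash.new itself alone, so existing code relying on the current default value semantics keeps working.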

What do you think of this behavior? Is it intuitive?
