Solving the right problem

A label in a book that reads "mistake."

Recently, I received a bug report on one of my open source projects. The bug report came from my co-maintainer on Hashie, dB, who I hold as an exemplary asker of questions. I quickly read through the issue and the background information on his issue and quickly sent him a reply that his suggestion sounded reasonable and that I would accept a contribution adding that behavior. I was attempting to be more like him and act as a facilitator, which is an attribute that I admire in him. However, I did a poor job of this task.

Notice what I said earlier: “I quickly read through the question and background information.” Any time you find yourself quickly doing something a non-rote task, alarm bells should go off in your head. When it comes to complex tasks or understanding, quick is the enemy of right.

Quick begets shortcuts. Shortcuts in understanding. Shortcuts in listening. Shortcuts in thinking.

Quick begets guesses. Guesses about meaning. Guesses about intent.

And quick begets mistakes.

This is a story about my mistake.

The report

The original report shares a confusing situation from the Grape framework. To debug a memory leak, a Grape programmer wrote a benchmark using my gem. The benchmark is not easy to understand. It tests switching from an Array to a Set as the backend to store some inheritable properties for an API endpoint.

Here is the benchmark they wrote, along with the output from running it, as written by James Lamont:

$LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
require 'grape'
require 'benchmark/memory'

class VotingApi < Grape::API
  logger Logger.new(STDOUT)

  helpers do
    def logger
      VotingApi.logger
    end
  end

  namespace 'votes' do
    get do
      logger
    end
  end
end

class PostApi < Grape::API
  mount VotingApi
end

class CommentAPI < Grape::API
  mount VotingApi
end

env = Rack::MockRequest.env_for('/votes', method: 'GET')

PostApi.call(env)
CommentAPI.call(env)

setup_ar = VotingApi.instance_variable_get(:@setup).to_a
setup_set = VotingApi.instance_variable_get(:@setup).to_set

Benchmark.memory do |api|
  calls = 1000

  api.report('using Array') do
    VotingApi.instance_variable_set(:@setup, setup_ar)
    calls.times { PostApi.call(env) }
  end

  api.report('using Set') do
    VotingApi.instance_variable_set(:@setup, setup_set)
    calls.times { PostApi.call(env) }
  end

  api.compare!
end
Calculating -------------------------------------
         using Array    21.261M memsize (   274.472k retained)
                       178.001k objects (     2.012k retained)
                        18.000  strings (     0.000  retained)
           using Set    21.261M memsize (     2.472k retained)
                       178.001k objects (    12.000  retained)
                        18.000  strings (     0.000  retained)

Comparison:
         using Array:   21261040 allocated
           using Set:   21261040 allocated - same

How I heard it and my mistake

When quickly reading through this bug report and reading the benchmark, this certainly sounds like a bug.

The output here shows that both approaches use the same ~21MiB of memory. This intuitively makes no sense since the set should prevent duplicates.

The question that came from the bug report was, in essence, “should we run GC.start after the task runs?” After a quick read, of course, we should run the garbage collector! Otherwise, how would we know what items survive through a collection?

And there — right there in that last sentence — is my mistake. I answered with a quick, “Yeah, that makes sense. Go ahead and send a change!”

Surprise! My quick response was wrong; it doesn’t make sense. The memory profiler gem already does that. In fact, it runs multiple collections to encourage the garbage collector to fully clean unlinked memory. Without doing so, it wouldn’t be able to report the statistics that it does.

So we now know my mistake, but what was the correct thing to do?

Reading it again

To find the correct path, let’s summarize the situation.

  1. Grape stores some configuration that appears to cause a memory leak.
  2. James wrote a benchmark that showed the memory leak using my gem.
  3. The output was confusing so dB filed an issue with my gem.
  4. I read the benchmark and the explanation for the confusion.
  5. The confusion centered around someone using the benchmark with the comparison to show the presence of a memory leak.
  6. Memory leaks have to do with retaining objects, not necessarily with allocating them, and the comparison is about allocations.

After rereading and summarizing, I saw my mistake and decided upon two directions to approach the problem.

A better solution?

While there isn’t a bug in the behavior of the gem, there is a bug in the experience of using the gem. If I created the gem with the intent of using it to diagnose memory leaks, perhaps I would have nailed the experience of doing that with its DSL. However, I designed the gem to measure the space complexity of algorithms. That is a similar, yet distinct, job. Thus, I didn’t design the interface for that purpose and, when used for it, the gem confused the benchmarker.

Making a quick decision here, I could take the path of redesigning the DSL to make it so you can easily do the job of tracking memory leaks. This needn’t be a complicated change. And it can be backward-compatible as well. This is a task that I know how to do and I’m confident in doing it.

But is it what I should do first?

The right choice

Even the summarized version of events is complex. There is more than one conversation that I was not a part of. There are multiple stakeholders in the problem. And it’s unclear if I missed or misunderstood anything in any of the conversations.

Thus, the right choice here is to ask whether I have it correct. The only way to know whether I understand the problem correctly is to state how I understand it and check that for correctness. Only once I have done that and know that I understand the job to be done can I be sure that I am solving the right problem.

So, I did that and am currently waiting on an answer. The great thing now is that, even if my understanding isn’t complete, I now know that people use my gem for a purpose other than the one I designed it for: proving memory leaks. Since I know that is something that people need, I can spend some time thinking about how to make that process better. I can even take my time instead of being quick about it.

What to do next time

Maintaining my open source libraries is something that I do in my free time. I do it for enjoyment and to improve my skills. Free time, however, is a precious commodity at this point in my life. This makes me want to handle things quickly when I’m going through the issues on my repositories.

Next time, I will think about this experience and the mistake that I made. Most bug reports are technical and not easy to quickly understand. I need a process that allows me to feel like I am quickly handling issues even when I’m taking my time. These articles are one way of doing that. Another would be to write a development diary about the task so that I can reference it later.

I think the process that I outlined in this article is a reasonable approach for next time. When I feel myself doing something quickly on an open source project, I should write down my understanding of the problem and then check it with the person who filed the issue. Only once I have done that should I move forward with a change.

Save time by spending it wisely. It’s a finite resource, after all.

What kinds of mistakes have you made because you were acting quickly? Have you created a process to catch yourself when you’re doing this?