hash | Closer to Code

A few weeks ago a friend asked me why this Ruby example behaves so strangely:

hash = Hash.new([])
puts hash #=> {}
hash['foo'] << 1 << 2 << 3
puts hash['foo'] #=> [1, 2, 3]
puts hash #=> {}
hash.delete('foo') #=> nil
puts hash['foo'] #=> [1, 2, 3]

You may ask why a hash that clearly holds some values under the 'foo' key looks empty when we print it. And why are the values still there after we delete that key?

It all comes down to the ::new method and the way Hash deals with its default value. Most programmers I know assumed that when they pass an empty array to the hash initializer, each key without a value will be initialized with its own empty array:

hash = Hash.new([])
puts hash #=> {}
puts hash['foo'] #=> []
puts hash['bar'] #=> []
puts hash #=> { 'foo' => [], 'bar' => [] }

However, Ruby does not work like that. Under the hood, when ::new is invoked, Ruby stores the default value inside the newly created hash instance and returns it each time we request the value of a key that is not present. Conceptually, the lookup behaves similarly to this (a sketch of how it works, not the actual implementation):

def fetch(key)
  instance_variable_get("@_#{key}") || @_defaults
end
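A quick way to check this for yourself is to compare object identity. This is only a small sketch; the names `shared` and `lookup` are made up for the illustration and do not come from the example above:

```ruby
# One array is handed to Hash.new and kept as the hash's default value.
shared = []
lookup = Hash.new(shared)

# Reading two different missing keys returns the very same object.
first  = lookup['foo']
second = lookup['bar']

same_object = first.equal?(second)          # identity, not just equality
is_default  = first.equal?(lookup.default)  # it is the stored default
stored_keys = lookup.keys                   # the reads stored nothing

p same_object # true
p is_default  # true
p stored_keys # []
```

Hash#default exposes the stored default, and equal? compares identity, so both checks confirm that no copy is made per key.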

This means that when you provide a default object, it is always one and the same object. OK, but that does not explain why the hash appears to be empty when we print it! Well... it does. What we were doing in our examples up until now was modifying the internal state of the default array. No key was ever assigned, so from Ruby's perspective there is nothing new in the hash, and it really is empty. We were reusing the default value the whole time, which is also why deleting 'foo' changed nothing.
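The difference between mutating the default and actually storing a key can be sketched like this (the 'bar' key and the `+=` assignment are added here purely for illustration):

```ruby
hash = Hash.new([])

# << mutates the shared default array; Hash#[]= is never called.
hash['foo'] << 1 << 2 << 3

p hash.key?('foo') # false
p hash.size        # 0
p hash.default     # [1, 2, 3]
p hash['anything'] # [1, 2, 3]

# An explicit assignment is what actually stores a key.
# hash['bar'] += [4] expands to hash['bar'] = hash['bar'] + [4],
# which builds a new array and leaves the default untouched.
hash['bar'] += [4]

p hash.keys        # ["bar"]
p hash['bar']      # [1, 2, 3, 4]
p hash.default     # [1, 2, 3]
```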

If you decide to use a Hash default value other than nil and you don't understand this concept, you might get into trouble. That's why it is a really good practice to use a block when a hash needs a non-nil default:

hash = Hash.new { |hash, key| hash[key] = [] }
puts hash #=> {}
hash['foo'] << 1 << 2 << 3
puts hash['foo'] #=> [1, 2, 3]
puts hash #=> { 'foo' => [1, 2, 3] }
hash.delete('foo') #=> [1, 2, 3]
puts hash['foo'] #=> []
puts hash #=> { 'foo' => [] }
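As the last two lines show, with the block form even a plain read stores the key. A minimal sketch of that side effect, and of two standard ways to look at a key without triggering the block (the `groups` name and the keys are illustrative):

```ruby
groups = Hash.new { |h, key| h[key] = [] }

groups['a'] << 1
groups['b'] << 2

# Each key received its own array, so nothing is shared.
p groups['a'].equal?(groups['b']) # false

# A plain read of a missing key runs the block and stores the key.
groups['c']
p groups.keys # ["a", "b", "c"]

# key? and fetch with an explicit fallback do not run the default block.
p groups.key?('d')      # false
p groups.fetch('d', []) # []
p groups.keys           # ["a", "b", "c"]
```

If stray empty entries matter (for example when the hash is serialized later), checking with key? or fetch before reading avoids them.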

require 'benchmark'
GC.disable
ar = nil
100.times do |steps|
  d = Benchmark.measure {
    100000.times {
      ar = { a: 1, b: 2, c: 3, d: 4, e: 5, f: 6, g: 7, h: 8, i: 9, j: 10 }
    }
  }
  system("echo '#{ar.count}, #{d.real}' >> #{ar.count}.csv")
end
