Ruby on Rails and Dragonfly processed elements tracking with a two-layer cache – Memcached and ActiveRecord underneath

Dragonfly is a great on-the-fly processing gem, especially for apps that change rapidly and need files processed by multiple processors (for example, when you need to generate many thumbnails for the same attached image).

Since it is an on-the-fly processor, it has some downsides, and the biggest one, in my opinion, is keeping track of which thumbnails we have already created. There's a nice example of how to do it using only ActiveRecord in the Dragonfly docs, but unfortunately this solution won't be enough for an image-heavy website. Since each thumbnail request requires its own SQL query, you might end up with 50-60 or even more thumbnail-related queries per page.

Of course, these queries would probably get cached by the database, so they would not take much time, but still, there's a better way to do it.

Memcached (dalli) to the rescue

Memcached uses an LRU algorithm to evict old and unused data, so there is no need to worry about it exceeding memory limits. Instead, we can focus on putting our thumbnail details there; the more often we request them, the higher the chance of a cache hit. We do need to remember that Memcached is an in-memory data store, so our data might disappear at any point. That's why it is still worth adding a second ActiveRecord layer.
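The two-layer lookup can be sketched in plain Ruby, with hashes standing in for Memcached and the SQL table (the names and keys below are made up for this illustration):

```ruby
# Illustrative sketch: fast_cache plays the role of Memcached (volatile),
# slow_store plays the role of the ActiveRecord table (persistent).
def two_layer_read(key, fast_cache, slow_store)
  hit = fast_cache[key]
  return hit if hit                  # layer 1: Memcached hit, no SQL involved

  hit = slow_store[key]              # layer 2: fall back to the database
  fast_cache[key] = hit if hit       # warm Memcached for the next request
  hit
end

fast_cache = {}                       # may lose entries at any time (LRU, restarts)
slow_store = { 'job-abc' => 'uid-1' } # survives restarts

two_layer_read('job-abc', fast_cache, slow_store) # first call falls back to the DB layer
two_layer_read('job-abc', fast_cache, slow_store) # second call is served from the fast layer
```

Losing a Memcached entry is harmless here: the next read simply falls through to the database and warms the cache again.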

Configuring Rails to use Memcached as a cache storage

Before we go any further, we need to tell our app that we want to store cached data in Memcached. To do so, please follow the Dalli docs instructions. Dalli is the best Memcached Ruby client there is, and it integrates easily with the Ruby on Rails framework.
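In short, assuming Dalli is in your Gemfile, the setup boils down to pointing the Rails cache store at Memcached (YourApp and the server address below are placeholders; check the Dalli docs for your Rails version):

```ruby
# Gemfile
gem 'dalli'

# config/environments/production.rb
YourApp::Application.configure do
  # Back Rails.cache with Memcached via Dalli
  config.cache_store = :mem_cache_store, 'localhost:11211'
end
```

From that point on, Rails.cache.read and Rails.cache.write talk to Memcached.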

Creating ActiveRecord dragonfly cache storage

The Dragonfly docs example names the ActiveRecord cache model Thumb. We will name ours DragonflyCache, since we might use it to store information about more than just images. Here's a sample migration for this resource:

class CreateDragonflyCaches < ActiveRecord::Migration
  def change
    create_table :dragonfly_caches do |t|
      t.string :uid
      t.string :job
      t.timestamps
    end

    add_index :dragonfly_caches, :uid
    add_index :dragonfly_caches, :job
  end
end

Adding Memcached layer to a DragonflyCache model

We now have a single layer ActiveRecord cache storage for our app. We will use two Rails cache methods to add a second layer:

Rails.cache.read(key)
Rails.cache.write(key, value)

Cache read (hit)

The whole flow for cache read will be pretty simple:

  1. Check if job signature is in Memcached and if so - return it (do nothing else)
  2. If not, check if it can be found in SQL database and if so, store it in Memcached and return it
  3. If not found, return nil

# @return [::DragonflyCache] df cache instance that has a uid that we want
#   to get and use
# @param [String] job_signature that acts as a unique identifier for uid
def read(job_signature)
  stored = Rails.cache.read(key(job_signature))

  return new(uid: stored) if stored

  stored = find_by(job: job_signature)
  Rails.cache.write(key(job_signature), stored.uid) if stored

  stored
end

You might notice a method called key. When working with Memcached, I like to prefix keys so that there are no cache collisions (a scenario in which unrelated data ends up under the same key).

This method is pretty simple:

# @return [String] key made from job signature and a prefix
# @param [String] job_signature for which we want to get a key
def key(job_signature)
  "#{PREFIX}_#{job_signature}"
end

Cache write (store)

Cache write is easier: we just store the job details in both the SQL database and Memcached:

# Stores a given job signature with uid in a DB and memcache it
# It is used as a fallback persistent storage, when memcached key
# is not found
# @param [String] job_signature that acts as a unique identifier
# @param [String] uid that is equal to our file path under which
#   we have this certain thumb/file
# @raise [ActiveRecord::RecordInvalid] raised when something goes
#   really wrong (should not happen)
def write(job_signature, uid)
  Rails.cache.write(key(job_signature), uid)

  create!(
    uid: uid,
    job: job_signature
  )
end

Now we can incorporate the code above into the DragonflyCache model.

Full DragonflyCache model

# DragonflyCache stores information about processed dragonfly files that we have
# @see http://markevans.github.io/dragonfly/cache/
class DragonflyCache < ActiveRecord::Base
  # Prefix for all memcached dragonfly job signatures
  PREFIX = 'dragonfly'

  class << self
    # @return [::DragonflyCache] df cache instance that has a uid that we want
    #   to get and use
    # @param [String] job_signature that acts as a unique identifier for uid
    def read(job_signature)
      stored = Rails.cache.read(key(job_signature))

      return new(uid: stored) if stored

      stored = find_by(job: job_signature)
      Rails.cache.write(key(job_signature), stored.uid) if stored

      stored
    end

    # Stores a given job signature with uid in a DB and memcached
    # It is used as a fallback persistent storage, when memcached key
    # is not found
    # @param [String] job_signature that acts as a unique identifier
    # @param [String] uid that is equal to our file path under which
    #   we have this certain thumb/file
    # @raise [ActiveRecord::RecordInvalid] raised when something goes
    #   really wrong (should not happen)
    def write(job_signature, uid)
      Rails.cache.write(key(job_signature), uid)

      create!(
        uid: uid,
        job: job_signature
      )
    end

    private

    # @return [String] key made from job signature and a prefix
    # @param [String] job_signature for which we want to get a key
    def key(job_signature)
      "#{PREFIX}_#{job_signature}"
    end
  end
end

Connecting our DragonflyCache model with Dragonfly

This part is really easy. We have to add the following config options to our Dragonfly initializer:

Dragonfly.app.configure do
  define_url do |app, job, opts|
    cached = DragonflyCache.read(job.signature)

    if cached
      app.datastore.url_for(cached.uid)
    else
      app.server.url_for(job)
    end
  end

  before_serve do |job, env|
    DragonflyCache.write(job.signature, job.store)
  end
end

That's all. Now your Dragonfly cache should hit DB less often.

Clear memcached without restart with Ruby and Capistrano (Rake task)

After a successful Capistrano update (cap deploy), if we use Memcached, we should clear it. How can we do that without root access to the server? There are two ways to clear Memcached without restarting it:

  • Memcached command: flush_all
  • Rails.cache.clear

One flush to rule them all - flush_all

If we have a dedicated server with a single application on it, we can clear the whole Memcached memory. To do so, we create a rake task (/lib/tasks/memcached.rake):

require 'socket'

namespace :memcached do
  desc 'Flushes whole memcached local instance'
  task :flush do
    server  = '127.0.0.1'
    port    = 11211
    command = "flush_all\r\n"

    socket = TCPSocket.new(server, port)
    socket.write(command)
    result = socket.recv(2)

    if result != 'OK'
      STDERR.puts "Error flushing memcached: #{result}"
    end

    socket.close
  end
end

Usage:

bundle exec rake memcached:flush

It is worth mentioning that this task doesn't require the Rails environment to be loaded. As for the server address and port, you can always modify the task to accept environment settings instead of hardcoding them.
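For example, a hypothetical helper (the method and variable names here are made up) could resolve the address from the environment, falling back to the defaults used above:

```ruby
# Hypothetical helper: read memcached connection settings from ENV,
# falling back to the values hardcoded in the task above.
def memcached_settings
  {
    server: ENV.fetch('MEMCACHED_HOST', '127.0.0.1'),
    port: Integer(ENV.fetch('MEMCACHED_PORT', '11211'))
  }
end
```

The task body would then use memcached_settings[:server] and memcached_settings[:port] instead of the literals.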

Be aware that this command will clear out all the data stored in the Memcached instance, including data used by applications other than ours. If you want to clear out data used by one of many apps sharing the same Memcached server, see the solution presented below.

Clearing single Rails app memcached data - Rails.cache.clear

Apart from flushing all the data in Memcached, we can always clear only the Rails cache by creating a really simple rake task (/lib/tasks/memcached.rake):

namespace :memcached do
  desc 'Clears the Rails cache'
  task :flush => :environment do
    Rails.cache.clear
  end
end

The execution process is exactly like in the previous case:

bundle exec rake memcached:flush

In this Rake task we do load the Rails environment (because we want to use the Rails.cache instance). In a multi-application environment, this Memcached cleaning method is way better, because we work only within our application's scope.

Capistrano task for clearing Memcached

So we have our rake task, but it would mean nothing without a Capistrano hookup:

namespace :memcached do

  desc "Flushes memcached local instance"
  task :flush, :roles => [:app] do
    run("cd #{current_path} && rake memcached:flush")
  end

end

Now we can use it like this:

bundle exec cap memcached:flush

Or we can hook it up to the update process:

after 'deploy:update' do
  memcached.flush
end
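The snippets above use the Capistrano 2 DSL (:roles, run). If you are on Capistrano 3, an equivalent task would look roughly like this (a sketch, not taken from the original setup):

```ruby
# Capistrano 3 version: the on/within/execute DSL replaces run
namespace :memcached do
  desc 'Flushes memcached local instance'
  task :flush do
    on roles(:app) do
      within current_path do
        execute :rake, 'memcached:flush'
      end
    end
  end
end
```

With Capistrano 3 you would hook it in with after 'deploy:updated', 'memcached:flush' instead of the deploy:update callback shown above.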

Copyright © 2024 Closer to Code
