reentrancy | Closer to Code

Few months ago I've created a post about reentrancy: Ruby (Rails, Sinatra) background processing – Reentrancy for your workers is a must be!.

In this post, I will present a nice way to implement such feature for your Sidekiq workers.

Simple reentrancy

Normally a Sidekiq worker looks similar to this one:

class ExampleWorker
  include Sidekiq::Worker

  def perform(*args)
    # Background logic here
  end
end

and if something goes wrong, you should see it in your Sidekiq log (or a bugtracker like Errbit). However there is no reentrancy there. You could catch exceptions and handle reentrancy with this code:

class ExampleWorker
  include Sidekiq::Worker

  def perform(*args)
    # Background logic here
  rescue => exception
    # Do something on failure before reraising
    raise exception
  end
end

but it is not too elegant and if you have multiple Sidekiq workers, than probably you will end-up with a lot of code duplication.

Making your reentrancy code more fancy

Instead of handling reentrancy in every single worker, you could just create a base worker class, that would provide such functionality for all the workers that would inherit from the base one:

class BaseWorker
  include Sidekiq::Worker

  def perform(*args)
    # you need to implement the execute method
    # execute method should contain code you want to execute in the background
    execute(*args)
  rescue => exception
    after_failure(*args) if respond_to?(:after_failure)
    raise exception
  end
end

Now, instead of implementing a perform method in every worker, you need to name it (or rename) execute. Perform method will act as a wrapper that will try to execute your worker code and if it fails, will run the after_failure method (if it exists).

Note that the error will be reraised, but now we have a fallback to do for example some database status changes.

class KeywordsWorker < BaseWorker
  def execute(keyword_name)
    KeywordsService.new.remotely(keyword_name)
  end

  # Bring to an expire state if something goes wrong
  def after_failure(keyword_name)
    KeywordsService.new.expire(keyword_name)
  end
end

Of course we might have workers, that won't require reentrancy at all. Then we just skip the after_failure method and thanks to the respond_to? method, everything will work normally.

Background processing is a must-be in any bigger project. There's no way to process everything in a lifetime of a single request. It's such common, that I venture to say that each and every one of us made at least one background worker. Today I would like to tell you a bit about reentrancy.

What is reentrancy

I will just quote Wikipedia, since their description is really nice:

In computing, a computer program or subroutine is called reentrant if it can be interrupted in the middle of its execution and then safely called again ("re-entered") before its previous invocations complete execution. The interruption could be caused by an internal action such as a jump or call, or by an external action such as a hardware interrupt or signal. Once the re-entered invocation completes, the previous invocations will resume correct execution.

So basically it means, that if our worker crashes, we can execute him again and everything will be just fine. No database states fixing, no cleanups, no resetting - nothing. Just re-executing worker task.

How many workers do you have like this? ;) I must admit: I've created non-reentrant workers many times and many times I wish I didn't.

Why our workers should be reentrant and why it's not an overhead to make them like this

Making workers reentrant, especially at the beginning will take you more time than creating a "standard" one. "This will create an overhead" you might think. This might be true but... it's not. If you have many workers that are constantly doing something, and some of them crash, reentrancy will save you a lot of time. It allows you to just fix the issue and rerun tasks again, without having to worry about anything else. Without it, you would spend some time fixing database structure, cleaning things up, performing requests to external APIs from production console and other non-programming related stuff. I guarantee that you will waste much more time doing this, than writing your workers well in a first place.

How to make my workers reentrant?

It's not so hard as you might thing, although sometimes it can be a bit tricky. Of course every worker is somehow unique but there are some general rules that you can use.

Transactional, non-API example test case

The easiest stuff is with non-API, transaction-only workers, that calculate and update some data:

class ScoreWorker
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id)
    user.update(status: 'calculating')
    # This can take a while...
    user.calculate_score!
    user.update(status: 'calculated')
  end
end

If something would happen when calculate_score! is executed, we would end up with a user with endless "calculating" state. The easiest way to fix this, is to use ActiveRecord::Base.transaction block:

class ScoreWorker
  include Sidekiq::Worker

  def perform(user_id)
    ActiveRecord::Base.transaction do
      user = User.find(user_id)
      user.update(status: 'calculating')
      # This can take a while...
      user.calculate_score!
      user.update(status: 'calculated')
    end
  end
end

If anything goes wrong, we will get back to where we started. Unfortunately this approach has one huge disadvantage: user status is changed in transaction, so until it is committed, we won't have the 'calculating' status (at least if you don't have dirty reads).

Non-transactional, non-API example test case

Approach presented below can be also used to improve previous example. Let's imagine we don't have transactional DB and that every operation is performed separately. We need to catch an exception, rewind everything back and then just reraise error:

class ScoreWorker
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id)
    # State machine is always nice :)
    user.calculating_score!
    # This can take a while...
    user.calculate_score!
    user.calculated_score!
  rescue e
    # Reset everything so it can be processed again later
    # We "if" in case error was raised in the first line
    user.reset_score! if user
    raise e
  end
end

Of course this will not prevent us from DB failures, but in my experience, workers tend to fail mostly not because of database issues but because of some problems in the app (worker) logic.

Non-transactional, API example test case

What about external API interactions? When we change something remote, we cannot just simply "unchange" stuff. Let's say that we have a charging mechanism, that makes a call (charge) and then it sends an invoice to this user:

class PaymentWorker
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id)
    Payment::Gateway.charge!(user)
    user.send_invoice_confirmation!
  end
end

How can we provide reentrancy for worker like that? What will happen when user.send_invoice_confirmation! fails? We cannot charge user again for the same month. This means, that we cannot execute this worker task again. We might check whether or not user has been charged:

class PaymentWorker
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id)
    Payment::Gateway.charge!(user) unless user.charged?
    user.send_invoice_confirmation!
  end
end

or we can delegate invoice sending to a separate worker:

class PaymentWorker
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id)
    Payment::Gateway.charge!(user)
    EmailWorker.perform_async(user_id, :invoice)
  end
end

In this case, if user.send_invoice_confirmation! fails, we just need to rerun EmailWorker task that will try to send this email again.

Summary

If you want to build systems that are easy maintain and work with - reentrancy is a must be;
Building reentrant workers will save you a lot of time during crashes;
It's not always about transactions - it's more about the state before worker is executed, as long as we can provide same starting point, we can be reentrant;
Reentrancy can be obtained by splitting workers into "atomic" operations that can be rerun;
It's much easier to introduce reentrancy if you use one of finite-state machines available for Ruby;

Tag: reentrancy

Adding reentrancy and a on failure fallback for your Sidekiq workers