Background processing is a must-have in any bigger project. There is no way to process everything within the lifetime of a single request. It is so common that I venture to say each and every one of us has written at least one background worker. Today I would like to tell you a bit about reentrancy.
What is reentrancy
I will just quote Wikipedia, since their description is really nice:
In computing, a computer program or subroutine is called reentrant if it can be interrupted in the middle of its execution and then safely called again (“re-entered”) before its previous invocations complete execution. The interruption could be caused by an internal action such as a jump or call, or by an external action such as a hardware interrupt or signal. Once the re-entered invocation completes, the previous invocations will resume correct execution.
So basically it means that if our worker crashes, we can execute it again and everything will be just fine. No fixing of database state, no cleanups, no resetting, nothing. Just re-executing the worker task.
How many workers like this do you have? ;) I must admit: I’ve created non-reentrant workers many times, and many times I wished I hadn’t.
Why our workers should be reentrant, and why making them so is not overhead
Making a worker reentrant will, especially at the beginning, take you more time than creating a “standard” one. “This will create overhead,” you might think. It might look that way, but it is not. If you have many workers that are constantly doing something and some of them crash, reentrancy will save you a lot of time. It allows you to fix the issue and rerun the tasks without having to worry about anything else. Without it, you would spend time fixing database state, cleaning things up, performing requests to external APIs from the production console, and doing other non-programming work. I guarantee that you will waste much more time doing this than writing your workers well in the first place.
How to make my workers reentrant?
It’s not as hard as you might think, although sometimes it can be a bit tricky. Of course, every worker is unique in some way, but there are some general rules that you can use.
Transactional, non-API example test case
The easiest case is a non-API, transaction-only worker that calculates and updates some data:
class ScoreWorker
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id)
    user.update(status: 'calculating')
    # This can take a while...
    user.calculate_score!
    user.update(status: 'calculated')
  end
end
If something went wrong while calculate_score! was executing, we would end up with a user stuck in the “calculating” state forever. The easiest way to fix this is to use an ActiveRecord::Base.transaction block:
class ScoreWorker
  include Sidekiq::Worker

  def perform(user_id)
    ActiveRecord::Base.transaction do
      user = User.find(user_id)
      user.update(status: 'calculating')
      # This can take a while...
      user.calculate_score!
      user.update(status: 'calculated')
    end
  end
end
If anything goes wrong, the transaction is rolled back and we get back to where we started. Unfortunately this approach has one huge disadvantage: the user status is changed inside the transaction, so until it is committed, nothing outside the transaction will see the 'calculating' status (unless your database allows dirty reads).
Non-transactional, non-API example test case
The approach presented below can also be used to improve the previous example. Let’s imagine we don’t have a transactional DB and that every operation is performed separately. We need to catch the exception, rewind everything, and then re-raise the error:
class ScoreWorker
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id)
    # State machine is always nice :)
    user.calculating_score!
    # This can take a while...
    user.calculate_score!
    user.calculated_score!
  rescue => e
    # Reset everything so it can be processed again later.
    # The "if user" guard covers the case where User.find itself raised
    # and user is still nil.
    user.reset_score! if user
    raise e
  end
end
Of course this will not protect us from DB failures, but in my experience workers mostly fail not because of database issues but because of problems in the app (worker) logic.
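To see the reset-and-re-raise pattern work end to end, here is a minimal plain-Ruby sketch that runs without Rails or Sidekiq. FakeUser is a hypothetical in-memory stand-in for the real model, and its first calculate_score! call fails on purpose to simulate a crash; only the shape of perform mirrors the worker above.

```ruby
# Hypothetical in-memory stand-in for the User model (not a real API).
class FakeUser
  attr_reader :status, :score

  def initialize
    @status = :pending
    @score = nil
    @attempts = 0
  end

  def calculating_score!
    @status = :calculating
  end

  # Fails on the first attempt to simulate a bug in worker logic.
  def calculate_score!
    @attempts += 1
    raise 'boom' if @attempts == 1

    @score = 42
  end

  def calculated_score!
    @status = :calculated
  end

  def reset_score!
    @status = :pending
    @score = nil
  end
end

# Same shape as the worker: mark, compute, mark; on any StandardError
# restore the starting state and re-raise so the failure stays visible.
def perform(user)
  user.calculating_score!
  user.calculate_score!
  user.calculated_score!
rescue StandardError
  user.reset_score!
  raise
end

user = FakeUser.new

begin
  perform(user)
rescue RuntimeError
  # The first run crashed, but the user is back in the :pending state.
end

# The rerun starts from the same point as the first run and finishes.
perform(user)
```

The important part is that the rescue branch puts the record back into the state it had before the task started, so a plain rerun is enough.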
Non-transactional, API example test case
What about external API interactions? When we change something remote, we cannot simply “unchange” it. Let’s say that we have a charging mechanism that makes a call (charge) and then sends an invoice to the user:
class PaymentWorker
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id)
    Payment::Gateway.charge!(user)
    user.send_invoice_confirmation!
  end
end
How can we provide reentrancy for a worker like that? What will happen when user.send_invoice_confirmation! fails? We cannot charge the user again for the same month, which means that we cannot simply execute this worker task again. We might check whether or not the user has already been charged:
class PaymentWorker
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id)
    Payment::Gateway.charge!(user) unless user.charged?
    user.send_invoice_confirmation!
  end
end
or we can delegate invoice sending to a separate worker:
class PaymentWorker
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id)
    Payment::Gateway.charge!(user)
    EmailWorker.perform_async(user_id, :invoice)
  end
end
In this case, if sending the invoice fails, the failure happens inside EmailWorker, so we just need to rerun the EmailWorker task, which will try to send the email again. The charge itself is not touched.
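Both ideas, the "already charged?" guard and the separate email task, can be combined. The following is a minimal sketch in plain Ruby: FakeGateway, FakeMailer, charge_step, email_step, and the billing period argument are all hypothetical names invented for illustration, not part of any real payment or Sidekiq API.

```ruby
# Hypothetical in-memory stand-ins so the sketch runs without Rails or Sidekiq.
class FakeGateway
  attr_reader :charges

  def initialize
    @charges = []
  end

  def charged?(user_id, period)
    @charges.include?([user_id, period])
  end

  def charge!(user_id, period)
    @charges << [user_id, period]
  end
end

class FakeMailer
  attr_reader :sent

  # failures: how many send attempts fail before one succeeds.
  def initialize(failures: 0)
    @failures_left = failures
    @sent = []
  end

  def send_invoice!(user_id, period)
    if @failures_left.positive?
      @failures_left -= 1
      raise IOError, 'mail server unavailable'
    end
    @sent << [user_id, period]
  end
end

# Payment task: the guard makes a rerun skip a charge that already happened.
def charge_step(gateway, user_id, period)
  gateway.charge!(user_id, period) unless gateway.charged?(user_id, period)
end

# Email task: separate from the payment, so it can be retried on its own.
def email_step(mailer, user_id, period)
  mailer.send_invoice!(user_id, period)
end

gateway = FakeGateway.new
mailer = FakeMailer.new(failures: 1)

charge_step(gateway, 7, '2024-01')

begin
  email_step(mailer, 7, '2024-01')
rescue IOError
  # Only the email task failed; the charge stays as it was.
end

email_step(mailer, 7, '2024-01')   # retry just the failed task
charge_step(gateway, 7, '2024-01') # rerunning the payment task changes nothing
```

In a real system the "charged?" information has to be stored somewhere durable, and the sketch assumes the charge and its record succeed or fail together, which a real gateway integration would need to ensure.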
- If you want to build systems that are easy to maintain and work with, reentrancy is a must-have;
- Building reentrant workers will save you a lot of time during crashes;
- It’s not always about transactions. It’s more about the state before the worker is executed: as long as we can provide the same starting point, we can be reentrant;
- Reentrancy can be obtained by splitting workers into “atomic” operations that can be rerun;
- It’s much easier to introduce reentrancy if you use one of the finite-state machines available for Ruby.
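To illustrate the last point, here is a minimal hand-rolled state machine in plain Ruby. It is only a sketch: ScoreState, its states, and its method names are hypothetical, and a real project would more likely use one of the existing state machine gems.

```ruby
# Hypothetical hand-rolled state machine (not the API of any gem).
class ScoreState
  # Allowed transitions: from each state, which states may follow.
  TRANSITIONS = {
    pending: [:calculating],
    calculating: [:calculated, :pending], # :pending is the reset path
    calculated: []
  }.freeze

  attr_reader :state

  def initialize
    @state = :pending
  end

  def transition_to!(new_state)
    unless TRANSITIONS.fetch(@state).include?(new_state)
      raise ArgumentError, "cannot go from #{@state} to #{new_state}"
    end

    @state = new_state
  end

  # A worker task may only start from the known starting point.
  def runnable?
    @state == :pending
  end
end

score = ScoreState.new
score.transition_to!(:calculating)
score.transition_to!(:pending) # a failed run resets to the starting point
```

Because every state change goes through transition_to!, a worker can check runnable? before starting, and a reset after a crash is just another explicit transition.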