
Making Errbit work faster by keeping it clean and tidy

Errbit is a great tool for collecting and managing errors from Ruby applications. It's like Airbrake, but it can be self-hosted, so you can use it for intranet applications or any apps that should not send data to an external server.

Errbit is a really good piece of software; unfortunately, it can get pretty slow when you use it extensively. Errbit gets slower and slower mostly because of the number of problems stored in the DB. Even resolved problems aren't removed from it by default, so after some time the database can get really huge. This is especially an issue when you have multiple apps connected to one Errbit instance and they report errors in huge quantities.

There are two easy steps that you should take to prevent this from happening:

  1. Remove all resolved issues from the DB (not just hide them)
  2. Auto-resolve issues that are older than 2 weeks and that don't occur anymore

Both of these tasks should be executed periodically, so we will use crontab to achieve this.

Removing all resolved issues

There is a rake task for that already built into Errbit. To execute it, just run the following command:

bundle exec rake errbit:clear_resolved

If you've been running Errbit for a long time, running this task can take a while. To add it to crontab, just execute crontab -e and paste the following line:

0,30 * * * * /bin/bash -l -c 'cd /errbit_location && RAILS_ENV=production nice -n 19  bundle exec rake errbit:clear_resolved'

If you're interested why we use nice to run this task, you can read about it here: Ruby & Rails: Making sure rake task won’t slow the site down.

This cron task will be executed every 30 minutes and will automatically remove any resolved issues.

Auto-resolving issues that are older than 2 weeks and that don't occur anymore

It happens quite often that one fix resolves more than one issue. Sometimes you might not even realise that your fix fixed multiple issues. How to handle such a case? Well, let's just resolve any issues that haven't occurred for at least 2 weeks. Unfortunately there's no predefined Errbit rake task for this, so we need to write our own. To do this, open the lib/tasks/errbit/database.rake file and add the following task:

desc "Resolves problems that didn't occur for 2 weeks"
task :cleanup => :environment do
  offset = 2.weeks.ago
  Problem.where(:updated_at.lt => offset).map(&:resolve!)
  Notice.where(:updated_at.lt => offset).destroy_all
end
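
One thing worth double-checking: the cron entry below invokes the task as errbit:cleanup, so it has to end up inside the :errbit namespace. If you add it to lib/tasks/errbit/database.rake next to the built-in tasks, just place it within the existing namespace block. A minimal sketch of the nesting (assuming the file wraps its tasks in namespace :errbit, as the errbit:clear_resolved task name suggests):

namespace :errbit do
  desc "Resolves problems that didn't occur for 2 weeks"
  task :cleanup => :environment do
    offset = 2.weeks.ago

    Problem.where(:updated_at.lt => offset).map(&:resolve!)
    Notice.where(:updated_at.lt => offset).destroy_all
  end
end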

That way we will get rid of old, unresolved problems. This task should also be executed using crontab:

15,45 * * * * /bin/bash -l -c 'cd /errbit_location && RAILS_ENV=production nice -n 19  bundle exec rake errbit:cleanup'

Notice that it runs 15 minutes before each errbit:clear_resolved run, so the problems it resolves will then be removed by the errbit:clear_resolved task.

Removing errors when heavy crashes occur

Warning! Use this code wisely. This will remove unresolved errors as well!

If you have systems that sometimes tend to throw (due to one issue) 100k+ errors that don't get squashed, you might consider using the code presented below. It will automatically trim an app's problems once there are more than 2k of them, removing everything above that threshold. That way you won't end up with a DB containing too many errors.

desc 'Removes issues if we have more than a given threshold'
task :optimize => :environment do
  threshold = 2000
  batch_size = 100

  App.all.each do |app|
    next unless app.problems.count > threshold

    batch = []
    # Iterate over everything above the threshold and delete it in batches
    app.problems.offset(threshold).each do |problem|
      batch << problem

      if batch.length >= batch_size
        # Stop deleting once the app is back below the threshold
        break if app.reload.problems.count < threshold

        Err.where(:problem_id.in => batch.map(&:id)).delete_all
        batch.each(&:delete)
        batch = []
      end
    end
  end
end

Of course you need to add it to crontab as well:

25,55 * * * * /bin/bash -l -c 'cd /errbit_location && RAILS_ENV=production nice -n 19  bundle exec rake errbit:optimize'

Adding reentrancy and an on-failure fallback for your Sidekiq workers

A few months ago I wrote a post about reentrancy: Ruby (Rails, Sinatra) background processing – Reentrancy for your workers is a must be!.

In this post, I will present a nice way to implement such a feature for your Sidekiq workers.

Simple reentrancy

Normally a Sidekiq worker looks similar to this one:

class ExampleWorker
  include Sidekiq::Worker

  def perform(*args)
    # Background logic here
  end
end

and if something goes wrong, you should see it in your Sidekiq log (or in a bugtracker like Errbit). However, there is no reentrancy there. You could catch exceptions and handle reentrancy with code like this:

class ExampleWorker
  include Sidekiq::Worker

  def perform(*args)
    # Background logic here
  rescue => exception
    # Do something on failure before reraising
    raise exception
  end
end

but it is not too elegant, and if you have multiple Sidekiq workers, then you will probably end up with a lot of code duplication.

Making your reentrancy code more fancy

Instead of handling reentrancy in every single worker, you could just create a base worker class that provides such functionality for all the workers that inherit from it:

class BaseWorker
  include Sidekiq::Worker

  def perform(*args)
    # you need to implement the execute method
    # execute method should contain code you want to execute in the background
    execute(*args)
  rescue => exception
    after_failure(*args) if respond_to?(:after_failure)
    raise exception
  end
end

Now, instead of implementing a perform method in every worker, you name it (or rename it to) execute. The perform method will act as a wrapper that tries to execute your worker code and, if it fails, runs the after_failure method (if it exists).

Note that the error will be reraised, but now we have a fallback that can, for example, make some database status changes.

class KeywordsWorker < BaseWorker
  def execute(keyword_name)
    KeywordsService.new.remotely(keyword_name)
  end

  # Bring to an expire state if something goes wrong
  def after_failure(keyword_name)
    KeywordsService.new.expire(keyword_name)
  end
end

Of course, we might have workers that won't require reentrancy at all. Then we just skip the after_failure method and, thanks to the respond_to? check, everything will work normally.
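
For example, a hypothetical CleanupWorker (name and argument made up for illustration) that needs no fallback just implements execute and inherits the wrapping perform from BaseWorker:

class CleanupWorker < BaseWorker
  # No after_failure defined here, so respond_to?(:after_failure) is false
  # and a failing job simply reraises and goes through Sidekiq's usual retries
  def execute(resource_id)
    # Background logic here
  end
end

# Enqueued exactly like any other Sidekiq worker
CleanupWorker.perform_async(123)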
