Errbit is a great tool for collecting and managing errors from Ruby applications. It's like Airbrake but can be self-hosted, so you can use it for intranet applications or any apps that should not send data to external servers.
Errbit is a really good piece of software, but unfortunately it can get pretty slow when you use it extensively. It slows down mostly because of the number of problems stored in the DB. Even resolved problems aren't removed by default, so after some time the database can get really huge. This is especially an issue when multiple apps are connected to one Errbit instance and they report errors in large quantities.
There are two easy steps that you should take to prevent this from happening:
- Remove all resolved issues from the DB (not just hide them)
- Auto-resolve issues that are older than 2 weeks and that don't occur anymore
Both of these tasks should be executed periodically, so we will use crontab to run them.
Removing all resolved issues
Errbit already has a built-in rake task for that. To execute it, just run the following command:
bundle exec rake errbit:clear_resolved
If you have been running Errbit for a long time, this task can take a while. To add it to crontab, just execute crontab -e and paste the following line:
0,30 * * * * /bin/bash -l -c 'cd /errbit_location && RAILS_ENV=production nice -n 19 bundle exec rake errbit:clear_resolved'
If you're wondering why we use nice to run this task, you can read about it here: Ruby & Rails: Making sure rake task won’t slow the site down.
This cron task will be executed every 30 minutes and will automatically remove any resolved issues.
Auto-resolving issues that are older than 2 weeks and that don't occur anymore
It happens quite often that one fix resolves more than one issue. Sometimes you might not even realise that your fix resolved multiple issues. How do we handle such a case? Well, let's just resolve any issues that haven't occurred for at least 2 weeks. Unfortunately there's no predefined Errbit rake task for this, so we need to write our own. To do this, open the lib/tasks/errbit/database.rake file and add the following task (inside the errbit namespace, so that it can be called as errbit:cleanup):
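desc 'Resolves problems that did not occur for 2 weeks'
task :cleanup => :environment do
  offset = 2.weeks.ago
  # Resolve every problem that has not been updated since the cutoff
  Problem.where(:updated_at.lt => offset).each(&:resolve!)
  # Remove the notices that have not been updated since the cutoff
  Notice.where(:updated_at.lt => offset).destroy_all
end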
That way we will get rid of old, unresolved problems. This task should also be executed using crontab:
15,45 * * * * /bin/bash -l -c 'cd /errbit_location && RAILS_ENV=production nice -n 19 bundle exec rake errbit:cleanup'
Note that it runs 15 minutes before the next errbit:clear_resolved run, so the issues it resolves are removed by that task shortly afterwards.
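The selection rule behind this task can be tried outside Errbit. Below is a minimal sketch in plain Ruby, assuming a hypothetical StubProblem struct and a hypothetical resolve_stale helper (neither is part of Errbit), with the 2 weeks expressed in seconds because 2.weeks.ago is only available with ActiveSupport. Unlike the rake task above, the sketch skips problems that are already resolved.

```ruby
# Hypothetical stand-in for Errbit's Problem model; not the real class.
StubProblem = Struct.new(:id, :updated_at, :resolved)

# Two weeks in seconds (plain Ruby has no 2.weeks helper).
TWO_WEEKS = 14 * 24 * 60 * 60

# Marks every unresolved problem that was last updated before the cutoff
# as resolved and returns the ids of the problems it touched.
def resolve_stale(problems, now: Time.now, max_age: TWO_WEEKS)
  cutoff = now - max_age
  stale = problems.select do |problem|
    !problem.resolved && problem.updated_at < cutoff
  end
  stale.each { |problem| problem.resolved = true }
  stale.map(&:id)
end

now = Time.utc(2024, 1, 31)
problems = [
  StubProblem.new(1, Time.utc(2024, 1, 1), false),  # 30 days old, unresolved
  StubProblem.new(2, Time.utc(2024, 1, 25), false), # 6 days old, unresolved
  StubProblem.new(3, Time.utc(2024, 1, 10), true)   # old, but already resolved
]

resolved_ids = resolve_stale(problems, now: now)
puts resolved_ids.inspect # prints [1]
```

Only the first problem is both unresolved and older than the cutoff, so it is the only one that gets resolved.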
Removing errors when heavy crashes occur
Warning! Use this code wisely. This will remove unresolved errors as well!
If you have systems that sometimes throw 100k+ errors (due to a single issue) that aren't getting squashed, you might consider using the code presented below. For every app that has more than 2,000 problems, it removes the problems above that threshold, in batches of 100. That way you won't end up with a DB that holds too many errors.
desc 'Removes issues if we have more than given threshold'
task :optimize => :environment do
  threshold = 2000
  batch_size = 100

  App.all.each do |app|
    next unless app.problems.count > threshold

    batch = []
    # Walk through the problems beyond the first `threshold` ones
    app.problems.offset(threshold).each do |problem|
      batch << problem
      if batch.length == batch_size
        # Skip the deletion once the app is back under the threshold
        next if app.reload.problems.count < threshold

        # Delete the errs of the batched problems, then the problems themselves
        Err.where(:problem_id.in => batch.map(&:id)).delete_all
        batch.each(&:delete)
        batch = []
      end
    end
  end
end
Of course you need to add it to crontab as well:
25,55 * * * * /bin/bash -l -c 'cd /errbit_location && RAILS_ENV=production nice -n 19 bundle exec rake errbit:optimize'
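The batching idea can be illustrated without a database. The following is a minimal sketch in plain Ruby, assuming a hypothetical trim_over_threshold helper that works on an in-memory array instead of Errbit's models. It keeps the first threshold items and deletes the rest batch by batch; unlike the rake task, it also deletes the final, partial batch.

```ruby
# Hypothetical in-memory illustration; it does not touch Errbit or MongoDB.
# Keeps the first `threshold` items of `problems` and deletes the rest in
# batches of `batch_size`. Returns the number of deleted items.
def trim_over_threshold(problems, threshold: 2000, batch_size: 100)
  # `drop` returns a new array, so mutating `problems` below is safe.
  overflow = problems.drop(threshold)
  deleted = 0

  overflow.each_slice(batch_size) do |batch|
    # In Errbit, this is where the errs of the batch and the problems
    # themselves would be deleted.
    batch.each { |problem| problems.delete(problem) }
    deleted += batch.length
  end

  deleted
end

problems = (1..2350).to_a
deleted = trim_over_threshold(problems)
puts "deleted #{deleted}, kept #{problems.length}" # prints "deleted 350, kept 2000"
```

Working in small batches keeps each delete operation short, which matters when the task runs on the same machine as the application.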