Making Errbit work faster by keeping it clean and tidy

Errbit is a great tool for collecting and managing errors from Ruby applications. It's like Airbrake but can be self-hosted, so you can use it for intranet applications or any apps that should not send data to external servers.

Errbit is a really good piece of software. Unfortunately, it can get pretty slow when you use it extensively. It gets slower and slower mostly because of the number of problems stored in the DB. Even resolved problems aren't removed from it by default, so after some time the database can get really huge. This is especially an issue when you have multiple apps connected to one Errbit instance and they report errors in huge quantities.

There are two easy steps that you should take to prevent this from happening:

  1. Remove all resolved issues from the DB (not just hide them)
  2. Auto-resolve issues that are older than 2 weeks and that don't occur anymore

Both of these tasks should be executed periodically, so we will use crontab to achieve this.

Removing all resolved issues

There is a rake task for that built into Errbit already. To execute it, just run the following command:

bundle exec rake errbit:clear_resolved

If you have been running Errbit for a long time, running this task can take a while. To add it to crontab, just execute crontab -e and paste the following line:

0,30 * * * * /bin/bash -l -c 'cd /errbit_location && RAILS_ENV=production nice -n 19  bundle exec rake errbit:clear_resolved'

If you're interested in why we use nice to execute this task, you can read about it here: Ruby & Rails: Making sure rake task won’t slow the site down.

This cron task will be executed every 30 minutes and will automatically remove any resolved issues.

Auto-resolving issues that are older than 2 weeks and that don't occur anymore

It happens quite often that one fix resolves more than one issue. Sometimes you might not even realise that your fix resolved multiple issues. How to handle such a case? Well, let's just resolve any issues that didn't occur for at least 2 weeks. Unfortunately there's no predefined Errbit rake task for this, so we need to write our own. To do this, open the lib/tasks/errbit/database.rake file and add the following task:

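desc "Resolves problems that didn't occur for 2 weeks"
task :cleanup => :environment do
  offset = 2.weeks.ago
  # Resolve every problem that wasn't updated within the last 2 weeks
  Problem.where(:updated_at.lt => offset).each(&:resolve!)
  # Drop the notices that are older than 2 weeks as well
  Notice.where(:updated_at.lt => offset).destroy_all
end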
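
To make the selection rule easier to see outside of Errbit, here is a minimal plain-Ruby sketch of the same idea. FakeProblem, TWO_WEEKS_IN_SECONDS and resolve_stale are hypothetical names made up for this illustration. They are not part of Errbit, and the real task relies on Mongoid queries instead.

```ruby
# Hypothetical stand-in for Errbit's Problem model, used only for illustration
FakeProblem = Struct.new(:id, :updated_at, :resolved)

TWO_WEEKS_IN_SECONDS = 14 * 24 * 60 * 60

# Marks as resolved every problem whose last update is older than the cutoff
# and returns the problems that were touched
def resolve_stale(problems, now: Time.now, max_age: TWO_WEEKS_IN_SECONDS)
  cutoff = now - max_age
  problems
    .select { |problem| problem.updated_at < cutoff }
    .each { |problem| problem.resolved = true }
end
```

A problem that was updated (so it occurred again) within the last 2 weeks stays untouched, everything older gets resolved.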

That way we will get rid of old, unresolved problems. This task should also be executed using crontab:

15,45 * * * * /bin/bash -l -c 'cd /errbit_location && RAILS_ENV=production nice -n 19  bundle exec rake errbit:cleanup'

Notice that we run it 15 minutes before each run of the previous rake task, so the freshly resolved issues will be removed by the next errbit:clear_resolved run.

Removing errors when heavy crashes occur

Warning! Use this code wisely. This will remove unresolved errors as well!

If you have systems that sometimes tend to throw (due to one issue) 100k+ errors that aren't getting squashed, you might think of using the code presented below. For each app that has more than 2k problems, it will automatically remove the problems beyond that threshold. That way you won't end up with a DB that holds too many errors.

desc 'Removes issues if we have more than given threshold'
task :optimize => :environment do
  threshold = 2000
  batch_size = 100

  App.all.each do |app|
    next unless app.problems.count > threshold

    # Walk the problems beyond the threshold in batches, including the last
    # (possibly smaller) one
    app.problems.offset(threshold).each_slice(batch_size) do |batch|
      # Stop as soon as the app is back under the threshold
      break if app.reload.problems.count < threshold

      Err.where(:problem_id.in => batch.map(&:id)).delete_all
      batch.each(&:delete)
    end
  end
end
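
If you want to check which records such a task would touch before running it against a real DB, the batching can be sketched in plain Ruby. This is only an illustration: trim_to_threshold is a hypothetical helper and plain integers stand in for the Problem documents.

```ruby
# Splits a list of ids into the part that is kept (the first `threshold`
# entries) and the batches that would be deleted (everything beyond it)
def trim_to_threshold(ids, threshold:, batch_size:)
  return [ids, []] if ids.length <= threshold

  kept = ids.first(threshold)
  # each_slice also yields the last batch when it is smaller than batch_size
  batches = ids.drop(threshold).each_slice(batch_size).to_a
  [kept, batches]
end

kept, batches = trim_to_threshold((1..2250).to_a, threshold: 2000, batch_size: 100)
puts "kept: #{kept.length}, batches to delete: #{batches.map(&:length).inspect}"
```

With 2250 ids, a threshold of 2000 and a batch size of 100, two full batches and one batch of 50 are scheduled for removal.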

Of course you need to add it to crontab as well:

25,55 * * * * /bin/bash -l -c 'cd /errbit_location && RAILS_ENV=production nice -n 19  bundle exec rake errbit:optimize'

Rack/Rails middleware that will add rel="nofollow" to all your links

A few years ago I wrote a post about adding rel="nofollow" to all the links in your comments, news, posts and messages in Ruby on Rails. I've been using this solution for a long time, but recently it started to be a pain in the ass. More and more models, more and more content: having to always declare some sort of filtering logic in the models doesn't seem legit any more. Instead, I've decided to use a different approach. Why not use a Rack middleware that would add the nofollow rel to all "outgoing" links? That way models would not be "polluted" with stuff that is directly related only to views.

Nokogiri is the answer

To replace all the rel attributes, we can use Nokogiri. It is both convenient and fast:

require 'nokogiri'

doc = Nokogiri::HTML.parse(content)

doc.css('a').each do |a|
  a.set_attribute('rel', 'noindex nofollow')
end

doc.to_s

Small corner cases that we need to cover

Unfortunately there are some cases that we need to cover, so simply replacing all the links is not an option. We should not add nofollow when:

  • There's already a rel defined on an anchor
  • There are local links that should be "followable"
  • There are local links with a full domain in them
  • We want to narrow anchor seeking to a given CSS selector (we want to leave links that are in the layout, etc.)

If we include all of the above, our code should look like this:

require 'nokogiri'

doc = Nokogiri::HTML.parse(content)
scope = '#main-content'
host = 'mensfeld.pl'

doc.css(scope + ' a').each do |a|
  # If there's a rel already don't change it
  # (blank? comes from ActiveSupport, so it is available in a Rails app)
  next unless a.get_attribute('rel').blank?
  # If this is a local link don't change it
  # (the group makes \A apply to both www and http)
  next unless a.get_attribute('href') =~ /\A(www|http)/i
  # Don't change it also if it is a local link with host
  next if a.get_attribute('href') =~ /#{Regexp.escape(host)}/

  a.set_attribute('rel', 'noindex nofollow')
end
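
The rules from the list above can also be written as a small predicate that is easy to test on its own. The sketch below is an alternative take, not the code used in this post: nofollow_needed? is a hypothetical helper, and it compares hosts with URI from the standard library instead of regular expressions. One difference to be aware of: an href like www.example.com/page (without a scheme) has no host for URI, so this sketch treats it as a local link.

```ruby
require 'uri'

# Hypothetical helper: decides whether an anchor should get rel="nofollow"
def nofollow_needed?(href, rel, host)
  # An existing rel is left untouched
  return false unless rel.nil? || rel.strip.empty?
  # Anchors without an href have nothing to follow
  return false if href.nil? || href.strip.empty?

  uri = URI.parse(href.strip)
  # Relative links (no host part) are local
  return false if uri.host.nil?

  link_host = uri.host.downcase
  own_host = host.downcase
  # Links to our own host or to one of its subdomains are local as well
  return false if link_host == own_host || link_host.end_with?(".#{own_host}")

  true
rescue URI::InvalidURIError
  # Hrefs that can't be parsed are left untouched
  false
end
```

Comparing parsed hosts avoids accidental matches such as a foreign URL that only contains our domain somewhere in its path.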

Hooking it up to Rack middleware

There's a great Rails on Rack tutorial, so I will skip some details.

Our middleware needs to accept the following options:

  • whitelisted host
  • CSS scope (if we decide to narrow anchor seeking)

So, the initialize method for our middleware should look like this:

# @param app [SenpuuV7::Application]
# @param host [String] host that should be allowed - we should allow our internal
#   links to be without nofollow
# @param scope [String] we can narrow to a given part of HTML (id, class, etc)
def initialize(app, host, scope = 'body')
  @app = app
  @host = host
  @scope = scope
end

Each middleware needs to have a call method:

# @param [Hash] env hash
# @return [Array] full rack response
def call(env)
  response = @app.call(env)
  proxy = response[2]

  # Ignore any non text/html responses
  if proxy.is_a?(Rack::BodyProxy) &&
    proxy.content_type == 'text/html'
    proxy.body = sanitize(proxy.body)
  end

  response
end
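
The call method above depends on the response body being a Rack::BodyProxy that exposes content_type and body, which may differ between Rails and Rack versions. As a hedged alternative, here is a sketch that relies only on the plain Rack response triplet (status, headers, body). BodyRewriter and its block argument are hypothetical names used for this illustration. In the real middleware the block would be replaced by the sanitize method.

```ruby
# Hypothetical middleware variant that rewrites text/html bodies using only
# the Rack response triplet, so it runs without any gems
class BodyRewriter
  # @param app [#call] next Rack application in the stack
  # @param transform [Proc] block that receives and returns the HTML string
  def initialize(app, &transform)
    @app = app
    @transform = transform
  end

  def call(env)
    status, headers, body = @app.call(env)
    return [status, headers, body] unless html?(headers)

    # A Rack body only has to respond to each, so collect it chunk by chunk
    html = +''
    body.each { |chunk| html << chunk }
    body.close if body.respond_to?(:close)

    rewritten = @transform.call(html)
    [status, with_length(headers, rewritten.bytesize), [rewritten]]
  end

  private

  # Header names may be capitalized or lowercase depending on the Rack version
  def html?(headers)
    _, value = headers.find { |key, _| key.to_s.casecmp?('content-type') }
    value.to_s.start_with?('text/html')
  end

  # Keeps Content-Length (if present) in sync with the rewritten body
  def with_length(headers, length)
    headers.to_h do |key, value|
      key.to_s.casecmp?('content-length') ? [key, length.to_s] : [key, value]
    end
  end
end
```

Buffering the whole body is fine for regular pages, but it is not a good fit for streamed responses.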

And finally, the sanitize method that encapsulates the Nokogiri logic:

# @param [String] content of a response (body)
# @return [String] sanitized content of response (body)
def sanitize(content)
  doc = Nokogiri::HTML.parse(content)
  # Stop if we couldn't parse with HTML
  return content unless doc

  doc.css(@scope + ' a').each do |a|
    # If there's a rel already don't change it
    next unless a.get_attribute('rel').blank?
    # If this is a local link don't change it
    # (the group makes \A apply to both www and http)
    next unless a.get_attribute('href') =~ /\A(www|http)/i
    # Don't change it also if it is a local link with host
    next if a.get_attribute('href') =~ /#{Regexp.escape(@host)}/

    a.set_attribute('rel', 'noindex nofollow')
  end

  doc.to_s
# If anything goes wrong, return original content
rescue
  return content
end

Usage example

To use it, just create an initializer in config/initializers of your app with the following code:

require 'nofollow_anchors'

MyApp::Application.config.middleware.use NofollowAnchors, 'mensfeld.pl', 'body #main-content'

Also, don't forget to add gem 'nokogiri' to your Gemfile.

Performance

Nokogiri is quite fast and, based on a benchmark that I did, it takes about 5-30 milliseconds to parse the whole content. Below you can see the time and number of links (up to 488) per page. Keep that in mind when you use this middleware.

[Chart: perf (processing time and number of links per page)]

TL;DR - Whole middleware

require 'nokogiri'

# Middleware used to ensure that we don't allow any links outside without a
# nofollow rel
# @example
#   App.middleware.use NofollowAnchors, 'example.com', 'body'
class NofollowAnchors
  # @param app [SenpuuV7::Application]
  # @param host [String] host that should be allowed - we should allow our internal
  #   links to be without nofollow
  # @param scope [String] we can narrow to a given part of HTML (id, class, etc)
  def initialize(app, host, scope = 'body')
    @app = app
    @host = host
    @scope = scope
  end

  # @param [Hash] env hash
  # @return [Array] full rack response
  def call(env)
    response = @app.call(env)
    proxy = response[2]

    if proxy.is_a?(Rack::BodyProxy) &&
      proxy.content_type == 'text/html'
      proxy.body = sanitize(proxy.body)
    end

    response
  end

  private

  # @param [String] content of a response (body)
  # @return [String] sanitized content of response (body)
  def sanitize(content)
    doc = Nokogiri::HTML.parse(content)
    # Stop if we couldn't parse with HTML
    return content unless doc

    doc.css(@scope + ' a').each do |a|
      # If there's a rel already don't change it
      next unless a.get_attribute('rel').blank?
      # If this is a local link don't change it
      # (the group makes \A apply to both www and http)
      next unless a.get_attribute('href') =~ /\A(www|http)/i
      # Don't change it also if it is a local link with host
      next if a.get_attribute('href') =~ /#{Regexp.escape(@host)}/

      a.set_attribute('rel', 'noindex nofollow')
    end

    doc.to_s
  rescue
    return content
  end
end

Copyright © 2025 Closer to Code