Category: Rails

Rack/Rails middleware that will add rel="nofollow" to all your links

A few years ago I wrote a post about adding rel="nofollow" to all the links in your comments, news, posts and messages in Ruby on Rails. I've been using that solution for a long time, but recently it started to be a pain in the ass. With more and more models and more and more content, having to declare some sort of filtering logic in every model doesn't seem legit any more, so I've decided to take a different approach. Why not use a Rack middleware that adds the nofollow rel to all "outgoing" links? That way the models are not "polluted" with stuff that is related only to views.

Nokogiri is the answer

To set the rel attribute on all the links, we can use Nokogiri. It is both convenient and fast:

require 'nokogiri'

doc = Nokogiri::HTML.parse(content)

doc.css('a').each do |a|
  a.set_attribute('rel', 'noindex nofollow')
end

doc.to_s

Small corner cases that we need to cover

Unfortunately there are some corner cases, so simply replacing all the links is not an option. We should not add nofollow when:

  • There's already a rel defined on an anchor
  • The link is a local (relative) one that should stay "followable"
  • The link is a local one that includes our full domain
  • The anchor is outside of a given CSS selector (we want to leave links in the layout, etc. untouched)
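The per-link rules above (the CSS scope is handled by the selector itself) can also be expressed as a plain-Ruby predicate. This is a minimal sketch, not the code used in this post: the helper name `needs_nofollow?` is hypothetical, it swaps the regex checks for `URI` parsing so the link's host is compared exactly, and it assumes that subdomains of your host (e.g. `blog.mensfeld.pl`) also count as local.

```ruby
require 'uri'

# Hypothetical helper: decides whether an anchor should get rel="nofollow".
# Uses URI parsing instead of regexes, so the host is compared exactly.
# @param href [String, nil] value of the anchor's href attribute
# @param rel [String, nil] value of the anchor's rel attribute
# @param host [String] our own host, e.g. 'mensfeld.pl'
# @return [Boolean] true when rel="nofollow" should be added
def needs_nofollow?(href, rel, host)
  # An explicit rel is left alone
  return false unless rel.to_s.strip.empty?

  # Treat "www.example.com/..." links as absolute http links
  href = href.to_s.strip
  href = "http://#{href}" if href.match?(/\Awww\./i)

  uri = URI.parse(href)
  # Relative links ("/about", "#top", "page.html") and mailto: have no host
  return false if uri.host.nil?
  # Only web links (http, https or protocol-relative "//host") are touched
  return false unless uri.scheme.nil? || %w[http https].include?(uri.scheme.downcase)

  target = uri.host.downcase
  own = host.downcase
  # Assumption: our own host and its subdomains count as local
  !(target == own || target.end_with?(".#{own}"))
rescue URI::InvalidURIError
  # Unparseable hrefs are left untouched
  false
end
```

With Nokogiri, it could be called as `needs_nofollow?(a['href'], a['rel'], host)` inside the `doc.css(...)` loop. Note that `http://example.com/?ref=mensfeld.pl` is treated as external here, because only the parsed host is compared.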

If we handle all of the above, our code should look like this:

require 'nokogiri'

doc = Nokogiri::HTML.parse(content)
scope = '#main-content'
host = 'mensfeld.pl'

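doc.css(scope + ' a').each do |a|
  # If there's a rel already, don't change it
  # (to_s.strip.empty? instead of blank?, so no ActiveSupport is needed)
  next unless a.get_attribute('rel').to_s.strip.empty?
  # If this is a local (relative) link, don't change it;
  # the group makes \A apply to both alternatives
  next unless a.get_attribute('href').to_s =~ /\A(?:www|http)/i
  # Also don't change local links that include our host
  # (escaped, so the "." in the host is not a regex wildcard)
  next if a.get_attribute('href').to_s =~ /#{Regexp.escape(host)}/

  a.set_attribute('rel', 'noindex nofollow')
end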
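The `href` checks rely on two regex details that are easy to get wrong: operator precedence around `\A` and unescaped dots in the host. This small plain-Ruby sketch (no Nokogiri needed; the sample URLs are made up) shows both, plus a limitation that escaping alone does not fix:

```ruby
host = 'mensfeld.pl'

# \A binds only to the first alternative: "starts with www" OR "contains http"
loose   = /\Awww|http/i
# A non-capturing group makes \A apply to both alternatives
grouped = /\A(?:www|http)/i

relative = '/redirect?to=http://example.com'
p(relative =~ loose)   # prints 13: matched the "http" inside the query string
p(relative =~ grouped) # prints nil: correctly treated as a relative link

# Unescaped, the "." in the host matches any character
lookalike = 'http://mensfeld-pl.example.com'
p(lookalike =~ /#{host}/)                # prints 7
p(lookalike =~ /#{Regexp.escape(host)}/) # prints nil

# Even escaped, the host may appear anywhere in the URL, not only as its host
external = 'http://example.com/?ref=mensfeld.pl'
p(external =~ /#{Regexp.escape(host)}/)  # prints 24
```

The last case is why comparing the parsed host (for example with Ruby's `URI`) is more reliable than a substring match.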

Hooking it up to Rack middleware

There's a great Rails on Rack tutorial, so I will skip some details.

Our middleware needs to accept the following options:

  • whitelisted host
  • CSS scope (if we decide to narrow anchor seeking)

So, the initialize method for our middleware should look like this:

# @param app [#call] Rack application (for example your Rails app)
# @param host [String] host that should be allowed - we should allow our internal
#   links to be without nofollow
# @param scope [String] we can narrow to a given part of HTML (id, class, etc)
def initialize(app, host, scope = 'body')
  @app = app
  @host = host
  @scope = scope
end
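To see how the extra arguments given to `middleware.use` reach `initialize`, here is a simplified, hypothetical stand-in for a middleware stack (`TinyStack` and `OptionsEcho` are made-up names). Rack::Builder and the Rails middleware stack work along the same lines: each `use Klass, *args` call is stored, and when the app is built every entry becomes `Klass.new(app, *args)`, with the first registered middleware ending up outermost.

```ruby
# Hypothetical, simplified middleware stack builder
class TinyStack
  def initialize
    @entries = []
  end

  # Remember the middleware class and its extra constructor arguments
  def use(klass, *args)
    @entries << [klass, args]
    self
  end

  # Wrap the endpoint; the first `use`d middleware ends up outermost
  def build(endpoint)
    @entries.reverse.inject(endpoint) do |app, (klass, args)|
      klass.new(app, *args)
    end
  end
end

# A middleware with the same constructor shape as NofollowAnchors
class OptionsEcho
  attr_reader :host, :scope

  def initialize(app, host, scope = 'body')
    @app = app
    @host = host
    @scope = scope
  end

  # Passes the request straight through
  def call(env)
    @app.call(env)
  end
end

endpoint = ->(_env) { [200, { 'Content-Type' => 'text/html' }, ['<p>hi</p>']] }
app = TinyStack.new.use(OptionsEcho, 'mensfeld.pl').build(endpoint)
```

Here `app.host` is `'mensfeld.pl'` and `app.scope` falls back to the default `'body'`, exactly as with `config.middleware.use NofollowAnchors, 'mensfeld.pl'`.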

Each middleware needs to have a call method:

# @param [Hash] env hash
# @return [Array] full rack response
def call(env)
  response = @app.call(env)
  proxy = response[2]

  # Ignore any non text/html responses
  if proxy.is_a?(Rack::BodyProxy) &&
    proxy.content_type == 'text/html'
    proxy.body = sanitize(proxy.body)
  end

  response
end
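The `call` method above relies on the body proxy forwarding `content_type` and `body=` to the underlying Rails response object, which may depend on the Rails version in use; the Rack specification itself only promises that a (non-streaming) body responds to `each`, and optionally `close`. A framework-agnostic alternative, sketched below under that assumption, checks the `Content-Type` header instead, buffers the body and fixes `Content-Length`. The class name `HtmlBodyRewriter` and its `rewriter` argument are hypothetical.

```ruby
# Hypothetical middleware: buffers an HTML response body, passes it
# through a rewriting callable and keeps Content-Length in sync.
class HtmlBodyRewriter
  # @param app [#call] the wrapped Rack application
  # @param rewriter [#call] receives the HTML string, returns the new one
  def initialize(app, rewriter)
    @app = app
    @rewriter = rewriter
  end

  def call(env)
    status, headers, body = @app.call(env)

    # Rack 2 apps usually send "Content-Type", Rack 3 uses lowercase keys
    type_key = header_key(headers, 'content-type')
    unless type_key && headers[type_key].to_s.start_with?('text/html')
      # Leave non-HTML responses (JSON, assets, files...) untouched
      return [status, headers, body]
    end

    # Buffer the whole body, then close the original one
    chunks = []
    body.each { |chunk| chunks << chunk }
    body.close if body.respond_to?(:close)

    html = @rewriter.call(chunks.join)

    # The old Content-Length would no longer match the rewritten body
    length_key = header_key(headers, 'content-length')
    headers[length_key] = html.bytesize.to_s if length_key

    [status, headers, [html]]
  end

  private

  # Finds a header name regardless of its capitalization
  def header_key(headers, name)
    headers.keys.find { |key| key.to_s.casecmp?(name) }
  end
end
```

In a Rails app, the `rewriter` could be a lambda wrapping the Nokogiri loop from `sanitize`, registered with something like `config.middleware.use HtmlBodyRewriter, ->(html) { ... }`.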

and finally, the sanitize method that encapsulates the Nokogiri logic:

# @param [String] content of a response (body)
# @return [String] sanitized content of response (body)
def sanitize(content)
  doc = Nokogiri::HTML.parse(content)
  # Stop if we could't parse with HTML
  return content unless doc

  doc.css(@scope + ' a').each do |a|
    # If there's a rel already don't change it
    next unless a.get_attribute('rel').blank?
    # If this is a local link don't change it
    next unless a.get_attribute('href') =~ /\Awww|http/i
    # Don't change it also if it is a local link with host
    next if a.get_attribute('href') =~ /#{@host}/

    a.set_attribute('rel', 'noindex nofollow')
  end

  doc.to_s
# If anything goes wrong, return original content
rescue
  return content
end

Usage example

To use it, just create an initializer in config/initializers of your app with the following code:

require 'nofollow_anchors'

MyApp::Application.config.middleware.use NofollowAnchors, 'mensfeld.pl', 'body #main-content'

Also, don't forget to add gem 'nokogiri' to your Gemfile.

Performance

Nokogiri is quite fast: based on a benchmark I did, it takes about 5-30 milliseconds to parse the whole content. Below you can see the processing time and the number of links (up to 488) per page. Keep that in mind when you use this middleware.

[Chart: processing time vs. number of links per page]
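Numbers like these depend heavily on page size and hardware, so it may be worth measuring on your own pages. A rough, hypothetical harness in plain Ruby: `synthetic_page` and `average_ms` are made-up helpers, and the block body is only a placeholder for the real Nokogiri work.

```ruby
# Builds a synthetic page with the given number of external links
def synthetic_page(link_count)
  links = Array.new(link_count) { |i| %(<a href="http://example.com/#{i}">link #{i}</a>) }
  "<html><body><div id=\"main-content\">#{links.join("\n")}</div></body></html>"
end

# @param iterations [Integer] how many times to run the block
# @return [Float] average wall-clock time per run in milliseconds
def average_ms(iterations = 20)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  elapsed * 1000.0 / iterations
end

[10, 100, 500].each do |n|
  html = synthetic_page(n)
  # Placeholder work: replace with the Nokogiri loop from sanitize
  ms = average_ms { html.scan(/<a\b/).size }
  puts format('%4d links: %.3f ms', n, ms)
end
```

Using a monotonic clock keeps the measurement unaffected by system clock adjustments.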

TL;DR - Whole middleware

require 'nokogiri'

# Middleware used to ensure that we don't allow any links outside without a
# nofollow rel
# @example
#   App.middleware.use NofollowAnchors, 'example.com', 'body'
class NofollowAnchors
  # @param app [#call] Rack application (for example your Rails app)
  # @param host [String] host that should be allowed - we should allow our internal
  #   links to be without nofollow
  # @param scope [String] we can narrow to a given part of HTML (id, class, etc)
  def initialize(app, host, scope = 'body')
    @app = app
    @host = host
    @scope = scope
  end

  # @param [Hash] env hash
  # @return [Array] full rack response
  def call(env)
    response = @app.call(env)
    proxy = response[2]

    if proxy.is_a?(Rack::BodyProxy) &&
      proxy.content_type == 'text/html'
      proxy.body = sanitize(proxy.body)
    end

    response
  end

  private

  # @param [String] content of a response (body)
  # @return [String] sanitized content of response (body)
  def sanitize(content)
    doc = Nokogiri::HTML.parse(content)
    # Stop if we couldn't parse the HTML
    return content unless doc

    doc.css(@scope + ' a').each do |a|
      # If there's a rel already, don't change it
      next unless a.get_attribute('rel').to_s.strip.empty?
      # If this is a local (relative) link, don't change it
      next unless a.get_attribute('href').to_s =~ /\A(?:www|http)/i
      # Also don't change local links that include our host
      next if a.get_attribute('href').to_s =~ /#{Regexp.escape(@host)}/

      a.set_attribute('rel', 'noindex nofollow')
    end

    doc.to_s
  # If anything goes wrong, return the original content
  rescue StandardError
    content
  end
end

Adding reentrancy and an on-failure fallback to your Sidekiq workers

A few months ago I wrote a post about reentrancy: Ruby (Rails, Sinatra) background processing – Reentrancy for your workers is a must be!.

In this post, I will present a nice way to implement such a feature for your Sidekiq workers.

Simple reentrancy

Normally a Sidekiq worker looks similar to this one:

class ExampleWorker
  include Sidekiq::Worker

  def perform(*args)
    # Background logic here
  end
end

and if something goes wrong, you should see it in your Sidekiq log (or in a bug tracker like Errbit). However, there is no reentrancy there. You could catch exceptions and handle reentrancy like this:

class ExampleWorker
  include Sidekiq::Worker

  def perform(*args)
    # Background logic here
  rescue => exception
    # Do something on failure before reraising
    raise exception
  end
end

but it is not too elegant, and if you have multiple Sidekiq workers, then you will probably end up with a lot of code duplication.

Making your reentrancy code more fancy

Instead of handling reentrancy in every single worker, you could just create a base worker class that provides such functionality to all the workers that inherit from it:

class BaseWorker
  include Sidekiq::Worker

  def perform(*args)
    # you need to implement the execute method
    # execute method should contain code you want to execute in the background
    execute(*args)
  rescue => exception
    after_failure(*args) if respond_to?(:after_failure)
    raise exception
  end
end

Now, instead of implementing a perform method in every worker, you need to name it (or rename it) execute. The perform method acts as a wrapper that tries to execute your worker code and, if it fails, runs the after_failure method (if it exists).
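To see the wrapper behave without a running Sidekiq, the same pattern can be exercised in plain Ruby. This sketch copies the `perform`/`execute`/`after_failure` contract from `BaseWorker` but drops `include Sidekiq::Worker`; all class names are hypothetical.

```ruby
# Same wrapper as BaseWorker, minus Sidekiq
class FallbackBase
  def perform(*args)
    execute(*args)
  rescue => exception
    after_failure(*args) if respond_to?(:after_failure)
    raise exception
  end
end

# Fails and records the arguments its fallback received
class FailingWithFallback < FallbackBase
  attr_reader :fallback_args

  def execute(name)
    raise ArgumentError, "cannot process #{name}"
  end

  def after_failure(name)
    @fallback_args = [name]
  end
end

# Fails without defining any fallback
class FailingWithoutFallback < FallbackBase
  def execute(_name)
    raise ArgumentError, 'boom'
  end
end

worker = FailingWithFallback.new
begin
  worker.perform('ruby')
rescue ArgumentError => e
  puts "re-raised: #{e.message}" # prints "re-raised: cannot process ruby"
end
p worker.fallback_args # prints ["ruby"]
```

The original error still reaches the caller (in Sidekiq, the retry mechanism), while the fallback has already run.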

Note that the error is still re-raised, but now we have a place for a fallback, for example some database status changes:

class KeywordsWorker < BaseWorker
  def execute(keyword_name)
    KeywordsService.new.remotely(keyword_name)
  end

  # Bring to an expire state if something goes wrong
  def after_failure(keyword_name)
    KeywordsService.new.expire(keyword_name)
  end
end

Of course, some workers won't require reentrancy at all. In that case we just skip the after_failure method and, thanks to the respond_to? check, everything works normally.
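Two details of `BaseWorker` may be worth tightening. `respond_to?(:after_failure)` ignores private methods, and if `after_failure` itself raises, that new error propagates instead of the original one (Ruby keeps the original only as the new error's `cause`). A possible variant, sketched in plain Ruby with hypothetical names (in a real app the base class would also `include Sidekiq::Worker`):

```ruby
# Variant of BaseWorker#perform: a failing fallback is only reported,
# and the original exception is always the one that is re-raised.
class SafeFallbackBase
  def perform(*args)
    execute(*args)
  rescue => exception
    run_fallback(exception, *args)
    raise exception
  end

  private

  def run_fallback(exception, *args)
    # The second argument makes respond_to? see private methods too
    return unless respond_to?(:after_failure, true)

    after_failure(*args)
  rescue => fallback_error
    warn "after_failure failed (#{fallback_error.class}: #{fallback_error.message}) " \
         "while handling #{exception.class}"
  end
end

# Example worker whose fallback is itself broken
class BrokenFallbackWorker < SafeFallbackBase
  def execute(_id)
    raise IOError, 'remote service down'
  end

  private

  def after_failure(_id)
    raise 'database unavailable'
  end
end
```

Calling `BrokenFallbackWorker.new.perform(1)` prints a warning about the fallback and then raises the original `IOError`, so the job log shows the real cause.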

Copyright © 2025 Closer to Code
