From time to time, you may want to remove a particular middleware from your Sinatra application (in the snippet below, the Sinatra app is Sidekiq::Web). Normally I would not recommend this, but when you're testing things and playing around, there might be no other way. It is also useful when a middleware contains bugs that make it unusable in your particular case. Removing it is really easy; the only thing you need to know is the class of the middleware you want to remove:
Sidekiq::Web.instance_variable_get(:@middleware).delete_if do |middleware|
  middleware.first == Rack::Protection # Or any other middleware name
end
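If you need this in more than one place, the snippet can be wrapped in a small helper. This is a sketch under assumptions: it relies on Sinatra internals (the class-level @middleware array of [middleware, args, block] entries, and the cached @prototype app that Sinatra's own use method clears), so it may break with other Sinatra versions, and remove_middleware is a hypothetical name, not part of Sinatra or Sidekiq.

```ruby
# Hypothetical helper, not part of Sinatra or Sidekiq.
# Assumes Sinatra internals: +use+ appends [middleware, args, block]
# to the class-level @middleware array and resets the cached @prototype.
def remove_middleware(app_class, middleware_class)
  stack = app_class.instance_variable_get(:@middleware)
  return 0 unless stack

  before = stack.size
  # Each entry is [middleware, args, block]; compare only the class
  stack.delete_if { |middleware, _args, _block| middleware == middleware_class }
  # Drop the cached, already built app (as Sinatra's own +use+ does), so the
  # stack is rebuilt without the removed middleware next time it is needed
  app_class.instance_variable_set(:@prototype, nil)
  before - stack.size
end

# Usage (assuming a Sinatra-based Sidekiq::Web, as in the snippet above):
#   remove_middleware(Sidekiq::Web, Rack::Protection)
```

Resetting @prototype only matters if the app has already handled a request before the removal; in an initializer it is usually still nil.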
A few years ago I wrote a post about adding rel="nofollow" to all the links in your comments, news, posts and messages in Ruby on Rails. I've been using that solution for a long time, but recently it started to be a pain in the ass. More and more models, more and more content: having to declare some sort of filtering logic in every model doesn't seem legit anymore. Instead, I've decided to use a different approach. Why not use a Rack middleware that adds the nofollow rel to all "outgoing" links? That way, models are not "polluted" with stuff that is related only to views.
Nokogiri is the answer
To set the rel attribute on all the links, we can use Nokogiri. It is both convenient and fast:
require 'nokogiri'
doc = Nokogiri::HTML.parse(content)
doc.css('a').each do |a|
  a.set_attribute('rel', 'noindex nofollow')
end
doc.to_s
Small corner cases that we need to cover
Unfortunately, there are some corner cases we need to cover, so simply adding nofollow to every link is not an option. We should not add nofollow when:
There's already a rel attribute defined on the anchor
The link is local (relative) and should stay "followable"
The link is local but contains our full domain
The link is outside a given CSS selector (we want to leave links in the layout etc. untouched)
Taking all of the above into account, our code should look like this:
require 'nokogiri'
doc = Nokogiri::HTML.parse(content)
scope = '#main-content'
host = 'mensfeld.pl'
doc.css(scope + ' a').each do |a|
  # If there's a rel already, don't change it
  next unless a.get_attribute('rel').to_s.strip.empty?
  # If this is a local (relative) link, don't change it
  next unless a.get_attribute('href') =~ /\A(www|http)/i
  # Also don't change it if it is a local link with our host in it
  next if a.get_attribute('href') =~ /#{Regexp.escape(host)}/i
  a.set_attribute('rel', 'noindex nofollow')
end
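The href and rel checks can also be pulled out into a plain predicate, which is easier to unit test than HTML fixtures. A minimal sketch, assuming you are fine with a stricter host check than the one above: instead of matching the host anywhere in the URL (so http://other.com/?ref=mensfeld.pl would count as local), it parses the URL with Ruby's URI and compares hosts, treating subdomains as local. needs_nofollow? is a hypothetical name; the CSS scope rule stays in the selector.

```ruby
require 'uri'

# Hypothetical helper (not from the original code): returns true when an
# anchor with the given href/rel should get rel="noindex nofollow".
def needs_nofollow?(href, rel, host)
  # An existing rel attribute is left untouched
  return false unless rel.to_s.strip.empty?

  href = href.to_s.strip
  # Relative links ("/about", "#top", "page.html") are local
  return false unless href.match?(%r{\A(?:www\.|https?://)}i)

  # "www.example.com/page" has no scheme, so add one before parsing
  href = "http://#{href}" if href.match?(/\Awww\./i)
  link_host = URI.parse(href).host.to_s.downcase
  own_host = host.downcase

  # Absolute links to our own host (or one of its subdomains) stay followable
  !(link_host == own_host || link_host.end_with?(".#{own_host}"))
rescue URI::InvalidURIError
  # Leave links we cannot parse untouched
  false
end
```

Inside the loop, the three next lines would then collapse into: next unless needs_nofollow?(a['href'], a['rel'], host)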
Hooking it up to Rack middleware
There's a great Rails on Rack tutorial, so I will skip some of the details.
Our middleware needs to accept the following options:
a whitelisted host
a CSS scope (if we decide to narrow anchor seeking)
So, the initialize method for our middleware should look like this:
# @param app [SenpuuV7::Application]
# @param host [String] host that should be allowed - we should allow our internal
#   links to be without nofollow
# @param scope [String] we can narrow to a given part of HTML (id, class, etc)
def initialize(app, host, scope = 'body')
  @app = app
  @host = host
  @scope = scope
end
Each middleware needs to have a call method:
# @param [Hash] env hash
# @return [Array] full rack response
def call(env)
  response = @app.call(env)
  proxy = response[2]
  # Ignore any non text/html responses
  if proxy.is_a?(Rack::BodyProxy) &&
     proxy.content_type.to_s.start_with?('text/html')
    proxy.body = sanitize(proxy.body)
  end
  response
end
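The call method above depends on the response body being a Rack::BodyProxy that forwards content_type and body= to the Rails response object, which is an implementation detail that may differ between Rails versions. A sketch of a variant that relies only on the Rack interface ([status, headers, body], where body responds to each and optionally close) could look like this. HtmlBodyRewriter and its transform argument are hypothetical; in the real middleware the transform would be the sanitize method.

```ruby
# Hypothetical, framework-agnostic variant of the middleware's call method.
# It only relies on the Rack interface: the app returns
# [status, headers, body] and body responds to #each (and maybe #close).
class HtmlBodyRewriter
  # @param app [#call] the next Rack app in the stack
  # @param transform [#call] takes the HTML String, returns the new String
  def initialize(app, transform)
    @app = app
    @transform = transform
  end

  def call(env)
    status, headers, body = @app.call(env)
    # Rack 2 apps usually use "Content-Type"; Rack 3 requires lowercase keys
    type = headers['Content-Type'] || headers['content-type']
    # Leave non-HTML responses (JSON, assets, files) untouched
    return [status, headers, body] unless type.to_s.start_with?('text/html')

    # Buffer the whole body; Rack bodies may arrive in several chunks
    html = +''
    body.each { |chunk| html << chunk }
    body.close if body.respond_to?(:close)

    rewritten = @transform.call(html)
    # The length changed, so refresh any Content-Length that was already set
    %w[Content-Length content-length].each do |key|
      headers[key] = rewritten.bytesize.to_s if headers.key?(key)
    end
    [status, headers, [rewritten]]
  end
end
```

Buffering the whole body is required because Nokogiri parses a complete document, but it also means streamed HTML responses are no longer streamed once they pass through this middleware.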
And finally, the sanitize method that encapsulates the Nokogiri logic:
# @param [String] content of a response (body)
# @return [String] sanitized content of response (body)
def sanitize(content)
  doc = Nokogiri::HTML.parse(content)
  # Stop if we couldn't parse it as HTML
  return content unless doc
  doc.css(@scope + ' a').each do |a|
    # If there's a rel already, don't change it
    next unless a.get_attribute('rel').to_s.strip.empty?
    # If this is a local (relative) link, don't change it
    next unless a.get_attribute('href') =~ /\A(www|http)/i
    # Also don't change it if it is a local link with our host in it
    next if a.get_attribute('href') =~ /#{Regexp.escape(@host)}/i
    a.set_attribute('rel', 'noindex nofollow')
  end
  doc.to_s
# If anything goes wrong, return the original content
rescue StandardError
  content
end
Usage example
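To use it, just create an initializer in config/initializers of your app that adds the middleware to the stack, for example App.middleware.use NofollowAnchors, 'example.com', 'body' (as in the @example comment of the whole middleware below).
Also, don't forget to add gem 'nokogiri' to your Gemfile.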
Performance
Nokogiri is quite fast: based on a benchmark I did, it takes about 5-30 milliseconds to parse the whole content. Below you can see the time and the number of links (up to 488) per page. Keep in mind that this cost is added to every HTML response the middleware processes.
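To check the overhead on your own pages, you can time the sanitize step per page. A minimal sketch using only Ruby's monotonic clock; average_ms and sample_page are hypothetical helpers, and the generated page is synthetic, so real pages (with more markup per link) will give different numbers.

```ruby
# Hypothetical helper: average wall-clock time of the given block, in ms.
# Uses the monotonic clock so system clock changes don't skew the result.
def average_ms(runs = 20)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  runs.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  elapsed * 1000.0 / runs
end

# Hypothetical helper: a synthetic page with +links+ external anchors
# inside #main-content, roughly mimicking a link-heavy article.
def sample_page(links)
  anchors = Array.new(links) { |i| %(<p><a href="http://example#{i}.com/">link #{i}</a></p>) }
  "<html><body><div id=\"main-content\">#{anchors.join}</div></body></html>"
end

# Usage with the middleware from this post (needs the nokogiri gem):
#   middleware = NofollowAnchors.new(->(_env) { [200, {}, []] }, 'mensfeld.pl')
#   page = sample_page(488)
#   puts average_ms { middleware.send(:sanitize, page) }
```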
TL;DR - Whole middleware
require 'nokogiri'
# Middleware used to ensure that we don't allow any links outside without a
# nofollow rel
# @example
#   App.middleware.use NofollowAnchors, 'example.com', 'body'
class NofollowAnchors
  # @param app [SenpuuV7::Application]
  # @param host [String] host that should be allowed - we should allow our internal
  #   links to be without nofollow
  # @param scope [String] we can narrow to a given part of HTML (id, class, etc)
  def initialize(app, host, scope = 'body')
    @app = app
    @host = host
    @scope = scope
  end

  # @param [Hash] env hash
  # @return [Array] full rack response
  def call(env)
    response = @app.call(env)
    proxy = response[2]
    # Ignore any non text/html responses
    if proxy.is_a?(Rack::BodyProxy) &&
       proxy.content_type.to_s.start_with?('text/html')
      proxy.body = sanitize(proxy.body)
    end
    response
  end

  private

  # @param [String] content of a response (body)
  # @return [String] sanitized content of response (body)
  def sanitize(content)
    doc = Nokogiri::HTML.parse(content)
    # Stop if we couldn't parse it as HTML
    return content unless doc
    doc.css(@scope + ' a').each do |a|
      # If there's a rel already, don't change it
      next unless a.get_attribute('rel').to_s.strip.empty?
      # If this is a local (relative) link, don't change it
      next unless a.get_attribute('href') =~ /\A(www|http)/i
      # Also don't change it if it is a local link with our host in it
      next if a.get_attribute('href') =~ /#{Regexp.escape(@host)}/i
      a.set_attribute('rel', 'noindex nofollow')
    end
    doc.to_s
  # If anything goes wrong, return the original content
  rescue StandardError
    content
  end
end