After migrating somewhere like 600 news messages from my old PHP CMS to a Susanoo (at Naruto Shippuuden Senpuu website), I’ve realized that a lot of them have an external links to other websites. Most of them should have rel=”nofollow”,but hey, who cared about SEO 8 years ago :)

Fortunately there is a quite simple and convenient way to fix this. Also after implementing this, we will be able to prevent such things in future.

Nokogiri, HTML and CSS selectors

What we need to do? Well, we must:

  • parse our news content (html part generated by CKEditor),
  • fetch all the link tags,
  • add to all the external links a rel=”nofollow” attribute.

To start doing this, let’s add a gem called Nokogiri to our gemfile:

gem 'nokogiri'

Nokogiri basic usage is quite simple:

noko = Nokogiri::HTML.parse(html_stuff)

After we create an Nokogiri::HTML instance we can use a CSS selector to get all the links:

doc.css('any selector').each do |link|
  # do smthng with those links

Each link instance is a Nokogiri::XML::Element so we can easily add an rel attribute:

# Nokogiri::XML::Element link instance
link[:rel] = 'nofollow'

So we could iterate through all the links, add a rel attribute, run to_html method and save the output as our content. Unfortunately we cannot do so because Nokogiri adds some extra stuff, like doctype and header to our html, so when displaying on our website would break the layout.

We could try to gsub links like this:

# Nokogiri::XML::Element link instance
# convert it into a html element
old = link.to_s
# add nofollow and convert to html
link[:rel] = 'nofollow'
new = link.to_s

# try to replace old whole tag with a new one
content.gsub!(old, new)

It might work, but it requires a well formatted and valid xhtml. Won’t work with something like that:

<A href="LINK">msg</A>

Some posts on Naruto Shippuuden are 8 years old and I had to handle also the tags like the one above. So, what can I do?

There is a third, lil bit less elegant approach (yet it works pretty well). We can replace all the href=”link” parts with href=”link” rel=”nofollow”. This approach seams to work for any valid/invalid type of links.

Before filter to the rescue

We can use a before_filter to handle “nofollowing” all the links in our content. To do so, just place the:

before_save :add_nofollow

in your model declaration file and implement the add_nofollow method.

The add_nofollow method is pretty straightforward. We just cover two existing types of bracket: ‘ and “, then we skip links that are local (within our website) and we are done.

def add_nofollow
  doc = Nokogiri::HTML.parse(self.content)
  links = []
  doc.css(selector).each do |link|
    next unless link['rel'].blank?
    next if (link['href'][0,4] != 'http' && link['href'][0,3] != 'www')
    next if (link['href'].downcase.include?(''))
    links << link

  links.uniq.each do |link|
    link['rel'] = 'nofollow'

    href1 = "href='#{link['href']}'"
    href2 = 'href="'+link['href']+'"'

    self.content = self.content.gsub(href1, href1+' rel="nofollow"')
    self.content = self.content.gsub(href2, href2+' rel="nofollow"')

After implementing this in your model logic, you don’t need to worry again about any external links that are inserted into the models.

Problems installing Nokogiri?

If you’ve encountered a problems during the Nokogiri installation (running bundle install):

Building native extensions.  This could take a while…
ERROR:  Error installing nokogiri:
ERROR: Failed to build gem native extension.
/opt/ruby-enterprise-1.8.7-2010.02/bin/ruby extconf.rb
checking for iconv.h… yes
checking for libxml/parser.h… yes
checking for libxslt/xslt.h… no
libxslt is missing.
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of
necessary libraries and/or headers.  Check the mkmf.log file for more
details.  You may need configuration options.
Provided configuration options:

Type into your console:

sudo apt-get install libxslt-dev libxml2-dev

and run again:

bundle install