Adding rel="nofollow" to all the links in your comments, news, posts, messages in Ruby on Rails

SEO, SEO, SEO

After migrating around 600 news messages from my old PHP CMS to Susanoo (on the Naruto Shippuuden Senpuu website), I realized that a lot of them contain external links to other websites. Most of them should have rel="nofollow", but hey, who cared about SEO 8 years ago :)

Fortunately, there is a quite simple and convenient way to fix this. Once it is implemented, it will also prevent the same problem in the future.

Nokogiri, HTML and CSS selectors

What do we need to do? Well, we must:

  • parse our news content (the HTML part generated by CKEditor),
  • fetch all the link tags,
  • add a rel="nofollow" attribute to all the external links.

To start, let's add a gem called Nokogiri to our Gemfile:

gem 'nokogiri'

Basic Nokogiri usage is quite simple:

doc = Nokogiri::HTML.parse(html_stuff)

Once we have a parsed Nokogiri document, we can use a CSS selector to get all the links:

doc.css('any selector').each do |link|
  # do something with those links
end

Each link is a Nokogiri::XML::Element, so we can easily add a rel attribute:

# Nokogiri::XML::Element link instance
link[:rel] = 'nofollow'

So we could iterate through all the links, add a rel attribute, call to_html and save the output as our content. Unfortunately, we cannot do that, because Nokogiri wraps the output in extra markup, such as a doctype and html/body tags, which would break the layout when displayed on our website.

We could try to gsub links like this:

# Nokogiri::XML::Element link instance
# serialize it to an HTML string
old = link.to_s
# add nofollow and serialize it again
link[:rel] = 'nofollow'
new = link.to_s

# try to replace the whole old tag with the new one
content.gsub!(old, new)

It might work, but it requires well-formed, valid XHTML. It won't work with something like this:

<A href="LINK">msg</A>

Some posts on Naruto Shippuuden Senpuu.net are 8 years old, and I also had to handle tags like the one above. So, what can we do?

There is a third, slightly less elegant approach (yet it works pretty well). We can replace every href="link" part with href="link" rel="nofollow". This approach seems to work for both valid and invalid links.
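
As a rough illustration of the idea, here it is in plain Ruby with no Nokogiri involved. The helper name append_nofollow and the example URL are assumptions made for this sketch.

```ruby
# Hypothetical helper: appends rel="nofollow" right after every href
# attribute that points at the given URL, for both quote styles.
def append_nofollow(content, url)
  ["href='#{url}'", %(href="#{url}")].inject(content) do |result, attr|
    # Block form of gsub, so the replacement is inserted literally
    result.gsub(attr) { %(#{attr} rel="nofollow") }
  end
end

puts append_nofollow('<A href="http://example.com">msg</A>', 'http://example.com')
# prints: <A href="http://example.com" rel="nofollow">msg</A>
```

Because only the href="..." part is touched, the casing and formatting of the rest of the tag do not matter.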

Before save to the rescue

We can use a before_save callback to handle "nofollowing" all the links in our content. To do so, just place:

before_save :add_nofollow

in your model class and implement the add_nofollow method.

The add_nofollow method is pretty straightforward. We cover both quote styles (' and "), skip links that already have a rel attribute or that are local (within our website), and we are done.

def add_nofollow
  doc = Nokogiri::HTML.parse(content.to_s)

  # Collect the hrefs of external links that have no rel attribute yet
  hrefs = doc.css('a').map do |link|
    href = link['href'].to_s # a link without href becomes '' and is skipped
    next unless link['rel'].blank?
    next unless href.start_with?('http', 'www')
    next if href.downcase.include?('senpuu.net')
    href
  end.compact.uniq

  hrefs.each do |href|
    # Cover both quote styles: href='...' and href="..."
    ["href='#{href}'", %(href="#{href}")].each do |attr|
      self.content = content.gsub(attr) { %(#{attr} rel="nofollow") }
    end
  end
end
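
If the skip rules grow, they could be pulled out of add_nofollow into a small predicate. The sketch below is one possible shape. The name external_link? and the own_domain parameter are assumptions made for this example (the method above hardcodes 'senpuu.net').

```ruby
# Hypothetical predicate mirroring the skip rules used in add_nofollow.
# own_domain is an assumed parameter, not something from the original code.
def external_link?(href, own_domain)
  href = href.to_s # tolerate links without an href attribute
  return false unless href.start_with?('http', 'www')
  !href.downcase.include?(own_domain.downcase)
end

puts external_link?('http://example.com/page', 'senpuu.net')
puts external_link?('http://www.senpuu.net/news', 'senpuu.net')
puts external_link?('/local/path', 'senpuu.net')
```

Keeping the rule in one place makes it easier to whitelist more domains later without touching the replacement logic.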

After implementing this in your model, you no longer need to worry about external links inserted into its content.

Problems installing Nokogiri?

If you run into an error like this one during the Nokogiri installation (when running bundle install):

Building native extensions.  This could take a while...
ERROR:  Error installing nokogiri:
ERROR: Failed to build gem native extension.
/opt/ruby-enterprise-1.8.7-2010.02/bin/ruby extconf.rb
checking for iconv.h... yes
checking for libxml/parser.h... yes
checking for libxslt/xslt.h... no
-----
libxslt is missing.
-----
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of
necessary libraries and/or headers.  Check the mkmf.log file for more
details.  You may need configuration options.
Provided configuration options:
--with-opt-dir
--without-opt-dir
--with-opt-include
--without-opt-include=${opt-dir}/include
--with-opt-lib
--without-opt-lib=${opt-dir}/lib
--with-make-prog
--without-make-prog
--srcdir=.
--curdir
--ruby=/opt/ruby-enterprise-1.8.7-2010.02/bin/ruby
--with-zlib-dir
--without-zlib-dir
--with-zlib-include

Type into your console:

sudo apt-get install libxslt-dev libxml2-dev

and run again:

bundle install

Categories: Ruby, Software

5 Comments

  1. Way cool! Some very valid points! I appreciate you writing this post
    and also the rest of the website is also very good.

  2. Michail Kabanov

    October 19, 2012 — 07:39

    def add_nofollow
      doc = Nokogiri::HTML.parse(self.content)
      doc.css('a').each do |a|
        a.set_attribute('rel', 'noindex nofollow')
      end
      self.content = doc.to_s
    end

  3. Good one! – thx! There is only one minus – you can’t skip adding nofollow for given urls (this will change all the links)

  4. Nicklas Ramhöj

    February 11, 2013 — 13:46

    Thanks for the post!

    Instead of using gsub to remove the doctype etc you could use Nokogiri::HTML::DocumentFragment#parse (http://nokogiri.org/Nokogiri/XML/DocumentFragment.html) to parse the html. Calling #to_html or #to_s on this instance doesn’t add the doctype.

  5. I am sorry if my question is dumb. What do you mean by 'content' or html_stuff? Let's say I have a controller called Rumors. Should I replace this html_stuff with rumors or what? Also, I have views called rumors (generated by the controller generator).

Copyright © 2024 Closer to Code
