Tag: Ruby 1.9

Adding rel=”nofollow” to all the links in your comments, news, posts, messages in Ruby on Rails

SEO, SEO, SEO

After migrating somewhere like 600 news messages from my old PHP CMS to a Susanoo (at Naruto Shippuuden Senpuu website), I've realized that a lot of them have an external links to other websites. Most of them should have rel="nofollow",but hey, who cared about SEO 8 years ago :)

Fortunately there is a quite simple and convenient way to fix this. Also after implementing this, we will be able to prevent such things in future.

Nokogiri, HTML and CSS selectors

What we need to do? Well, we must:

  • parse our news content (html part generated by CKEditor),
  • fetch all the link tags,
  • add to all the external links a rel="nofollow" attribute.

To start doing this, let's add a gem called Nokogiri to our gemfile:

gem 'nokogiri'

Nokogiri basic usage is quite simple:

noko = Nokogiri::HTML.parse(html_stuff)

After we create an Nokogiri::HTML instance we can use a CSS selector to get all the links:

doc.css('any selector').each do |link|
  # do smthng with those links
end

Each link instance is a Nokogiri::XML::Element so we can easily add an rel attribute:

# Nokogiri::XML::Element link instance
link[:rel] = 'nofollow'

So we could iterate through all the links, add a rel attribute, run to_html method and save the output as our content. Unfortunately we cannot do so because Nokogiri adds some extra stuff, like doctype and header to our html, so when displaying on our website would break the layout.

We could try to gsub links like this:

# Nokogiri::XML::Element link instance
# convert it into a html element
old = link.to_s
# add nofollow and convert to html
link[:rel] = 'nofollow'
new = link.to_s

# try to replace old whole tag with a new one
content.gsub!(old, new)

It might work, but it requires a well formatted and valid xhtml. Won't work with something like that:

<A href="LINK">msg</A>

Some posts on Naruto Shippuuden Senpuu.net are 8 years old and I had to handle also the tags like the one above. So, what can I do?

There is a third, lil bit less elegant approach (yet it works pretty well). We can replace all the href="link" parts with href="link" rel="nofollow". This approach seams to work for any valid/invalid type of links.

Before filter to the rescue

We can use a before_filter to handle "nofollowing" all the links in our content. To do so, just place the:

before_save :add_nofollow

in your model declaration file and implement the add_nofollow method.

The add_nofollow method is pretty straightforward. We just cover two existing types of bracket: ' and ", then we skip links that are local (within our website) and we are done.

def add_nofollow
  doc = Nokogiri::HTML.parse(self.content)
  links = []
  doc.css(selector).each do |link|
    next unless link['rel'].blank?
    next if (link['href'][0,4] != 'http' && link['href'][0,3] != 'www')
    next if (link['href'].downcase.include?('senpuu.net'))
    links << link
  end

  links.uniq.each do |link|
    link['rel'] = 'nofollow'

    href1 = "href='#{link['href']}'"
    href2 = 'href="'+link['href']+'"'

    self.content = self.content.gsub(href1, href1+' rel="nofollow"')
    self.content = self.content.gsub(href2, href2+' rel="nofollow"')
  end
end

After implementing this in your model logic, you don't need to worry again about any external links that are inserted into the models.

Problems installing Nokogiri?

If you've encountered a problems during the Nokogiri installation (running bundle install):

Building native extensions.  This could take a while…
ERROR:  Error installing nokogiri:
ERROR: Failed to build gem native extension.
/opt/ruby-enterprise-1.8.7-2010.02/bin/ruby extconf.rb
checking for iconv.h… yes
checking for libxml/parser.h… yes
checking for libxslt/xslt.h… no
—–
libxslt is missing.
—–
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of
necessary libraries and/or headers.  Check the mkmf.log file for more
details.  You may need configuration options.
Provided configuration options:
–with-opt-dir
–without-opt-dir
–with-opt-include
–without-opt-include=${opt-dir}/include
–with-opt-lib
–without-opt-lib=${opt-dir}/lib
–with-make-prog
–without-make-prog
–srcdir=.
–curdir
–ruby=/opt/ruby-enterprise-1.8.7-2010.02/bin/ruby
–with-zlib-dir
–without-zlib-dir
–with-zlib-include

Type into your console:

sudo apt-get install libxslt-dev libxml2-dev

and run again:

bundle install

Using Ruby and Zip library to compress directories and read single file from compressed collection

I have an application in which I store a lot of data in text files.Recently I've needed to compress this data into datasets and send it to a browser. I've also decided to remove uncompressed data and leave only zipped files. The mayor advantage is HDD consumption - 90% less space needed to store data! However I've encountered a problem. How to retrieve a single file from a zipped collection without unzipping whole collection? Well as always - with Ruby it's quite easy :)

I've created a small wrapper to a Zip Ruby library. It will contain 3 methods:

  1. self.zip - used to compress directory
  2. self.unzip - used to decompress directory
  3. self.open_one - used to retrieve single file content from a compressed directory

First of all, compression...

Zipping directory

require 'rubygems'
require 'zip/zip'
require 'find'
require 'fileutils'

class Zipper

  def self.zip(dir, zip_dir, remove_after = false)
    Zip::ZipFile.open(zip_dir, Zip::ZipFile::CREATE)do |zipfile|
      Find.find(dir) do |path|
        Find.prune if File.basename(path)[0] == ?.
        dest = /#{dir}\/(\w.*)/.match(path)
        # Skip files if they exists
        begin
          zipfile.add(dest[1],path) if dest
        rescue Zip::ZipEntryExistsError
        end
      end
    end
    FileUtils.rm_rf(dir) if remove_after
  end

end

We catch Zip::ZipEntryExistsError exception - so we won't overwrite files in an archive if the file already exist. After all (no exceptions raised) we can remove the source directory:

Zipper.zip('/home/user/directory', '/home/user/compressed.zip')

Unzipping directory

class Zipper

  def self.unzip(zip, unzip_dir, remove_after = false)
    Zip::ZipFile.open(zip) do |zip_file|
      zip_file.each do |f|
        f_path=File.join(unzip_dir, f.name)
        FileUtils.mkdir_p(File.dirname(f_path))
        zip_file.extract(f, f_path) unless File.exist?(f_path)
      end
    end
    FileUtils.rm(zip) if remove_after
  end

end

Usage is similar to the zip method. We provide zip file, directory to unzip and we decide whether or not to remove source file after unzipping its content.

Zipper.unzip('/home/user/compressed.zip','/home/user/directory', true)

Retrieving single file content

class Zipper

  def self.open_one(zip_source, file_name)
    Zip::ZipFile.open(zip_source) do |zip_file|
      zip_file.each do |f|
        next unless "#{f}" == file_name
        return f.get_input_stream.read
      end
    end
    nil
  end

end

Usage:

Zipper.open_one('/home/user/source.zip', 'subdir_in_zip/file.ext')

If file doesn't exist nil will be returned. This method does not save this file - it will return decompressed content (but won't save it). I use it to serve this content via web-server. What about performance? Well it depends on zipped file size, amount of compressed files in archive and our "target" file size. Below a simple chart showing relationship between the number of files and the speed of accessing a single one. The results are satisfactory for my purposes. The single uncompressed file in a dataset has about 15.9KB.

As you can see above access times are quite bearable when you think about 90% savings on your hard drive.

Munin chart with disk usage before and after zipping data (fuck yeah!). Look at /home:

Copyright © 2025 Closer to Code

Theme by Anders NorenUp ↑