I have an application in which I store a lot of data in text files.Recently I've needed to compress this data into datasets and send it to a browser. I've also decided to remove uncompressed data and leave only zipped files. The mayor advantage is HDD consumption - 90% less space needed to store data! However I've encountered a problem. How to retrieve a single file from a zipped collection without unzipping whole collection? Well as always - with Ruby it's quite easy :)

I've created a small wrapper to a Zip Ruby library. It will contain 3 methods:

  1. self.zip - used to compress directory
  2. self.unzip - used to decompress directory
  3. self.open_one - used to retrieve single file content from a compressed directory

First of all, compression...

Zipping directory

require 'rubygems'
require 'zip/zip'
require 'find'
require 'fileutils'

class Zipper

  def self.zip(dir, zip_dir, remove_after = false)
    Zip::ZipFile.open(zip_dir, Zip::ZipFile::CREATE)do |zipfile|
      Find.find(dir) do |path|
        Find.prune if File.basename(path)[0] == ?.
        dest = /#{dir}\/(\w.*)/.match(path)
        # Skip files if they exists
          zipfile.add(dest[1],path) if dest
        rescue Zip::ZipEntryExistsError
    FileUtils.rm_rf(dir) if remove_after


We catch Zip::ZipEntryExistsError exception - so we won't overwrite files in an archive if the file already exist. After all (no exceptions raised) we can remove the source directory:

Zipper.zip('/home/user/directory', '/home/user/compressed.zip')

Unzipping directory

class Zipper

  def self.unzip(zip, unzip_dir, remove_after = false)
    Zip::ZipFile.open(zip) do |zip_file|
      zip_file.each do |f|
        f_path=File.join(unzip_dir, f.name)
        zip_file.extract(f, f_path) unless File.exist?(f_path)
    FileUtils.rm(zip) if remove_after


Usage is similar to the zip method. We provide zip file, directory to unzip and we decide whether or not to remove source file after unzipping its content.

Zipper.unzip('/home/user/compressed.zip','/home/user/directory', true)

Retrieving single file content

class Zipper

  def self.open_one(zip_source, file_name)
    Zip::ZipFile.open(zip_source) do |zip_file|
      zip_file.each do |f|
        next unless "#{f}" == file_name
        return f.get_input_stream.read



Zipper.open_one('/home/user/source.zip', 'subdir_in_zip/file.ext')

If file doesn't exist nil will be returned. This method does not save this file - it will return decompressed content (but won't save it). I use it to serve this content via web-server. What about performance? Well it depends on zipped file size, amount of compressed files in archive and our "target" file size. Below a simple chart showing relationship between the number of files and the speed of accessing a single one. The results are satisfactory for my purposes. The single uncompressed file in a dataset has about 15.9KB.

As you can see above access times are quite bearable when you think about 90% savings on your hard drive.

Munin chart with disk usage before and after zipping data (fuck yeah!). Look at /home: