Using Ruby and Zip library to compress directories and read single file from compressed collection

I have an application in which I store a lot of data in text files. Recently I needed to compress this data into datasets and send it to a browser. I also decided to remove the uncompressed data and keep only the zipped files. The major advantage is HDD consumption: 90% less space needed to store the data! However, I encountered a problem: how do you retrieve a single file from a zipped collection without unzipping the whole collection? Well, as always, with Ruby it's quite easy :)

I've created a small wrapper around the Ruby Zip library. It contains 3 methods:

  1. self.zip - used to compress directory
  2. self.unzip - used to decompress directory
  3. self.open_one - used to retrieve single file content from a compressed directory

First of all, compression...

Zipping directory

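require 'rubygems'
# rubyzip < 1.0 API; rubyzip >= 1.0 uses require 'zip' and Zip::File instead
require 'zip/zip'
require 'find'
require 'fileutils'

class Zipper

  # Compresses dir into the archive zip_dir; optionally removes dir afterwards
  def self.zip(dir, zip_dir, remove_after = false)
    Zip::ZipFile.open(zip_dir, Zip::ZipFile::CREATE) do |zipfile|
      Find.find(dir) do |path|
        # Skip hidden files and directories
        Find.prune if File.basename(path)[0] == ?.
        dest = /#{dir}\/(\w.*)/.match(path)
        # Skip files if they already exist in the archive
        begin
          zipfile.add(dest[1], path) if dest
        rescue Zip::ZipEntryExistsError
        end
      end
    end
    FileUtils.rm_rf(dir) if remove_after
  end

end
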
We catch the Zip::ZipEntryExistsError exception so that we don't overwrite files that already exist in the archive. Once everything has been added (no other exceptions raised), we can remove the source directory. Usage:

Zipper.zip('/home/user/directory', '/home/user/')
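The entry names stored in the archive come from the regular expression inside the zip method: everything after the source directory becomes the path within the zip. Here is a minimal sketch of just that step, using only the standard library so it can be tried without touching any archive. The helper name archive_entries is hypothetical and not part of the Zipper class, and the Regexp.escape call is an addition of mine, assuming you want directory names containing characters like "." or "+" to be matched literally.

```ruby
require 'find'

# Hypothetical helper (not part of the original Zipper class): returns the
# [entry_name, source_path] pairs that the directory walk would try to add.
def archive_entries(dir)
  # Regexp.escape (an assumption, not in the original code) keeps characters
  # such as "." or "+" in dir from being treated as regex syntax.
  pattern = /\A#{Regexp.escape(dir)}\/(\w.*)/
  entries = []
  Find.find(dir) do |path|
    # Skip hidden files and do not descend into hidden directories.
    Find.prune if File.basename(path).start_with?('.')
    dest = pattern.match(path)
    # Both files and subdirectories match; the source directory itself does not.
    entries << [dest[1], path] if dest
  end
  entries.sort
end
```

Calling archive_entries('/home/user/directory') would list names such as 'subdir_in_zip/file.ext', which is the same form of name that open_one expects later on.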

Unzipping directory

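class Zipper

  def self.unzip(zip, unzip_dir, remove_after = false)
    Zip::ZipFile.open(zip) do |zip_file|
      zip_file.each do |f|
        f_path = File.join(unzip_dir, f.name)
        FileUtils.mkdir_p(File.dirname(f_path)) # make sure the target directory exists
        zip_file.extract(f, f_path) unless File.exist?(f_path)
      end
    end
    FileUtils.rm(zip) if remove_after
  end

end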

Usage is similar to the zip method: we pass the zip file, the directory to unzip into, and a flag that decides whether to remove the source file after unzipping its content.

Zipper.unzip('/home/user/','/home/user/directory', true)

Retrieving single file content

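class Zipper

  def self.open_one(zip_source, file_name)
    Zip::ZipFile.open(zip_source) do |zip_file|
      zip_file.each do |f|
        next unless "#{f}" == file_name
        return f.get_input_stream.read # decompressed content, nothing is written to disk
      end
    end
    nil # no entry with that name
  end

end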

Zipper.open_one('/home/user/', 'subdir_in_zip/file.ext')

If the file doesn't exist, nil is returned. This method does not save the file to disk; it only returns the decompressed content. I use it to serve this content via a web server.

What about performance? It depends on the size of the zip file, the number of compressed files in the archive and the size of our "target" file. Below is a simple chart showing the relationship between the number of files and the speed of accessing a single one. The results are satisfactory for my purposes. A single uncompressed file in a dataset is about 15.9KB.

As you can see above, access times are quite bearable when you think about 90% savings on your hard drive.
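If you want to check access times against your own archives, a rough measurement can be sketched with Ruby's Benchmark module. This is only an illustration and not necessarily how the chart above was produced; the helper name average_access_ms is hypothetical.

```ruby
require 'benchmark'

# Hypothetical helper: runs the given block `runs` times and returns the
# average wall-clock time of a single call, in milliseconds.
def average_access_ms(runs = 10)
  total = Benchmark.realtime { runs.times { yield } }
  (total / runs) * 1000.0
end
```

With the Zipper class loaded, something like average_access_ms(20) { Zipper.open_one(zip_path, 'subdir_in_zip/file.ext') } would give an average per-read time, where zip_path stands for the path to your own archive. Repeating it for archives with different numbers of files shows how lookup time grows with archive size.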

Munin chart with disk usage before and after zipping data (fuck yeah!). Look at /home:



  1. The ruby zip landscape seems a bit bleak. I had such trouble trying to do streaming zip generation in a rails app that I made a gem for it, totally separated from any existing zip library.

    A reusable, modular set of zip features would be awesome.

  2. That’s a creative answer to a difficult question

  3. Ever thought about making this a gem?

  4. @Mark Thomas – give me 2 weeks and you’ll have a gem :) – it is a great idea! :)

  5. Thanks a lot. This really helped me. I kind of want to send you money haha. Now I just need to figure out how to unzip the directory on S3.



Copyright © 2024 Closer to Code
