Tag: gzip

Using Ruby and Zip library to compress directories and read single file from compressed collection

I have an application in which I store a lot of data in text files. Recently I needed to compress this data into datasets and send it to a browser. I also decided to remove the uncompressed data and keep only the zipped files. The major advantage is HDD consumption - 90% less space needed to store data! However, I ran into a problem: how do you retrieve a single file from a zipped collection without unzipping the whole collection? Well, as always - with Ruby it's quite easy :)

I've created a small wrapper around the Ruby Zip library. It contains 3 methods:

  1. self.zip - compresses a directory
  2. self.unzip - decompresses a directory
  3. self.open_one - retrieves the content of a single file from a compressed directory

First of all, compression...

Zipping directory

require 'rubygems'
require 'zip/zip' # rubyzip gem, pre-1.0 API (Zip::ZipFile)
require 'find'
require 'fileutils'

class Zipper

  def self.zip(dir, zip_dir, remove_after = false)
    Zip::ZipFile.open(zip_dir, Zip::ZipFile::CREATE) do |zipfile|
      Find.find(dir) do |path|
        # Skip hidden files and directories
        Find.prune if File.basename(path)[0] == ?.
        # Entry name inside the archive: the path relative to dir
        dest = /#{dir}\/(\w.*)/.match(path)
        # Skip files if they already exist in the archive
        begin
          zipfile.add(dest[1], path) if dest
        rescue Zip::ZipEntryExistsError
        end
      end
    end
    FileUtils.rm_rf(dir) if remove_after
  end

end


We rescue Zip::ZipEntryExistsError, so a file that already exists in the archive is skipped rather than overwritten. Once everything has been added (no other exception raised), the source directory is removed if remove_after is true. Example usage:

Zipper.zip('/home/user/directory', '/home/user/compressed.zip')
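
The entry names stored in the archive come from the regex in Zipper.zip, which keeps only the part of each path below dir. Here is a minimal sketch of that step in isolation. Note that entry_name is a hypothetical helper (not part of the wrapper), and Regexp.escape is an addition here so that a dir containing regex metacharacters, such as a dot, is matched literally:

```ruby
# Hypothetical helper mirroring the regex used in Zipper.zip.
# Returns the archive entry name for path, or nil when nothing below dir
# matches (for example the root directory itself).
def entry_name(dir, path)
  match = /#{Regexp.escape(dir)}\/(\w.*)/.match(path)
  match && match[1]
end

entry_name('/home/user/directory', '/home/user/directory/sub/file.txt')
# => "sub/file.txt"
entry_name('/home/user/directory', '/home/user/directory')
# => nil
```

These relative names are also what open_one expects as its file_name argument.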

Unzipping directory

class Zipper

  def self.unzip(zip, unzip_dir, remove_after = false)
    Zip::ZipFile.open(zip) do |zip_file|
      zip_file.each do |f|
        f_path = File.join(unzip_dir, f.name)
        # Make sure the target subdirectory exists before extracting
        FileUtils.mkdir_p(File.dirname(f_path))
        zip_file.extract(f, f_path) unless File.exist?(f_path)
      end
    end
    FileUtils.rm(zip) if remove_after
  end

end


Usage is similar to the zip method: we provide the zip file, the directory to unzip into, and decide whether or not to remove the source file after unzipping its content.

Zipper.unzip('/home/user/compressed.zip','/home/user/directory', true)

Retrieving single file content

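class Zipper

  def self.open_one(zip_source, file_name)
    Zip::ZipFile.open(zip_source) do |zip_file|
      zip_file.each do |f|
        next unless "#{f}" == file_name
        # Return the decompressed content without writing it to disk
        return f.get_input_stream.read
      end
    end
    # No matching entry found
    nil
  end

end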



Zipper.open_one('/home/user/source.zip', 'subdir_in_zip/file.ext')

If the file doesn't exist, nil will be returned. This method does not save the file anywhere - it only returns the decompressed content. I use it to serve this content via a web server.

What about performance? Well, it depends on the zipped file size, the number of compressed files in the archive and the size of our "target" file. Below is a simple chart showing the relationship between the number of files and the speed of accessing a single one. The results are satisfactory for my purposes. A single uncompressed file in a dataset is about 15.9KB.

As you can see above, access times are quite bearable when you think about the 90% savings on your hard drive.
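
To collect this kind of measurement for your own archives, a rough sketch using Benchmark from the standard library could look like this. The helper name average_access_time and the stand-in reader are assumptions for illustration. In real use, the callable would wrap a Zipper.open_one call:

```ruby
require 'benchmark'

# Hypothetical helper: average wall-clock seconds per call of reader,
# where reader is any callable, e.g.
#   -> { Zipper.open_one('/home/user/source.zip', 'subdir_in_zip/file.ext') }
def average_access_time(reader, runs = 10)
  total = Benchmark.realtime { runs.times { reader.call } }
  total / runs
end

# Stand-in reader so the sketch runs without an archive on disk.
fake_reader = -> { 'x' * 15_900 }
puts format('%.6f s per read', average_access_time(fake_reader))
```

Averaging over several runs smooths out disk cache effects a little, but the first read of a cold archive will usually be slower than the rest.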

Munin chart with disk usage before and after zipping data (fuck yeah!). Look at /home:

Rails + Passenger vs htaccess, cache and gzip (mod_deflate)

Today will be short and to the point. What do we want to achieve? We want to:

  • Serve a static cache from the /cache subdirectory of /public (i.e. /public/cache)
  • Compress CSS and JS files with gzip


We create a .htaccess file in the /public directory of our project and paste this into it:

RewriteEngine On

RewriteCond %{REQUEST_URI} ^([^.]+)/$
RewriteRule ^[^.]+/$ /%1 [QSA,L]

RewriteCond %{THE_REQUEST} ^(GET|HEAD)
RewriteCond %{REQUEST_URI} ^([^.]+)$
RewriteCond %{DOCUMENT_ROOT}/cache/%1.html -f
RewriteRule ^[^.]+$ /cache/%1.html [QSA,L]

RewriteCond %{THE_REQUEST} ^(GET|HEAD)
RewriteCond %{DOCUMENT_ROOT}/cache/index.html -f
RewriteRule ^$ /cache/index.html [QSA,L]

AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE text/javascript
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
SetEnvIfNoCase Request_URI \.(?:exe|t?gz|zip|bz2|sit|rar)$ no-gzip dont-vary
SetEnvIfNoCase Request_URI \.(?:pdf|doc)$ no-gzip dont-vary
SetEnvIfNoCase Request_URI \.(?:avi|mov|mp3|mp4|rm)$ no-gzip dont-vary
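
The rewrite rules above boil down to a file-existence check before the request ever reaches Rails. Here is a simplified Ruby sketch of that decision. It assumes request_uri is the path without a query string; cached_file_for is a hypothetical helper written only to illustrate the logic, not something Apache or Rails provides:

```ruby
# Hypothetical helper mirroring the rewrite rules: returns the cached file
# to serve for a GET/HEAD request, or nil when the request should reach Rails.
def cached_file_for(document_root, request_method, request_uri)
  return nil unless %w[GET HEAD].include?(request_method)
  return nil if request_uri.include?('.')   # the [^.]+ patterns skip URIs with dots

  uri = request_uri.sub(%r{/\z}, '')        # first rule: drop a trailing slash
  name = uri.empty? ? '/index' : uri        # last rule: "/" maps to index.html
  candidate = File.join(document_root, 'cache', "#{name}.html")
  File.file?(candidate) ? candidate : nil
end

# '/var/www/app/public' is a made-up document root for illustration.
cached_file_for('/var/www/app/public', 'GET', '/posts/')
```

Requests containing a dot (assets such as /style.css) and non-GET/HEAD requests never hit the cache, which matches the conditions in the .htaccess file.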

That's it - from now on our applications have both the cache served from a subdirectory of public/ and gzip compression - so the site runs faster and uses less bandwidth.
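
To get a feel for how much text assets shrink, here is a small sketch using Ruby's standard zlib bindings. The stylesheet is made up and highly repetitive, so real files will compress less. Zlib::Deflate produces a raw zlib stream, whereas gzip output as sent by mod_deflate adds a small header and trailer on top of the same DEFLATE data:

```ruby
require 'zlib'

# Made-up, repetitive stylesheet standing in for a real CSS file.
css = ".button { color: #336699; padding: 4px 8px; margin: 0; }\n" * 200

compressed = Zlib::Deflate.deflate(css)
saved = 100.0 * (1 - compressed.bytesize.to_f / css.bytesize)

puts "original:   #{css.bytesize} bytes"
puts "compressed: #{compressed.bytesize} bytes"
puts format('saved:      %.1f%%', saved)
```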

Copyright © 2022 Closer to Code
