Tag: Gitlab CI

Speeding up RSpec and Cucumber on the CI server with PostgreSQL fsync flag and parallel execution

After we've added RSpec and Cucumber (with PhantomJS) to our CI build process, it got really, really slow. Due to the application character, after each scenario (for Cucumber) we truncate and restore the whole database. 45 minutes for a single build is definitely not what we aimed to get. So, how to speed up tests execution?

First we thought, that we could run RSpec and Cucumber stuff in parallel (using parallel tests gem). We've got a much better machine on AWS to make sure that a single process has a single core to use. Unfortunately everything got... slower. We've decided to pinpoint a single RSpec spec and a single Cucumber scenario that would be representative and figure out what the hell. What we've discovered at the beginning, is that all the specs were running faster on the Ruby level. It all got significantly slower because of the database. Our tests were heavy in terms of DB communication and as I said before, due to it's character, it will probably stay that way.

So, what were our options?

  • We could get a much better hardware for our testing DBs. Bigger, faster, with SSD, however it would definitely make things more expensive
  • We could compromise data consistency. Since it is a testing cluster - in case of a system failure / crash /shutdown, we can just drop all the databases and repopulate them again

We've decided to try out the second approach and use fsync PostgreSQL flag to tweak this database a little bit.

What is fsync (quote from PostgreSQL documentation)?

If this parameter is on, the PostgreSQL server will try to make sure that updates are physically written to disk, by issuing fsync() system calls or various equivalent methods (see wal_sync_method). This ensures that the database cluster can recover to a consistent state after an operating system or hardware crash.

While turning off fsync is often a performance benefit, this can result in unrecoverable data corruption in the event of a power failure or system crash. Thus it is only advisable to turn off fsync if you can easily recreate your entire database from external data.

Examples of safe circumstances for turning off fsync include the initial loading of a new database cluster from a backup file, using a database cluster for processing a batch of data after which the database will be thrown away and recreated, or for a read-only database clone which gets recreated frequently and is not used for failover. High quality hardware alone is not a sufficient justification for turning off fsync.

Results were astonishing! Since we're no longer as much dependent on our HDDs performance for each operation, the database layer does not slow us down that much.

Zrzut ekranu z 2016-03-31 12:16:09

Zrzut ekranu z 2016-03-31 12:44:35

Zrzut ekranu z 2016-03-31 12:20:48

Overall thanks to this tweak and parallel execution, we've managed to get down from 45 minutes for a whole build, down to 12 minutes. That is 75% faster than before and this build time is acceptable for us.

Research done by: Adam Gwozdowski

Gitlab, Ruby private repository, Docker container and safe way to add SSH key for build process only

Note

This is not a tutorial on how to use Docker with SSH keys and private repositories. This is a solution to a particular safety issue when building Docker containers.

I won't get into details on how to create a Dockerfile with a SSH key, etc - there's enough about that in the Internet.

Problem - container stored SSH key

During the Docker build process, we need a SSH key for bundle install. We need it, because we have a Gitlab with private repositories that we want to pull. Since Docker caches this key, it will go to a production environment together with the whole container. It is unsafe. If someone got somehow into the container, he could use this key to get into your repositories.

Solution - during build only, one-time SSH key pair

This issue can be easily solved. To do so, we need to:

  1. Generate SSH key pair
  2. Add it to our Gitlab instance using Gitlab API
  3. Add private key to Docker
  4. Docker b uild (and bundle install)
  5. Remove/revoke our key from Gitlab
  6. Deploy container

There's still going to be a SSH key in the container, but it won't be valid. Since it will be valid only during the build process (which happens on the CI), there's no risk. The key gets revoked before we deploy container.

Code

For Gitlab API communication we will be using Gitlab API gem.

require 'gitlab'
require 'fileutils'

# I assume you have this script in the same place where your Dockerfile

# You can find your private token on the account page of your gitlab user
Gitlab.configure do |config|
  config.endpoint = 'http://your-gitlab/api/v3'
  config.private_token  = 'deploy/docker gitlab user private token'
end

# Add this if you use self signed SSL cert with Gitlab
OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE

'ssh-keygen -q -t rsa -f ./id_rsa -N ""'

key = Gitlab.create_ssh_key("docker-#{Time.now.to_s}", File.read('./id_rsa.pub'))

# Note that you should provide some sort of failover to remove key if build fails
system('docker build .')

Gitlab.delete_ssh_key(key.id)

FileUtils.rm('./id_rsa')
FileUtils.rm('./id_rsa.pub')

Copyright © 2024 Closer to Code

Theme by Anders NorenUp ↑