CI Archives - Closer to Code

Ruby 2.7 is officially End-of-Life (EOL), yet it remains critical in many ongoing projects. A significant user base continues to rely on it for reasons ranging from legacy system compatibility to specific feature dependencies. As a developer of Karafka, an open-source (OSS) project, I recognize the importance of supporting Ruby 2.7 and giving users time to upgrade beyond the official EOL date. This commitment is reflected in my integration tests, which ensure compatibility with Ruby 2.7.

However, a recent Bundler release has posed a challenge. On Friday, 15th of December 2023, Karafka's integration tests for Ruby 2.7 started failing with a compatibility error from the latest Bundler version:

ERROR:  Error installing bundler:
The last version of bundler (>= 0) to support your Ruby & RubyGems was 2.4.22.
Try installing it with `gem install bundler -v 2.4.22`
bundler requires Ruby version >= 3.0.0. The current ruby version is 2.7.8.225.

This message indicates that the newest Bundler releases (2.5 and later) no longer support Ruby versions older than 3.0.0, which can be a significant concern for Continuous Integration (CI) pipelines that still test against Ruby 2.7.

In my setup, I use a GitHub Actions matrix of Ruby versions. Its workflow includes the custom step below, which always installs the most recent version of Bundler (and updates RubyGems to its latest release). The integration tests failed as a direct result of this approach, since Bundler versions newer than 2.4.22 are incompatible with Ruby versions older than 3.0.0.

- name: Install latest Bundler
  run: |
    gem install bundler --no-document
    gem update --system --no-document
    bundle config set without 'tools benchmarks docs'

Aside from upgrading to a newer version of Ruby (which is always recommended for long-term support), the remedy is to enforce installing and using Bundler 2.4.22 for projects still running on Ruby 2.7. Below is a script that demonstrates this solution:

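# `ruby -v` prints e.g. "ruby 2.7.8p225 (...)"; the second field is the version
if [[ "$(ruby -v | awk '{print $2}')" == 2.7.8* ]]; then
  # Bundler 2.4.22 is the last release supporting Ruby 2.7 (see the error above);
  # RubyGems is pinned to the corresponding 3.4.22 release
  gem install bundler -v 2.4.22 --no-document
  gem update --system 3.4.22 --no-document
else
  # Any other Ruby version gets the latest Bundler and RubyGems
  gem install bundler --no-document
  gem update --system --no-document
fi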

In this script, a conditional check determines the Ruby version. If it is 2.7.8, the script installs Bundler 2.4.22 and pins RubyGems to the corresponding compatible release, 3.4.22 (installed via rubygems-update). For any other Ruby version, it installs the latest Bundler and updates RubyGems itself to its latest release.
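The check in the script matches only Ruby 2.7.8. If a matrix also includes other 2.7.x patch releases, a minimal sketch like the one below could cover them, assuming the same pins suit every 2.7.x release. The helper name `plan_tooling_install` is hypothetical; it prints the commands instead of running them, so the version logic can be tested without installing anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) the gem commands for a given
# Ruby version string, either the `ruby -v` form ("2.7.8p225") or the
# RUBY_VERSION form ("2.7.8").
# Assumption: every 2.7.x release can use the same pinned versions.
plan_tooling_install() {
  local ruby_version="$1"
  case "$ruby_version" in
    2.7.*)
      # Last Bundler/RubyGems releases that still support Ruby 2.7
      echo "gem install bundler -v 2.4.22 --no-document"
      echo "gem update --system 3.4.22 --no-document"
      ;;
    *)
      # Everything else gets the latest releases
      echo "gem install bundler --no-document"
      echo "gem update --system --no-document"
      ;;
  esac
}
```

In the workflow step, the printed commands could then be executed with something like `plan_tooling_install "$(ruby -e 'print RUBY_VERSION')" | bash -e`. Keeping the decision separate from execution makes it easy to check which commands each matrix entry would run.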

The recent incompatibility between the latest Bundler version and Ruby 2.7 highlights a critical aspect of software development: managing dependencies and ensuring compatibility across versions. While upgrading to the latest Ruby version is the ideal long-term solution, the script above is a viable workaround that keeps projects on Ruby 2.7 building reliably in CI.

After we added RSpec and Cucumber (with PhantomJS) to our CI build process, it got really, really slow. Due to the nature of the application, we truncate and restore the whole database after each Cucumber scenario. 45 minutes for a single build was definitely not what we were aiming for. So, how could we speed up test execution?

At first, we thought that we could run RSpec and Cucumber in parallel (using the parallel_tests gem). We got a much more powerful machine on AWS to make sure each process had a CPU core of its own. Unfortunately, everything got... slower. We decided to pick a single representative RSpec spec and a single representative Cucumber scenario and figure out what the hell was going on. The first thing we discovered was that, at the Ruby level, all the specs were actually running faster. Everything got significantly slower because of the database. Our tests are heavy on DB communication and, as I said before, due to the nature of the application, they will probably stay that way.
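For context, parallel_tests usually isolates workers by giving each one its own database: it sets the `TEST_ENV_NUMBER` environment variable to an empty string for the first process and to 2, 3, and so on for the others, and `database.yml` appends that value to the database name. The sketch below only mimics that naming convention; the function names and the `app_test` base name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the TEST_ENV_NUMBER value parallel_tests uses for a
# 1-based worker index ("" for the first worker, then "2", "3", ...).
test_env_number() {
  local worker="$1"
  if (( worker == 1 )); then
    echo ""
  else
    echo "$worker"
  fi
}

# Hypothetical helper mirroring a database.yml entry such as:
#   database: app_test<%= ENV['TEST_ENV_NUMBER'] %>
database_for_worker() {
  local base="$1" worker="$2"
  echo "${base}$(test_env_number "$worker")"
}
```

With the gem itself, the databases are typically created and loaded with `bundle exec rake parallel:create parallel:prepare`, and the suites run with `bundle exec parallel_rspec spec/` and `bundle exec parallel_cucumber features/`. Every worker still talks to the same database server, which is one way the database can become the bottleneck once workers run concurrently.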

So, what were our options?

We could get much better hardware for our testing DBs (bigger, faster, with SSDs), but it would definitely make things more expensive.
We could compromise on data consistency. Since it is a testing cluster, in case of a system failure, crash or shutdown, we can just drop all the databases and repopulate them.

We decided to try the second approach and use PostgreSQL's fsync setting to tweak the database a little.

What is fsync? Quoting the PostgreSQL documentation:

If this parameter is on, the PostgreSQL server will try to make sure that updates are physically written to disk, by issuing fsync() system calls or various equivalent methods (see wal_sync_method). This ensures that the database cluster can recover to a consistent state after an operating system or hardware crash.

While turning off fsync is often a performance benefit, this can result in unrecoverable data corruption in the event of a power failure or system crash. Thus it is only advisable to turn off fsync if you can easily recreate your entire database from external data.

Examples of safe circumstances for turning off fsync include the initial loading of a new database cluster from a backup file, using a database cluster for processing a batch of data after which the database will be thrown away and recreated, or for a read-only database clone which gets recreated frequently and is not used for failover. High quality hardware alone is not a sufficient justification for turning off fsync.
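One way to apply this on a throwaway test server, as a minimal sketch that assumes you can edit its `postgresql.conf`: the hypothetical helper below appends `fsync = off`, relying on PostgreSQL using the last occurrence of a parameter in the configuration file. Since fsync can be set in `postgresql.conf`, a configuration reload rather than a full restart should pick it up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turns fsync off in the given postgresql.conf.
# PostgreSQL uses the last occurrence of a parameter in the file, so the
# appended line overrides any earlier "fsync = ..." entry.
# Only for disposable test databases: a crash can corrupt the cluster.
disable_fsync() {
  local conf_file="$1"
  if [[ ! -f "$conf_file" ]]; then
    echo "config file not found: $conf_file" >&2
    return 1
  fi
  printf '\n# Test cluster only: trade crash safety for speed\nfsync = off\n' >> "$conf_file"
}

# Example (not run here): apply the change, then reload the server.
#   disable_fsync "$PGDATA/postgresql.conf"
#   pg_ctl reload -D "$PGDATA"
```

When the test database runs in the official postgres Docker image, the same setting can instead be passed on the server command line, e.g. `docker run -e POSTGRES_PASSWORD=postgres postgres -c fsync=off`; `SHOW fsync;` in psql confirms the active value either way.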

The results were astonishing! Since we no longer depend as much on our HDDs' performance for every write, the database layer does not slow us down nearly as much.

Overall, thanks to this tweak and parallel execution, we managed to bring a whole build down from 45 minutes to 12 minutes. That cuts build time by roughly 73% (the build is about 3.75 times as fast), and this build time is acceptable for us.
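For reference, the arithmetic behind those build times:

```latex
\frac{45 - 12}{45} = \frac{33}{45} \approx 0.733 \quad (\text{about } 73\% \text{ less build time}),
\qquad
\frac{45}{12} = 3.75 \quad (\text{speed-up factor})
```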

Research done by: Adam Gwozdowski


The Ruby 2.7 Challenge: Adapting to Bundler’s Latest 2.5+ Update

Speeding up RSpec and Cucumber on the CI server with PostgreSQL fsync flag and parallel execution