Running with Ruby

Page 2 of 142

Sharing models between Rails apps – Keeping Rails engine migrations in the engine

Note 1: If you have an option to use micro services and/or event-sourcing, go for it! Rails solutions based on shared models and a single shared database, can bring you in a longer perspective more harm than good.

Note 2: This approach is great when your data is tightly coupled and you can’t easily switch from a single app to a distributed model.

Sharing models between Rails apps – basics

The concept of shared models is really well described by Fabio Akita in following articles:

However, his approach towards migrations comes from his really specific use-case (two DBs in a single app – one shared and one private).

Managing migrations – standard Rails engine approach

Standard Rails engine approach assumes that your migrations will be copied from the engine into the application when you run following command:

bundle exec rake engine_name_engine:install:migrations

This is great when:

  • You have a single “master” application that you want to decompose with engines
  • you have multiple applications with separate databases and you want to use business logic from the engine from each of them
  • If you want to to have a single “master” application that is supposed to run all the migrations from the engine

However with some benefits, you get a huge (in my case) drawback – when you copy migrations, their timestamp is being changed. It means that if you share same database across multiple applications that also share the same engine, you will end up with a single migration being executed (assuming you install the migrations) from each of the separate applications.

Single database and no master application

This won’t do if your case is similar to mine, that is:

  • Single database
  • Multiple applications that need to share same models and scopes
  • Migrations should be executed in the first application that is being deployed after the model engine change (not from the “master” app)
  • There should not be any patching / adding  code into any of the apps that will use shared models gem

Keeping your migrations inside your model engine

Solution for such a case (in which all the models are being kept inside the gem) is pretty simple: you just need to append migrations into your apps migrations path without:

  • Copying them from the model engine gem
  • Changing the timestamp
  • Executing the same migrations multiple times

To achieve such a behavior, we will take advantage of how Rails config paths, migrations and initializers work:

  • Config paths aren’t bound to the Rails.root directory (which means that they can use files from gems and other locations)
  • Config paths are appendable (which means we can add our gem migrations into the app migration list without changing timestamps and copying files)
  • Engine initializer allow us to bind this process from the model gem, keeping the apps untouched (they will think that those migrations are theirs)
  • Rails migrations execution details are stored in schema_migrations table, so unless executed exactly the same moment (so transactions overlap) a single gem migration will not be executed twice

All of this comes down few lines of Ruby inside Rails engine engine class (engine_path/lib/engine_name/engine.rb):

initializer :append_migrations do |app|
  # This prevents migrations from being loaded twice from the inside of the
  # gem itself (dummy test app)
  if app.root.to_s !~ /#{root}/
    config.paths['db/migrate'].expanded.each do |migration_path|
      app.config.paths['db/migrate'] << migration_path
    end
  end
end

TL;DR – Final solution

engine_path/lib/engine_name/engine.rb:

module ModelEngine
  class Engine < ::Rails::Engine
    initializer :append_migrations do |app|
      # This prevents migrations from being loaded twice from the inside of the
      # gem itself (dummy test app)
      if app.root.to_s !~ /#{root}/
        config.paths['db/migrate'].expanded.each do |migration_path|
          app.config.paths['db/migrate'] << migration_path
        end
      end
    end
  end
end

Summary

Most of the time sharing models is bad, but there are some cases app data is really tightly coupled together and exposing API with building microservices around it would mean a huge overhead. For such cases model gem with internal migrations might be a great solution.

Warning: If you decide to go that road, please make sure, that:

  • Your models are stable
  • Your models are slim and without any business logic
  • Your models don’t have any callbacks or external dependencies
  • If your models have external dependencies, make them model gem dependencies
  • Your models are loosely coupled (if you follow Akitas approach with concerns it won’t be hard)
  • Your applications are well tested
  • Your model gem is well tested
  • You don’t use model validations – instead you can use Reform, Dry-Validations or any other solution that allows you to move validations logic out of models
  • All the model related things and migrations are inside the model gem
  • Migrations from external gems like Devise or FriendlyId are also inside the gem

With all of this in mind, you should be fine :-)

Cover photo by: Unsplash on Creative Commons 0 license.

Ruby: Stream processing of shell command results

There are various methods for calling shell commands in Ruby. Most of them either not wait at all for the results, or wait until the command execution is finished.

In many cases it is ok, because programmers want shell command results for further processing. Unfortunately this means that while a shell command runs, there’s no way to get partial results and process them (multitasking FTW). It also means that all the results have to be buffered. It might be (for a long running intensive commands) a source of memory leaks in your applications.

Luckily there’s a great way to process shell command data in a stream (row after row).

The task

Lets assume that we want to find first 10 files in our file system that match a given pattern (note that it could be achieved way better with just shell commands but it’s not the point here).

The bad way

Here’s a typical code to achieve that (and believe me – I’ve encountered solutions like that in production systems):

require 'memory_profiler'
report = MemoryProfiler.report do
  pattern = /test/
  results = `find / 2> /dev/null`.split("\n")
  selection = results.select { |file| file =~ pattern }
  selection[0..10]
end

report.pretty_print

This might seem elegant and it definitely works, but let’s check Ruby’s process memory usage:

Memory usage in MB (before and after find)

Total allocated: 661999052 bytes (2925598 objects)
Total retained:  40 bytes (1 objects)

allocated memory by gem
-----------------------------------
 661999052  other

allocated memory by class
-----------------------------------
 632408588  String
  26181752  Array
   3408440  MatchData
       232  Hash
        40  Process::Status

We had to use nearly 700MB of memory and it took us 4.7 seconds just to find few matching files. The time wouldn’t be that bad,  but memory usage like this is a bit overkill. It is bad mostly because find / lists all the files and the more things we have, the bigger output we get. This also means that our code will behave differently dependent on what machine it will run. For some we might not have memory problems but for others it might grow over 1GB.

Now imagine what would happen if we would execute this code in 25 Sidekiq concurrent workers…

Of course with GC running we might not kill our machine, but memory spikes will look kinda weird and suspicious.

The good way – hello IO.popen

Instead of waiting for all the files from find / command, let’s process each line separately. To do so, we can use the IO.popen method.

IO.popen runs the specified command as a subprocess; the subprocess’s standard input and output will be connected to the returned IO object. (source)

It means that we can execute find command and feed our main process with every line of the output separately.

Note: IO.popen executed without a block will not wait for the subprocess to finish!

require 'memory_profiler'
report = MemoryProfiler.report do
  pattern = /test/
  selection = []

  IO.popen('find / 2> /dev/null') do |io|
    while (line = io.gets) do
      # Note - here you could use break to get out and sigpipe
      # subprocess to finish it early. It will however mean that your subprocess
      # will stop running early and you need to test if it will stop without
      # causing any trouble
      next if selection.size > 10
      selection << line if line =~ pattern
    end
  end

  selection[0..10]
end

report.pretty_print

Results:

Total allocated: 362723119 bytes (2564923 objects)
Total retained:  394 bytes (3 objects)

allocated memory by gem
-----------------------------------
 362723119  other

allocated memory by class
-----------------------------------
 362713215  String
      8432  IO
      1120  MatchData
       232  Hash
        80  Array
        40  Process::Status

45% less memory required.

The best way (for some cases)

There’s also one more way to do the same with the same #popen but in a slightly different style. If you:

  • Don’t need to process all the lines from the executed command
  • Can terminate subprocess early
  • Are aware of how to manage subprocesses

you can then stream data into Ruby as long as you need and terminate once you’re done. Than way Ruby won’t fetch new lines and won’t have to GC them later on.

require 'memory_profiler'
report = MemoryProfiler.report do
  pattern = /test/
  selection = []
  run = true

  io = IO.popen('find / 2> /dev/null')

  while (run && line = io.gets) do
    if selection.size > 10
      Process.kill('TERM', io.pid)
      io.close
      run = false
    end

    selection << line if line =~ pattern
  end

  selection
end

report.pretty_print

Since we don’t wait for the subprocess to finish, it definitely will be faster but what about memory consumption?

Total allocated: 509989 bytes (5613 objects)
Total retained:  448 bytes (4 objects)

allocated memory by gem
-----------------------------------
    509989  other

allocated memory by class
-----------------------------------
    499965  String
      8432  IO
      1120  MatchData
       232  Hash
       200  Array
        40  Process::Status

99% less than the original solution!

Note: This solution is not always applicable.

Summary

The way you execute shell commands really depends on few factors:

  • Do you need an output results at all?
  • Do you need all the lines from the output at the same time?
  • Can you do other stuff and return once the data is ready?
  • Can you process partial data?
  • Can you terminate subprocess early?

When you go with your code out of Ruby scope and when you execute shell commands, it is always good to ask yourself those questions. Sometimes achieving stream processing ability can be done only when the system is being built, so it is really good to think about that before the implementation. In general I would recommend to always consider streaming in every place where we cannot exactly estimate the external command result size. That way you won’t be surprised when there will be a lot more data that initially anticipated.

Note: Attentive readers will notice, that I didn’t benchmark memory used in the subprocess. That is true, however it was irrelevant to our case as the shell command for all the cases was exactly the same.

Cover photo by: heiwa4126 on Creative Commons license.

« Older posts Newer posts »

Copyright © 2017 Running with Ruby

Theme by Anders NorenUp ↑