Running with Ruby

Author: Maciej Mensfeld (page 2 of 143)

Sidekiq Unique Jobs: don’t waste your time waiting – reschedule if busy

Half a year ago, I’ve made a post about Enforcing unique jobs in Karafka and Sidekiq for single resources. This approach is great, however, there’s a particular case in which Sidekiq Unique Jobs can block all of your Sidekiq workers except one. This can significantly limit your computing power without you being aware of it.

When using Sidekiq Unique Jobs with a WhileExecuting strategy, only a single worker can start processing Sidekiq job with a given unique key. This is really helpful when you work with resources for which you cannot perform parallel operations (for example when you work with a database for which there are no atomic operations but you need to increment counters), as doing so could overwrite results from an other worker. A WhileExecuting strategy, with a properly defined unique key can help you prevent that from happening. However…

Problem definition

Sidekiq jobs are being consumed out of a FIFO queue. Without any additional modifications, situation is pretty clear: having 4 single threaded processes each allows you to process 4 jobs at the same time.

Everything changes, when you decide to add Sidekiq Unique Jobs to your stack. In a case, when there are multiple jobs in sequence for a given unique key, Sidekiq will get seriously clogged. In the worst scenario, it won’t matter how many workers and threads you have in your Sidekiq infrastructure. You will get a performance of a single Sidekiq thread. It is because you cannot make Sidekiq skip certain tasks because of FIFO queue processing. To bypass this limitation, authors of Sidekiq Unique Jobs introduced a #sleep that will run up until the resource is free again to be processed or until timeout occurs. This approach means, that if you have more tasks in queue than processors, they will have to wait until all the jobs with a given unique key are processed.

All the workers will actively wait (meaning that in Sidekiq console you will see them marked as busy) up until a lock is released.

Solution: reschedule instead of waiting

Warning: if a similar case occurs in your business logic quite often, probably you will be better taking engine different than Sidekiq. I would recommend this solution for non-frequent edge/corner cases.

Bypassing that behavior is pretty easy: if there’s a lock, put the current job at the end of the queue. That said, your jobs will be checked for possibility of execution and rescheduled back instead of waiting. This will mess up your queue counters a bit (as you will have more jobs enqueued and processed that it should) but on the other hand it means that Sidekiq will “actively” seek for resources on which it can work in a certain moment.

To do so, we can create a new Unique Job strategy that we can later on apply. Apart from rescheduling, our strategy won’t differ from the WhileExecuting, so we can use it as a base.

module SidekiqUniqueJobs
  module Lock
    class WhileExecutingReschedule < WhileExecuting
      MUTEX = Mutex.new

      def synchronize
        MUTEX.lock

        if (@locked = locked?)
          yield
        else
          # We use sleep just to prevent from a pointless, extremely fast looping
          # in case all the jobs have the same unique key
          sleep 0.1
          @item['class'].constantize.perform_async(*@item['args'])
        end
      rescue Sidekiq::Shutdown
        logger.fatal { "the unique_key: #{@unique_digest} needs to be unlocked manually" }
        raise
      ensure
        # If we were able to obtain lock, we need to release it after processing
        if @locked
          SidekiqUniqueJobs.connection(@redis_pool) { |conn| conn.del @unique_digest }
        end

        MUTEX.unlock
      end
    end
  end
end

Applying this strategy is super easy. We just need to replace while_executing strategy with while_executing_reschedule:

class ApplicationWorker
  include Sidekiq::Worker

  # sidekiq_options unique: :while_executing
  sidekiq_options unique: :while_executing_reschedule
end

Cover photo by: Alexandre Duret-Lutz on Creative Commons 2.0 license. Changes made: added an overlay layer with an article title on top of the original picture.

Dry-Validation as a schema validation layer for Ruby on Rails API

Legacy code is never the way we would like it to be

There are days, when you don’t get to write your new shiny application using Grape and JSON-API. Life would be much easier, if we could always start from the beginning. Unfortunately we can’t and one of the things that make some developers more valuable that others is their ability to adapt. The app might be outdated, it might have abandoned gems, but if you’re able to introduce new concepts to it (without breaking it and having to rewrite everything), you should be able to overcome any other difficulties.

ActiveRecord::Validations is not the way to go

If you use Rails, then probably you use ActiveRecord::Validations as the main validation layer. Model validations weren’t designed as a protection layer for ensuring incoming data schema consistency (and there are many more reasons why you should stop using them).

External world and its data won’t always resemble your internal application structure. Instead of trying to adapt the code-base and fit it into the world, try making the world as much constant and predictable as it can be.

One of the best things in that matter is to ensure that any endpoint input data is as strict as it can be. You can do quite a lot, before it gets to your ActiveRecord models or hits any business logic.

Dry-Validation – different approach to solve different problems

Unlike other, well known, validation solutions in Ruby, dry-validation takes a different approach and focuses a lot on explicitness, clarity and precision of validation logic. It is designed to work with any data input, whether it’s a simple hash, an array or a complex object with deeply nested data.

Dry-validation is a library that can help you protect your typical Rails API endpoints better. You will never control what you will receive, but you can always make sure that it won’t reach your business.

Here’s a basic example of how it works (assuming we validate a single hash):

BeaconSchema = Dry::Validation.Schema do
  UUID_REGEXP = /\A([a-z|0-9]){32}\z/
  PROXIMITY_ZONES = %w(
    immediate
    near
    far
    unknown
  )

  required(:uuid).filled(:str?, size?: 32, format?: UUID_REGEXP)
  required(:rssi).filled(:int?, lteq?: -26, gteq?: -100)
  required(:major).filled(:int?, gteq?: 0, lteq?: 65535)
  required(:minor).filled(:int?, gteq?: 0, lteq?: 65535)
  required(:proximity_zone).filled(:str?, included_in?: PROXIMITY_ZONES)
  required(:distance) { filled? & ( int? | float? ) & gteq?(0) & lteq?(100)  }
end

data = {}
validation_result = BeaconSchema.call(data)
validation_result.errors #=> {:uuid=>["is missing"], :rssi=>["is missing"], ... }

The validation result object is really similar to the validation result of ActiveRecord::Validation in terms of usability.

Dry Ruby on Rails API

Integrating dry-validation with your Ruby on Rails API endpoints can give you multiple benefits, including:

  • No more strong parameters (they will become useless when you cover all the endpoints with dry-validation)
  • Schemas testing in isolation (not in the controller request – response flow)
  • Model validations that focus only on the core details (without the “this data is from form and this from the api” dumb cases)
  • Less coupling in your business logic
  • Safer extending and replacing of endpoints
  • More DRY
  • Nested structures validation

Non-obstructive data schema for a controller layer

There are several approaches you can take when implementing schema validation for your API endpoints. One that I find useful especially in legacy projects is the #before_action approach. Regardless whether #before_action is (or is not) an anti-pattern, in this case, it can be used to provide non-obstructive schema and data validation that does not interact with other layers (like services, models, etc.).

To implement such protection, you only need a couple lines of code):

class Api::BaseController < ApplicationController
  # Make sure that request data meets schema requirements
  before_action :ensure_schema!

  # Accessor for validation results hash
  attr_reader :validation_result
  # Schema assigned on a class level
  class_attribute :schema

  private

  def ensure_schema!
    @validation_result = self.class.schema.call(params)
    return if @validation_result.success?

    render json: @validation_result.errors.to_json, status: :unprocessable_entity
  end
end

and in your final API endpoint that inherits from the Api::BaseController:

class Api::BeaconsController < Api::BaseController
  self.schema = BeaconSchema

  def create
    # Of course you can still have validations on a beacon level
    Beacon.create!(validation_result.to_h)
    head :no_content
  end
end

Summary

Dry-validation is one of my “must use” gems when working with any type of incoming data (even when it comes from my other systems). As presented above, it can be useful in many situations and can help reduce the number of responsibilities imposed on models.

This type of validation layer can also serve as a great starting point for any bigger refactoring process. Ensuring incoming data and schema consistency without having to refer to any business models allows to block the API without blocking the business and models behind it.

Cover photo by: brlnpics123 on Creative Commons 2.0 license. Changes made: added an overlay layer with an article title on top of the original picture.

Olderposts Newerposts

Copyright © 2017 Running with Ruby

Theme by Anders NorenUp ↑