Running with Ruby

Tag: Ruby (page 1 of 74)

Dry-Validation as a schema validation layer for Ruby on Rails API

Legacy code is never the way we would like it to be

There are days, when you don’t get to write your new shiny application using Grape and JSON-API. Life would be much easier, if we could always start from the beginning. Unfortunately we can’t and one of the things that make some developers more valuable that others is their ability to adapt. The app might be outdated, it might have abandoned gems, but if you’re able to introduce new concepts to it (without breaking it and having to rewrite everything), you should be able to overcome any other difficulties.

ActiveRecord::Validations is not the way to go

If you use Rails, then probably you use ActiveRecord::Validations as the main validation layer. Model validations weren’t designed as a protection layer for ensuring incoming data schema consistency (and there are many more reasons why you should stop using them).

External world and its data won’t always resemble your internal application structure. Instead of trying to adapt the code-base and fit it into the world, try making the world as much constant and predictable as it can be.

One of the best things in that matter is to ensure that any endpoint input data is as strict as it can be. You can do quite a lot, before it gets to your ActiveRecord models or hits any business logic.

Dry-Validation – different approach to solve different problems

Unlike other, well known, validation solutions in Ruby, dry-validation takes a different approach and focuses a lot on explicitness, clarity and precision of validation logic. It is designed to work with any data input, whether it’s a simple hash, an array or a complex object with deeply nested data.

Dry-validation is a library that can help you protect your typical Rails API endpoints better. You will never control what you will receive, but you can always make sure that it won’t reach your business.

Here’s a basic example of how it works (assuming we validate a single hash):

BeaconSchema = Dry::Validation.Schema do
  UUID_REGEXP = /\A([a-z|0-9]){32}\z/
  PROXIMITY_ZONES = %w(
    immediate
    near
    far
    unknown
  )

  required(:uuid).filled(:str?, size?: 32, format?: UUID_REGEXP)
  required(:rssi).filled(:int?, lteq?: -26, gteq?: -100)
  required(:major).filled(:int?, gteq?: 0, lteq?: 65535)
  required(:minor).filled(:int?, gteq?: 0, lteq?: 65535)
  required(:proximity_zone).filled(:str?, included_in?: PROXIMITY_ZONES)
  required(:distance) { filled? & ( int? | float? ) & gteq?(0) & lteq?(100)  }
end

data = {}
validation_result = BeaconSchema.call(data)
validation_result.errors #=> {:uuid=>["is missing"], :rssi=>["is missing"], ... }

The validation result object is really similar to the validation result of ActiveRecord::Validation in terms of usability.

Dry Ruby on Rails API

Integrating dry-validation with your Ruby on Rails API endpoints can give you multiple benefits, including:

  • No more strong parameters (they will become useless when you cover all the endpoints with dry-validation)
  • Schemas testing in isolation (not in the controller request – response flow)
  • Model validations that focus only on the core details (without the “this data is from form and this from the api” dumb cases)
  • Less coupling in your business logic
  • Safer extending and replacing of endpoints
  • More DRY
  • Nested structures validation

Non-obstructive data schema for a controller layer

There are several approaches you can take when implementing schema validation for your API endpoints. One that I find useful especially in legacy projects is the #before_action approach. Regardless whether #before_action is (or is not) an anti-pattern, in this case, it can be used to provide non-obstructive schema and data validation that does not interact with other layers (like services, models, etc.).

To implement such protection, you only need a couple lines of code):

class Api::BaseController < ApplicationController
  # Make sure that request data meets schema requirements
  before_action :ensure_schema!

  # Accessor for validation results hash
  attr_reader :validation_result
  # Schema assigned on a class level
  class_attribute :schema

  private

  def ensure_schema!
    @validation_result = self.class.schema.call(params)
    return if @validation_result.success?

    render json: @validation_result.errors.to_json, status: :unprocessable_entity
  end
end

and in your final API endpoint that inherits from the Api::BaseController:

class Api::BeaconsController < Api::BaseController
  self.schema = BeaconSchema

  def create
    # Of course you can still have validations on a beacon level
    Beacon.create!(validation_result.to_h)
    head :no_content
  end
end

Summary

Dry-validation is one of my “must use” gems when working with any type of incoming data (even when it comes from my other systems). As presented above, it can be useful in many situations and can help reduce the number of responsibilities imposed on models.

This type of validation layer can also serve as a great starting point for any bigger refactoring process. Ensuring incoming data and schema consistency without having to refer to any business models allows to block the API without blocking the business and models behind it.

Getting ready for new concurrency in Ruby 3 with Guilds

Ruby Guilds are the new way concurrency will be handled in Ruby 3. There’s still a long way to go until we reach that point, but I believe that we can already start implementing some of the concepts that will make our lives easier when we reach  Ruby 3.

Note: this is not an article explaining what are Guilds and how do they work. You can read excellent explanations on that here:

Note 2: everything here is based on some assumptions which means that in few years the concept might look totally different. However it doesn’t mean that you shouldn’t use good practices or some recommendations that I describe below.

Guilds basic concepts in a TL;DR version

  • Guilds have at least one thread (and a thread has a fiber)
  • Threads in different guilds can run in parallel
  • Threads in a same guild can not run in parallel because of GIL/GVL/GGL (Giant Guild Lock)
  • A guild can’t access the objects of other guilds
  • Guilds are allowed to communicate with each other using channels (Guild::Channel)
  • Objects can be copied between guilds (deep copy)
  • Objects can be moved between guilds
  • Immutable objects (deeply immutable) can be shared between guilds

It sounds simple (and Koichi said that the initial concept implementation has only 400 lines), but it creates many problems that will have to be solved. I will try to cover some of them as they might have an impact on the overall performance of our code.

Don’t try to unlearn locking and multi threading Ruby 2 approach

TL;DR: you will still have to know how to deal with multi threading and it’s problems the way it is handled in Ruby 2.

GVL is insufficient to guard against data races on Ruby2 and this won’t change inside single guild with multiple threads. Since Ruby core team aims to make Ruby 3 compatible with Ruby 2 software (so the community won’t split with incompatible Ruby versions), any Ruby 2 software will run in Ruby 3 in a single Guild. So all your synchronization and locking problems won’t go away without an effort.

Scaling with guilds won’t be linear so don’t think it will solve all your problems

Guilds won’t be silver bullets. They will give Ruby programmers a new, great set of tools but they will for sure create some problems. If you hope that memory usage will drastically go down and that performance will go up, without you doing anything you might be really disappointed when new Ruby appears.

Objects owners mean more checking

TL;DR: the less you share the smaller the transferring overhead will be.

Objects will have guild owners. It means that Ruby will have to have references to which guild an object belongs. One of the slides from Koichi’s presentation states that an exception will be thrown when trying to access object from other guild. It means that Ruby will have to have some sort of checker that will run either on:

  • every object access
  • every object access for objects that were transfered at least once (flag or something?)
  • every object access of an object that is not frozen and references in other guilds

Either way, there will be way more access checking. Ruby already checks the class of each object on it’s access, so maybe this could be combined together.

What that means for us? The less objects we will share between guilds the faster they will run.

Transferring ownership – moving references vs copying

TL;DR: It might be faster (and for sure safer) to start using immutable structures if you plan to transfer a lot of data in between guilds.

Programmers will definitely want to transfer ownership not only for simple objects but also for more complex (and big) data structures. It means that Ruby not only will have to move main objects but all sub-objects (arrays of arrays of objects, etc).

And here a question emerges: wouldn’t it be better to just copy the whole structure instead of updating all the references? Is there even a programming language that has a GC and allows moving mutable objects directly (without deep copying) between threads?

Method cache will remain global

TL;DR: OpenStruct will be a worse idea than it is right now. Try even harder not to invalidate method cache too much.

When we redefine a method (or add new), method lookup will have to be performed an cache invalidation needs to occur on all the guilds. This means that we will have to stop execution of all the guilds at once (since there shouldn’t be a case when one runs on an old method version and another already uses new one).

If you wonder what OpenStruct does to your Ruby code and what impact exactly it has, you can read my article about that here: http://mensfeld.pl/2015/04/ruby-global-method-cache-invalidation-impact-on-a-single-and-multithreaded-applications/.

Global data

TL;DR: Freeze all the global data you define, stop using global variables, don’t redefine stuff unless you really need to and don’t overwrite constants.

By global data I mean:

  • Global variables
  • Class and module objects
  • Class variables
  • Constants
  • Instance variables of class and module objects

Operations that redefine things will be either impossible or slower. It will impact also access (mutable constants access between guilds).

Summary

There isn’t much to summarize since for each section there’s a TL;DR but it’s worth pointing out that we need to be more cautions about sharing our data and about doing a lot of meta-programming beyond good practices (like redefining dynamically built constants) and everything should be fine.

Cover photo by: Shuets Udono on Creative Commons license

Olderposts

Copyright © 2017 Running with Ruby

Theme by Anders NorenUp ↑