
Karafka (Ruby + Kafka) framework 1.0.0 Release Notes

Note: These release notes cover only the major changes. To learn about various bug fixes and changes, please refer to the change logs or check out the list of commits in the main Karafka repository on GitHub.

It’s been over a year since the last major Karafka framework release (0.5). During that time, we’ve managed to implement plenty of new features and fix so many bugs that I don’t know where to start…

Today I’m pleased to announce that we’re ready with the final 1.0 release.

Code quality

The quality of our work has always been important to us. A few months ago we made a transition from polishgeeks-dev-tools to Coditsu. It allowed us to find and fix several new code offenses and to raise the quality of the code and documentation. Here are some screenshots showing where we were and where we are now:

There are still some things to be fixed and improved. That said, this is the best release we’ve made, not only in terms of features but also in terms of code and documentation quality.

For more details about the quality of the whole Karafka ecosystem, feel free to visit our Karafka Coditsu organization page.

Features

More and more companies are taking advantage of Karafka as their async messaging backbone. Many of the new features were either feature requests or pull requests (including some from Shopify and other big players) that address both performance and functionality issues in Karafka. It’s amazing to look into all the use-cases that people cover with this framework.

Batch processing

Believe it or not, up until now Karafka didn’t have batch processing functionality. It could receive messages in batches, but each message had to be processed separately. In the beginning we wanted to imitate the HTTP world, where (most of the time) a single request equals a single controller instance logic execution.

It wasn’t the best idea ever. Or maybe it was at the time, but we’ve clearly noticed that it took away a huge part of the possibilities that Kafka brings to the table.

Luckily those days are gone! From now on you can not only receive messages in batches (which makes Karafka several times faster), but you can also process them that way. The only thing you need to do is set the batch_processing config option to true.

You can do this either on an app configuration level:

class App < Karafka::App
  setup do |config|
    config.batch_consuming = true
    config.batch_processing = true
    # Other options
  end
end

or per each topic route you define:

App.routes.draw do
  consumer_group :events_consumer do
    batch_consuming true

    topic :events_created do
      controller EventsCreatedController
      backend :inline
      batch_processing true
    end
  end
end

Once you turn this option on, you will have access to a method called #params_batch that contains all the messages fetched from Kafka in a single batch.

It’s worth pointing out that a single message batch always contains messages from the same topic and the same partition.

class EventsController < ApplicationController
  def perform
    # This example uses https://github.com/zdennis/activerecord-import
    Event.import params_batch.map { |param| param[:event] }
  end
end

Keep in mind that params_batch is not just a simple array. The messages inside are lazily parsed upon first usage, so you shouldn’t flush them directly into the DB.

Note: For more details about processing messages, please visit the Processing messages section of Karafka wiki.

New routing engine and multiple topic consumer groups

The routing engine provides an interface to describe how messages from all the topics should be received and processed.

The Karafka routing engine used to be trivial. The only thing you could really do was define topics and their options. From now on, the routing API can operate in two modes:

  • Karafka 0.6+ consumer group namespaced style (recommended)
  • Karafka 0.5 compatible consumer group per topic style (old style)

With the 0.6+ mode, you can define consumer groups subscribed to multiple topics. This allows you to group topics based on your use-cases and other factors. It also enables overwriting most of the default settings, in case you need a per-consumer-group specific setup (for example, to receive data from multiple Kafka clusters).

App.consumer_groups.draw do
  consumer_group :group_name do
    topic :example do
      controller ExampleController
    end

    topic :example2 do
      controller Example2Controller
    end
  end
end

Note: For more details about routing, please visit the Routing section of Karafka wiki.

#topic reference on a controller level

There are several changes related to the topic itself. The biggest one is its assignment to a controller class, not to a controller instance. This may not seem significant, but it is. It means that you should no longer use the same controller to handle multiple topics. You can still use #topic from your controller instance (no need to do self.class.topic) – it’s just an alias.

The second big change is that the topic now references its owning consumer group. This allows you to discover and programmatically access all the routing details you need, just by playing with the topic and consumer group objects:

# From the controller instance level
topic.consumer_group.class #=> Karafka::Routing::ConsumerGroup
topic.consumer_group.name #=> 'commit_builds'
topic.name #=> 'commit_builds_scheduled'

# From the console / outside of the controller scope
App.consumer_groups.count #=> 3
App.consumer_groups.first.name #=> 'commit_builds'
App.consumer_groups.first.topics.count #=> 5

#params_batch messages with additional Kafka message details

Each Kafka message you receive now contains the following extra attributes received from Kafka:

  • partition
  • offset
  • key
  • topic

IMHO the most interesting one is the partition key, which can be used when applying ordered changes to any persistent models (the key can be used to ensure proper delivery order via Kafka’s guaranteed partition order):

def perform
  params_batch.each do |param|
    User.find(param[:key]).update!(param[:user])
  end
end

#params_batch and #params lazy evaluation

params_batch is not just a simple array. The messages inside are lazily parsed upon first usage, so you shouldn’t flush them directly into the DB. To parse all the messages at once, please use the #parsed method on the params batch:

class EventsController < ApplicationController
  def perform
    EventStore.store(params_batch.parsed)
  end
end

Parsing will also be performed automatically if you decide to map parameters (or use any other Enumerable module method):

class EventsController < ApplicationController
  def perform
    EventStore.store(params_batch.map { |param| param[:user] })
  end
end

Karafka does not parse all the messages at once for performance reasons. There are cases in which you might be interested only in the last message in a batch. On such occasions it would be wasteful to parse everything.
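
For illustration, here’s a minimal sketch of that “only the last message matters” case, assuming the same user-style payload as the examples above (UserStateCache is a hypothetical model):

def perform
  # Casting to an array disables the automatic parsing upon iteration
  last = params_batch.to_a.last
  # Parse only the newest message - everything before it stays unparsed
  last.retrieve!
  # UserStateCache is hypothetical - keep only the most recent user state
  UserStateCache.store(last[:key], last[:user])
end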

You can use this feature to prefilter unparsed data based on partition, topic or any other non-payload attributes:

def perform
  # In this example, we will ignore the data of non-existing users
  # without even parsing their details.
  # Casting to an array disables the automatic parsing upon iteration,
  # so when we decide to fetch user data, we need to use the #retrieve! method
  ids_from_partition_key = params_batch.to_a.map { |param| param[:key] }
  existing_users_ids = User.where(id: ids_from_partition_key).pluck(:id)

  params_batch.to_a.each do |param|
    param[:parsed] #=> false
    next unless existing_users_ids.include?(param[:key])
    # Some heavy parsing happens here
    param.retrieve!
    param[:parsed] #=> true
    User.find(param[:key]).update!(param[:user])
  end
end

Long running persistent controllers

Karafka used to create a single controller instance for each received message. This was one of the reasons why it had quite a big memory footprint. From now on (unless disabled via the persistent config flag), Karafka will create and use a single object per topic partition up until shutdown.

This change not only reduces memory and CPU usage, but also allows cross-batch aggregations. One of the use-cases could be batching the insert process, so that DB flushing is performed only when a certain buffer size is reached:

class EventsController < ApplicationController
  FLUSH_SIZE = 1000

  def perform
    # Note: concat (not <<) so the buffer stores events, not nested arrays
    buffer.concat(params_batch.map { |param| param[:event] })
    if buffer.size >= FLUSH_SIZE
      data = buffer.shift(FLUSH_SIZE)
      Event.import(data)
    end
  end

  private

  def buffer
    @buffer ||= []
  end
end

Note: the example above is simplified. You probably also want to flush the buffer when the process shuts down.
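
One way to cover the shutdown flush, sketched with plain Ruby’s at_exit (an assumption on my part – dedicated Karafka shutdown hooks, if available, would be a better fit):

def buffer
  @buffer ||= begin
    buffer = []
    # Plain-Ruby shutdown flush - not a Karafka-specific hook
    at_exit { Event.import(buffer) unless buffer.empty? }
    buffer
  end
end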

Encryption and authentication using SSL and SASL support

Karafka uses the ruby-kafka driver to talk to Kafka. Now you can embrace all of its encryption and authentication features. All the related configuration options are described here.
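
For orientation only, here is a sketch of what an SSL setup might look like. The option names below are my assumption, mirroring ruby-kafka’s ssl_* settings, so verify them against the linked configuration docs:

class App < Karafka::App
  setup do |config|
    # Assumed option names mirroring ruby-kafka's ssl_* settings
    config.kafka.ssl_ca_cert = File.read('ca_cert.pem')
    config.kafka.ssl_client_cert = File.read('client_cert.pem')
    config.kafka.ssl_client_cert_key = File.read('client_cert_key.pem')
  end
end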

Limited consumer groups execution from a single process

One of the biggest downsides of Karafka 0.5 was its inability to scale per consumer group. Each server process spun up all the consumer groups from the routing. This was OK for smaller applications, but not enough for bigger systems. The Karafka 1.0 server allows you to specify which consumer groups you want to run in a given process, which means you can easily scale your infrastructure together with your Kafka traffic.

Given a set of consumer groups like this one:

App.consumer_groups.draw do
  consumer_group :events do
    # events related topics definitions
  end

  consumer_group :users do
    # users related topics definitions
  end

  consumer_group :webhooks do
    # webhooks related topics definitions
  end
end

they can now all run together:

# Equals to bundle exec karafka server --consumer-groups=events users webhooks
bundle exec karafka server

in separate processes:

bundle exec karafka server --consumer-groups=events --daemon --pid=./pids/karafka0.pid
bundle exec karafka server --consumer-groups=users --daemon --pid=./pids/karafka1.pid
bundle exec karafka server --consumer-groups=webhooks --daemon --pid=./pids/karafka2.pid

or in a mixed mode, where some of the processes run multiple groups:

bundle exec karafka server --consumer-groups=events --daemon --pid=./pids/karafka0.pid
bundle exec karafka server --consumer-groups=users webhooks --daemon --pid=./pids/karafka1.pid

Multi process management thanks to Capistrano-Karafka

In reference to the previous feature, Capistrano-Karafka has been updated as well. It now supports a deployment flow for multiple processes, each running a single consumer group or many of them:

# Exemplary Capistrano deployment Karafka definitions
set :karafka_role, %i[karafka_small karafka_big]

set :karafka_small_processes, 1
set :karafka_small_consumer_groups, %w[
  group_a
]

set :karafka_big_processes, 4
set :karafka_big_consumer_groups, [
  'group_a group_b',
  'group_c group_d',
  'group_e',
  'group_f'
]

server 'example-small.com', roles: %i[karafka_small]
server 'example-big.com', roles: %i[karafka_big]

Processing backends (Inline and Sidekiq)

Karafka is no longer bound to Sidekiq. There are cases in which Sidekiq can be really helpful when processing messages (reentrancy, thread scaling, etc.), but for many others it was just redundant (receiving from one queue and pushing back to another). The default processing mode for Karafka 1.0 is the :inline mode. It means that messages are processed right after they are fetched from Kafka.

If you want to process your Kafka messages automatically in Sidekiq (without having to worry about workers or anything else), please visit the Karafka-Sidekiq-Backend README.
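
Based on the routing examples above, switching a topic to Sidekiq processing should come down to the backend flag (a sketch assuming the Karafka-Sidekiq-Backend gem is in your Gemfile):

App.routes.draw do
  consumer_group :events_consumer do
    topic :events_created do
      controller EventsCreatedController
      # Processed in Sidekiq instead of the default :inline mode
      backend :sidekiq
    end
  end
end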

JRuby support

Thanks to a few small changes, Karafka can be executed with JRuby 9000.

Incompatibilities

Moving forward means that, from time to time, you need to introduce some incompatibilities. There were some breaking changes, but the upgrade process shouldn’t be that hard. We will cover it in a separate article soon. Here are the most important incompatibilities you might encounter during the upgrade (a before/after configuration sketch follows the list):

  • Default boot file has been renamed from app.rb to karafka.rb
  • Removed worker-glass as a dependency (it is now an independent gem – if you use it, you need to add it to your Gemfile)
  • kafka.hosts option renamed to kafka.seed_brokers – you don’t need to provide all the hosts to work with Kafka
  • start_from_beginning setting moved into the kafka scope (kafka.start_from_beginning)
  • The router no longer checks for route uniqueness – you can now define the same routes for multiple Kafka clusters and do a lot of crazy stuff, so it’s your responsibility to ensure uniqueness
  • Change in the way we identify topics between Karafka and Sidekiq workers. If you upgrade, please make sure all the jobs scheduled in Sidekiq are finished before the upgrade.
  • batch_mode renamed to batch_consuming
  • Renamed the #params content key to value, to better resemble the ruby-kafka internal message naming convention
  • Renamed inline_mode to inline_processing, to resemble other settings conventions
  • Renamed inline_processing to backend
  • A single controller needs to be used for a single topic consumption
  • Renamed before_enqueue to after_received, to better resemble the internal logic, since with the inline backend there is no enqueuing
  • Due to the level at which topic and controller are related (class level), dynamic worker selection is no longer available
  • Renamed params #retrieve to params #retrieve!, to better reflect how it works
  • The Sidekiq backend needs to be added as a separate gem (Karafka no longer depends on it)
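
To make the renames above concrete, here is a before/after configuration sketch (values are illustrative):

# Karafka 0.5 style settings (before)
class App < Karafka::App
  setup do |config|
    config.kafka.hosts = ['127.0.0.1:9092']
    config.batch_mode = true
    config.inline_mode = true
    config.start_from_beginning = true
  end
end

# The same intent after the 1.0 renames
class App < Karafka::App
  setup do |config|
    config.kafka.seed_brokers = ['127.0.0.1:9092']
    config.batch_consuming = true
    config.backend = :inline
    config.kafka.start_from_beginning = true
  end
end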

Wiki updates

We’ve spent long hours ensuring that our wiki is complete and consistent. We’ve also added several new pages.

Other changes

Lower memory usage

We’ve managed to reduce the number of newly allocated objects by around 70%. Karafka no longer creates as many objects for each received message and message batch as it used to. It also depends on fewer gems and requires far fewer additional libraries, so the overall memory consumption is significantly lower.

Better settings management between ruby-kafka and karafka

We’ve reorganized the whole concept of passing settings between Karafka and ruby-kafka, to be able to adapt faster if anything changes. The internal API is also much cleaner and easier to understand.

Dry-validation FTW

All internal validations are now powered by dry-validation schemas.

multi_json

In order to support different Ruby implementations, we’ve decided to use the multi_json gem, so anyone can pick the JSON parser most suitable for their needs.
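
In practice this means JSON handling goes through MultiJson’s adapter interface, for example:

require 'multi_json'

# multi_json picks the fastest available adapter (oj, yajl, the stdlib
# json, ...) and also lets you select one explicitly
MultiJson.load('{"event":"user_created"}') #=> { "event" => "user_created" }
MultiJson.dump(event: 'user_created')      #=> "{\"event\":\"user_created\"}"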

Getting started with Karafka

If you want to get started with Kafka and Karafka as fast as possible, then the best idea is to just clone our example repository:

git clone https://github.com/karafka/karafka-example-app ./example_app

then just bundle install all the dependencies:

cd ./example_app
bundle install

and follow the instructions from the example app Wiki.

Domain-Driven Rails – Mediocrity-Driven Book

Updates

  • This is a review of the Domain-Driven Rails book and its content only. Keep in mind that this book can also be bought with code, exercises and workshops. Those weren’t the subject of this review.
  • This is a review of the beta version of the book. Arkency didn’t mention that on their site, but this has already been fixed. I highly recommend checking their website for updates and/or fixes, as not all the points from this review might still be valid.
  • The author was kind enough to comment on this review, so please go to the comments section for his perspective.

Introduction

It’s been a while since my last technical book review. I’ve decided to have a go at Domain-Driven Rails as this subject is one of the things that I’ve been particularly interested in for quite a while now.

For those who don’t know me, I’m the creator of the Karafka framework, a tool that simplifies the development of Apache Kafka based Ruby applications. It is used by multiple teams to implement the Domain-Driven approach (mainly the Domain events / Event sourcing parts of DDD) within their Ruby and Rails applications.

I’ve been using the DDD approach for both new and existing systems for over 3 years, and I can definitely say that if you want to build complex systems that can be maintained and developed further, this is the way to go.

But is Domain-Driven Rails the best book to read before tossing yourself into the deep end? Unfortunately, I’m not quite sure.

Aim of this review

The aim of this review is to help you decide whether this book is a good choice for you and whether it’s worth spending $49 or more.

Who is the book’s target reader?

I would say that this book is definitely not for readers who:

  • have just finished some Ruby on Rails tutorials,
  • have recently mastered the basics of CRUD and scaffolding,
  • haven’t encountered any of the problems that this book aims to solve,
  • work on applications with around 20 models at most and only simple business logic,
  • have never used Sidekiq or any other background processing tools.

Of course this doesn’t mean the book won’t be useful for you at all. It’s more that, without a deeper, personalized context for the problems and challenges described in this title, it will be just another theoretical (from the usage perspective) book in your library.

It’s also not a glossary type of book, which you can open, find a solution in and close again. DDD is a huge subject, and cherry-picking some of its parts as remedies may not be the best idea.

DD Rails can be a good choice if:

  • you’ve spent 2+ years with Rails-built systems,
  • you just need a really quick introduction to the subject, without the theoretical base or anything else,
  • you’ve just started a new job at a company that uses DDD and you want to get a grasp of it but don’t have too much time.

In such cases, go for it, but before you do, please read the rest of the review.

An advanced programmer reviewing a book for mid-level developers? Does it even make sense?

You might say that I’m not the best guy to review this book. You might say that I’ve been using the approach and solutions presented in it for a long time, so my expectations may not exactly be met by this book. I think just the opposite, because I remember reading Implementing Domain-Driven Design and trying to adapt the knowledge found there to systems built with Ruby and Rails. I remember times when there were no decent libraries or articles for Ruby programmers on that matter. Heck, that was one of the reasons Karafka was born!

The review

Below I present some of the things that I particularly liked and disliked. If you are interested in a TL;DR version, please go straight to the summary section.

What I liked about this book

Undoubtedly, there are some good aspects of this book. Here are a few that I appreciated the most.

The idea

I’m a big evangelist of DDD and its components. I really appreciate the time and the effort Robert & the Arkency team invested to make DDD more accessible and easier for Ruby and Rails developers.

References

If you strive for more information on a given matter, at the end of each chapter you will find tons of references and links for further exploration.

Sagas

Sagas are one of the things I find hard to explain to people*. We are too accustomed to the HTTP way of thinking.

Sagas are meant to be asynchronous and they need to be designed that way.

Because saga will be asynchronously processing domain events from many bounded contexts, which can happen at any time, there is potential for race conditions

Luckily the guys behind this publication didn’t forget about it. They embrace it and present a few ways of dealing with these kinds of situations.

They also cover the at-least-once delivery case, idempotent sagas and events duplication:

If the message queue that you use gives at-least-once delivery guarantees your sagas should be idempotent and handle duplicated events.

Too bad, though, that they didn’t mention Kafka’s exactly-once feature.

* Especially when you don’t call them process managers

Code examples and business context

The book is rich with examples. Many of them come from finance-related systems that Arkency develops and maintains, which adds extra value. Most of them are clear and stripped of parts that aren’t relevant to the discussed subject.

Things I didn’t like about this book

The good part is behind us. Now it’s time for the things that, in my opinion, made this book much worse than it could have been.

Rails or Ruby, Ruby or Rails?

The first page of the prologue and already a funny contradiction:

One topic that was relevant here is how to grow as a Rails developer.

and a few sentences later we get this:

If you focus too much on the technology dimension,
you will become too reliant on “The Framework Way”

Shouldn’t this approach be applied to writing books as well? If so, then maybe the book should be called DD for Ruby. I’m not saying that this book is tightly bound to the Rails framework, but it would be good not to become too reliant on “The Framework Tooling”. If you get knowledge from a framework perspective*, it will always be distorted. More or less, but it will be, and you need to remember this while reading the book.

* This applies to the programming language layer as well

Read a book to understand a book

As I’ve mentioned, there is almost no pure theoretical knowledge presented in DDR. This is an advantage and a disadvantage at the same time. I couldn’t help feeling that something was missing and that I could only understand everything because I already understood the matter. It means that if you’re new to the subject, you might find yourself googling for not-yet-explained acronyms or terminology that “we will discuss in a moment”.

The lack of a glossary at the end is also a bummer, especially since (as mentioned) the explanation of a term doesn’t always come at its first occurrence.

Oversize images?

I counted the pages that are less than 50% text and/or contain oversized images. There are 23 pages like that, which means that more than 12% of the book is wasted space.

Note: this point might no longer be valid. Please visit the Updates section of this review.

It took me one day to read it. For me, roughly 120 pages of content is not enough to call it a book.

A limited perspective of DDD the Arkency way

If you decide to buy this book and use the concepts and code snippets presented in it, keep in mind that you might become Arkency-dependent.

They present their libraries, their approach and their solutions. There wouldn’t be anything bad about that if they also explored the other solutions and tools available.

They also seem to be anything but agnostic about their technology stack. They recommend using the same database for storing events:

(…) I strongly recommend at the beginning of your journey into the world of asynchronicity, eventual consistency and async handlers: a message bus with a queue in the same SQL database that you use for your app. It really simplifies error handling on many levels.

But what they forgot to mention is that:

  • You might not be able to deserialize YAML events that easily in other technologies.
  • You cut yourself off from a set of flexible tools that would allow you to easily expand and replace Ruby based components with ones written in Go, Elixir or any other technology you find suitable for certain jobs.
  • You might become narrow-minded about the potential solutions and tools you can add to your stack (because they will have to play along with the Arkency gems).
  • Going for an all-in-one DB solution and tightly integrating all aspects of your software with it makes it much harder to replace such a message bus with a more generic one.

Note: I wouldn’t mind that at all if the book’s description stated that most of the examples and code snippets are related to their gems and libraries.

Outdated before published

A long time ago, when Rails wasn’t that stable in terms of concepts and internal API, I used to say that a published book about Ruby on Rails was already an outdated book. This was especially true for printed versions, which could not be easily updated.

But I’ve never seen a book that would use abandoned and outdated gems. I was quite surprised to notice that there are almost 2 pages about Virtus, a library that was abandoned by its creator, Solnic (a great explanation of the reasons can be found here). Everyone that used it knew since 2015 that it was going to be dropped entirely in favor of the dry-rb ecosystem. But hey, the authors probably explore those in the book as well. But did they?

I am not super familiar with the dry-rb ecosystem but I am pretty sure that dry-types, dry-struct and dry-validations provide a nice experience in building command objects as well.

Published but incomplete

Let the image speak for itself:

I know that the Lean approach is getting more and more popular, but should it be applied to books? I don’t think so.

Note: This has already been explained in the Updates section and in the comments section of this review.

Eventide as a bonus

When I buy a programming book, I expect it to be written by people who deal with the subject on a daily basis. What did I get as a bonus in DD Rails? An 11-page description of Robert’s first encounter with Eventide. Really? I don’t mind him playing with that set of libraries. I would even appreciate his point of view on them in a blog post, but spending 11 pages of a book just to describe his first impressions of it?

Summary / TL;DR

Note: Please read the Updates section as not all of this TL;DR might be up to date.

Believe me, I’m not happy to say it, but I must: this book is mediocre at best. Its content could fit into 10-15 blog posts, and then I would say it’s a really good starting point for people not familiar with DDD. But to sell it as a book?

I’m disappointed with what I’ve read. I know that Arkency can do way better. I know that they have a 100% guarantee policy, but even that doesn’t justify selling a product that is:

  • incomplete (refers to the beta version, please read the Updates section),
  • outdated,
  • without proofreading,
  • too short,
  • badly structured in terms of layout,
  • presenting only a single set of libraries and a single perspective on the subject.

There’s a huge difference between what we expect from blog posts and what we expect from books, and I think it’s sometimes worth reminding people about it.
