Running with Ruby


Kafka on Rails: Using Kafka with Ruby on Rails – Part 1 – Kafka basics and its advantages

Introduction

In this series of articles, I will try to provide you with an explanation on why you should invest your time in learning Kafka and the Karafka framework and how it can reshape the way you design and develop your Ruby applications. I will also try to answer some of the most common questions regarding those two and give you some real usage examples on how you can benefit fast from adding them to your technological stack.

What is Kafka?

Let me quote Wiki on that one:

Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.

Now let’s translate it into some general concepts (copied from here):

  1. It lets you publish and subscribe to streams of records. In this respect, it is similar to a message queue or enterprise messaging system.
  2. It lets you store streams of records in a fault-tolerant way.
  3. It lets you process streams of records as they occur.
  4. It lets you build real-time data pipeline based applications that reliably get data between systems and/or applications.
  5. It lets you build real-time streaming applications that transform and react to a stream of data and/or events.
  6. It allows you to simplify Domain-Driven Design implementation within both new and existing applications and allows you to do this in a more technology-agnostic way.

Why should I be interested in it?

Because it allows you to expand. And I don’t only mean that you will get much better performance with it and that you will be able to process more and faster.

What I really mean, is that once you understand concepts behind it, you will get a whole new set of possibilities to work with your data. You will expand your horizons and re-shift the way you design your code.

Systems that we build are data-driven and by having more ways of working with it, we get a totally new set of tools and solutions which we can use to make our work better and more efficient.

I keep saying, that the Ruby (and Rails in particular) community lacks architects and good architecture for post-MVP systems. One of the reasons why it is the way it is, is because we’re too bound to the Request-Response way of thinking. Once you learn, that things can be done in a different way, it will impact your way of working with any technology you use, including Ruby on Rails.

Basic Kafka terminology

There are many general Kafka introduction articles, including the official one. Here, I will describe the most important parts of Kafka ecosystem, so you can start working with it as fast as possible.

Note: the description mentioned below might not be 100% accurate, but it should be enough for you to grasp the basics and keep you going.

Note: You can find more details about Kafka in a great Kafka in a Nutshell article.

General publish-subscribe messaging system concept

A messaging system lets you send messages between processes, applications, and servers. Applications should be able to connect to a system like that and transfer messages both ways.

Note: Publisher (one that sends a message) can be a receiver / subscriber at the same time.

Illustrations are taken from here.

Kafka brokers

Kafka is a distributed system that runs in a cluster. Each node in the cluster is called a Kafka broker. Broker is a single Kafka process that operates in a cluster.

Kafka topics with partitions

A Kafka topic is just a named stream of records. It is a bit similar to the Sidekiq or RabbitMQ queue concept. In general, it is a namespace where you are going to store messages that are similar to each other in terms of your business logic.

Everything is organized around topics and most Kafka guarantees are either for a topic or a topic partition. You send and receive messages from topics. Topics in Kafka are always multi-subscriber in nature; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it.

Each Kafka topic is always divided into partitions. Even if you have a single partition, it is still there. Each partition is an ordered, immutable sequence of records that is continually appended to a structured commit log. The records in the partitions are each assigned a sequential id number called the offset.

You can fetch data from multiple partitions with a single consumer, but you need to be aware that the delivery order is guaranteed only among messages that come from the same partition. It means, that you should not rely on a cross-partition message order within your business logic.
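To make the partition and offset concepts more tangible, here is a minimal, in-memory sketch. It is a toy model and not the Kafka client API: the ToyTopic class and its methods are made up for this illustration, and a plain Ruby array stands in for each partition log.

```ruby
# A toy, in-memory model of a topic. Not a Kafka client, just an illustration.
class ToyTopic
  def initialize(partition_count)
    # Each partition is an append-only array; the array index plays the role of the offset.
    @partitions = Array.new(partition_count) { [] }
  end

  # Appends a record to the given partition and returns the offset it was stored at.
  def append(partition, payload)
    log = @partitions.fetch(partition)
    log << payload.dup.freeze # records are immutable once written
    log.size - 1
  end

  # Reads the records of a single partition, starting at the given offset.
  def read(partition, from_offset = 0)
    @partitions.fetch(partition).drop(from_offset)
  end
end

topic = ToyTopic.new(2)
topic.append(0, 'user 1 created') # offset 0 in partition 0
topic.append(1, 'user 2 created') # offset 0 in partition 1
topic.append(0, 'user 1 updated') # offset 1 in partition 0

topic.read(0) # records of partition 0, in the order in which they were appended
```

Note that offsets are counted per partition, so nothing in this model tells you whether 'user 2 created' happened before or after 'user 1 updated'. That is exactly why the order can only be relied on within a single partition.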

Kafka producers

Kafka producer is an application or a process that sends messages to Kafka.

Kafka consumers and consumer groups

Kafka consumer is an application that reads messages from Kafka.

Consumer can start reading messages from any offset. It means, that you can build systems that will start from the beginning of a topic and replay all the events/messages that Kafka contains or that will start from the current position and only work with new messages that are coming in.

Most of the time, for the first consumer run, you will pick one of those and later on you will always consume from the last offset you worked with before shutting down the consumer, but it is still good to know, that you can always start from any offset you want. This allows consumers to join the cluster at any point in time.
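The difference between replaying a topic and only following new messages can be sketched in a few lines of plain Ruby. This is a toy model under simplifying assumptions: ToyConsumer and its start option are hypothetical names, and a plain array stands in for a single partition.

```ruby
# Toy consumer over a single partition (a plain Array stands in for the partition log).
class ToyConsumer
  attr_reader :position

  # start: :earliest replays everything, :latest only sees new records,
  # an Integer starts from that exact offset.
  def initialize(log, start: :earliest)
    @log = log
    @position =
      case start
      when :earliest then 0
      when :latest then log.size
      else Integer(start)
      end
  end

  # Returns the records not yet seen and advances the position past them.
  def poll
    records = @log.drop(@position)
    @position += records.size
    records
  end
end

log = %w[created updated archived]
replaying = ToyConsumer.new(log, start: :earliest)
tailing = ToyConsumer.new(log, start: :latest)

replaying.poll # all three existing records
tailing.poll   # nothing yet
log << 'restored'
tailing.poll   # only the new record
```

The consumer remembers only a position. As long as that position is stored somewhere, it can stop and later continue from where it left off.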

Consumers can be organized in groups. A consumer group includes consumers that subscribe to the same topics. Kafka assigns each consumer in a group a set of partitions to work with. This approach allows you to scale greatly, as you can increase the number of partitions and spin up more consumers within the same consumer group. Kafka guarantees that a message is only read by a single consumer in the group.

You can have more consumers than partitions, but they won’t actively participate in the consumption process. They will start performing work in the case of crashes or other failures of other consumers.
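A rough sketch of how partitions could be spread across the consumers of one group is shown below. The assign_partitions helper is hypothetical and uses a simple round-robin; the real Kafka assignment strategies are more involved, so treat this only as an illustration of the idea. It assumes at least one consumer is present.

```ruby
# Hypothetical helper: spreads partitions over the consumers of a group round-robin.
# Assumes that the consumers list is not empty.
def assign_partitions(partitions, consumers)
  assignment = consumers.to_h { |consumer| [consumer, []] }

  partitions.each_with_index do |partition, index|
    assignment[consumers[index % consumers.size]] << partition
  end

  assignment
end

# Four partitions, two consumers: each consumer gets two partitions.
assign_partitions([0, 1, 2, 3], %w[c1 c2])

# Two partitions, three consumers: c3 gets an empty list and stays idle
# until one of the other consumers leaves the group.
assign_partitions([0, 1], %w[c1 c2 c3])
```

Each partition ends up with exactly one consumer within the group, which is what makes the single-reader guarantee possible.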

It’s worth pointing out, that Kafka never pushes messages to the consumers on its own. It’s the consumer that asks for messages when it is ready to handle them. This approach is super flexible, as it allows you to temporarily shut down the consumer and after it is back, it will catch up with all the messages that were not yet processed. A really great feature for SOA-based microservices that won’t lose any data. In the worst case scenario, they will just process them a bit later.

What can Kafka do for me and my Ruby on Rails applications?

Note: We will explore all those benefits in detail in the next parts of this series. Here’s just a quick summary.

A lot. And it really depends on your perspective and your role in the organization. Having Kafka as your messages backbone for Ruby and Rails systems will bring you benefits in many places.

Performance

Most of the Ruby on Rails systems are developed with objects in mind. This is true for both the client end-to-end requests as well as for the Sidekiq background jobs.

Do you have to refresh or recalculate some things in the system upon a change that becomes frequent during spikes that occur from time to time? Redesigning this part of the system so that it fetches messages in batches can significantly lower the need for constant recalculation.
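Here is a small sketch of why batches help. The messages and the recalculate lambda are made up for this illustration; the lambda stands in for any expensive operation, like rebuilding account statistics.

```ruby
# Hypothetical messages: each one says that something changed for a given account.
messages = [
  { account_id: 1 }, { account_id: 2 }, { account_id: 1 },
  { account_id: 1 }, { account_id: 2 }
]

recalculations = []
# Stand-in for an expensive operation.
recalculate = ->(account_id) { recalculations << account_id }

# Handling messages one by one would recalculate once per message (5 times here).
# With a batch at hand, we can recalculate once per affected account instead.
messages.map { |message| message[:account_id] }.uniq.each(&recalculate)

recalculations # => [1, 2]
```

During a spike the number of messages grows, but the number of affected accounts within a batch usually grows much more slowly.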

The Kafka-based systems also scale really, really well and due to the multi-consumer subscription model, you can optimize and scale separate parts of the system independently.

Architecture

This is by far the biggest advantage you will get in your Ruby and Rails systems when you add Kafka to them. You will be able to design, build and test independent components that can do things outside of a typical Rails “HTTP like” processing scope.

You won’t have to worry about (almost) anything else except your bounded context and your business domain. Due to the way Kafka works, sometimes you will be even forced to use tools and solutions that aren’t from the “Rails way”.

Have you ever been able to build a proof of concept application that could hook up in real time to staging or production without introducing side-effects? Were you able to run it from your local machine and see how things work? With Kafka, it can be super easy to achieve that.

Note: Don’t get me wrong, it’s not Kafka itself in your stack, that will auto-magically change everything. It’s you having it and understanding what you can achieve with it who will trigger and lead the change. Kafka will just allow you to do those things easily and fast.

Deployment process

Being able to re-consume and re-process messages allows you to shut down certain parts of the system without affecting others. Since the Kafka messages are not being pushed, they don’t disappear if not consumed immediately. With a bit of good architecture, you can deploy, perform maintenance and do other things while the system is running without users knowing about that.

Development performance

The bigger the system gets, the more often developers step on each other’s toes. Development costs and developer frustration will grow exponentially when they:

  • change the same things simultaneously,
  • have to remember about edge cases outside of their current business domain scope,
  • have to deal with additional callback actions and/or non explicit processes.

Kafka allows you to easily use DDD to build systems that are event-based and that can be managed and developed with much smaller overhead than a typical Ruby on Rails MVC, callback-based system.

Freedom of choice

Ruby on Rails can be a burden from time to time. Plain Ruby can do really well. ActiveRecord can be replaced with ROM and Dry-Validation, bringing you many benefits. However, it can be really hard to introduce new concepts in a huge legacy system. If you have Kafka and Karafka, you can spin up new experimental applications that will perform some business logic within a bounded context and won’t do any harm to the existing logic and/or data.

Tired of Ruby in general? Replace a single Kafka based component with a different one in a different technology that might better suit your needs.

I already have a message bus (Redis + Sidekiq)

Kafka is not a message bus. It is a distributed streaming platform.

It’s not entirely accurate to compare them as they are not the same. There are many business cases that could be solved with either of them. However, there are some significant differences, when looking from the Sidekiq perspective, that are good to know and understand:

  1. Kafka does not handle reentrancy – in case of a message processing failure, it is up to you to decide what to do with it. It won’t be pushed back and retried automatically,
  2. Kafka does not support pushing the same message into a queue again (you can push it back but it will be a new message in the partition). Messages are immutable and once placed in Kafka, they cannot be changed,
  3. Sidekiq does not support message broadcasting and is more command-oriented than event-oriented (do-this vs did-this), especially within Ruby on Rails and Sidekiq scope,
  4. Sidekiq does not support batch consuming,
  5. Kafka can keep events much (configurably) longer due to persistence,
  6. Kafka events can be consumed multiple times by multiple consumer groups,
  7. Kafka can be the only message bus for any publish-subscribe flows,
  8. Sidekiq message that got consumed is being removed from the queue, which means that you cannot re-consume it if needed.
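Points 6 and 8 from the list above can be illustrated with a toy model in which one shared log is read by several consumer groups, each with its own stored offset. ToyBroker and its methods are hypothetical names invented for this sketch; it is not how you would talk to a real Kafka cluster.

```ruby
# Toy model: one shared log, one stored offset per consumer group.
class ToyBroker
  def initialize
    @log = []
    @offsets = Hash.new(0) # group name => next offset to read
  end

  def publish(event)
    @log << event
  end

  # Returns the events this group has not seen yet and remembers where it stopped.
  # Reading never removes anything from the log.
  def poll(group)
    events = @log.drop(@offsets[group])
    @offsets[group] = @log.size
    events
  end

  # Moves a group back, so that it can re-consume from the given offset.
  def seek(group, offset)
    @offsets[group] = offset
  end
end

broker = ToyBroker.new
broker.publish('order_placed')
broker.publish('order_paid')

broker.poll('billing')   # both events
broker.poll('analytics') # the same two events, independently of billing
broker.seek('billing', 0)
broker.poll('billing')   # both events again
```

With a job queue, the second group would have found nothing to read, because a consumed job is gone. Here consumption only moves a pointer.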

In some situations, it is really good to have them work together, that’s why there’s even a Karafka Sidekiq backend for processing Kafka messages inside of Sidekiq workers. We will get to that in the next parts of this series.

Summary – Karafka as a Ruby Kafka backbone

All this introduction has had one goal: to make you familiar with the basic concepts and advantages of using Kafka with your new and existing Ruby and Rails based systems.

In the next parts of this series, we will explore Karafka, a framework used to simplify Apache Kafka based Ruby applications development.

We will start from building small applications that use Karafka as an internal and external message backbone, and then we’ll move to integrating Karafka with existing monoliths and using it to decompose and re-design your existing code base.

Somewhere down the road, in this series, I will also introduce other “non-Rails” stack tools including Trailblazer, Dry-Validation, ROM and a few others, to give you a wider perspective on how much you can benefit when combining proper tools together.

Karafka provides you with a lot of possibilities and you will see for yourself, that when boosted with other great tools, your code quality, architecture, performance and the way you work can jump to a totally different level.

Stay tuned :-)

Domain-Driven Rails – Mediocrity-Driven Book

Updates

  • This is a review of the Domain-Driven Rails book and its content only. Keep in mind, that this book can be bought also with code, exercises and workshops. Those weren’t the subject of this review.
  • This is a review of the beta version of the book. Arkency didn’t mention that on their site, but this has already been fixed. I highly recommend checking their website for updates and/or fixes, as not all the points from this review might still be valid.
  • The author was kind enough to comment on this review, so please go to the comments section for his perspective.

Introduction

It’s been a while since my last technical book review. I’ve decided to have a go at Domain-Driven Rails as this subject is one of the things that I’ve been particularly interested in for quite a while now.

For those who don’t know me, I’m the creator of the Karafka framework, a tool that can be and is used for simplifying Apache Kafka based Ruby applications development. It is used by multiple teams for implementing the Domain-Driven approach (mainly, for the Domain events / Event sourcing parts of DDD) within their Ruby and Rails applications.

I’ve been using the DDD approach for both: new and existing systems for over 3 years and I can definitely say, that if you want to build complex systems, which can be maintained and developed, this is the way to go.

But is Domain-Driven Rails the best book to read before tossing yourself into the deep end? Unfortunately, I’m not quite sure.

Aim of this review

The aim of this review is to help you decide whether or not this book is a good choice for you and whether it is worth spending $49 or more.

Who is the book target reader?

I would say, that this book is definitely not for you if:

  • you’ve just finished some Ruby on Rails tutorials,
  • you’ve recently mastered the basics of CRUD and scaffolding,
  • you haven’t encountered any of the problems that this book aims to solve,
  • your biggest applications contain +/-20 models at most and have only simple business logic,
  • you’ve never used Sidekiq or any other background processing tools.

Of course it doesn’t mean that it won’t be useful for you at all. It’s more about giving the problems and challenges described in this title a deeper, personalized context without which it will be just a next theoretical (from the usage perspective) position in your library.

It’s also not a glossary type of book, which you can open, find a solution and close it again. DDD is a huge subject and cherry-picking of some parts as remedies may not be the best idea.

DD Rails can be a good choice if:

  • you’ve spent 2+ years with Rails built systems,
  • you just need a really quick introduction into the subject, without theoretical base or anything else,
  • you’ve just started a new job in a company where they use DDD and you want to get a grasp of it but don’t have too much time.

In such cases, go for it, but before you do, please read the rest of the review.

Advanced programmer reviewing book for mids? Does it even make sense?

You might say, that I’m not the best guy to review this book. You might say, that I’ve been using the approach and solutions presented in it for a long time, so my expectations may not exactly be met by this book. And I think just the opposite, because I remember myself reading Implementing Domain-Driven Design and trying to adapt knowledge found there into the systems built with Ruby and Rails. I remember times, when there were no decent libraries nor articles for Ruby programmers on that matter. Heck, that was one of the reasons Karafka was born!

The review

Below I present some of the things that I liked/disliked in particular. If you are interested in a TL;DR version, please go to the summary section.

What did I like about this book

Undoubtedly, there are some good aspects of this book. Here are a few that I appreciated the most.

The idea

I’m a big evangelist of DDD and its components. I really appreciate the time and the effort Robert & the Arkency team invested to make DDD more accessible and easier for Ruby and Rails developers.

References

If you strive for more information on a given matter, at the end of each chapter you will find tons of references and links for further exploration.

Sagas

Sagas are one of the things I find hard to explain to people*. We are accustomed too much to the HTTP way of thinking.

Sagas are meant to be asynchronous and they need to be designed that way.

Because saga will be asynchronously processing domain events from many bounded contexts, which can happen at any time, there is potential for race conditions

Luckily the guys behind this publication didn’t forget about it. They embrace it and present a few ways of dealing with this kind of situations.

They also cover the at-least-once delivery case, idempotent sagas and events duplication:

If the message queue that you use gives at-least-once delivery guarantees your sagas should be idempotent and handle duplicated events.

Too bad though, that they didn’t mention Kafka’s exactly-once feature.
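To show what the idempotency requirement quoted above means in practice, here is a minimal sketch of a saga that tolerates both duplicated and out-of-order events. It is not taken from the book: ToyOrderSaga, the event names and the :id / :type keys are all assumptions made for this illustration.

```ruby
require 'set'

# Hypothetical saga (process manager) that ships an order once it is both placed and paid.
class ToyOrderSaga
  REQUIRED = Set[:order_placed, :order_paid].freeze

  attr_reader :shipments

  def initialize
    @seen_event_ids = Set.new
    @received = Set.new
    @shipments = 0
  end

  # event is a Hash with a unique :id and a :type (both keys are assumptions of this sketch).
  def handle(event)
    # Set#add? returns nil when the id is already present, so redelivered events are skipped.
    return unless @seen_event_ids.add?(event[:id])

    @received << event[:type]
    # Ship only once, and only after every required event has arrived, in whatever order.
    @shipments += 1 if @received.superset?(REQUIRED) && @shipments.zero?
  end
end

saga = ToyOrderSaga.new
saga.handle({ id: 'e2', type: :order_paid })   # arrives first, nothing to ship yet
saga.handle({ id: 'e1', type: :order_placed }) # now both are known, the order ships
saga.handle({ id: 'e1', type: :order_placed }) # a duplicate, ignored
saga.shipments # => 1
```

In a real system the set of seen ids and the saga state would have to be persisted, ideally in the same transaction, which this in-memory version leaves out.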

* Especially when you don’t call them process managers

Code examples and business context

The book is rich with examples. Many of them are from finance-related systems that Arkency develops and maintains, which adds an extra value to it. Most of them are clear and stripped out of useless parts that aren’t relevant to the discussed subject.

Things I didn’t like about that book

The good part is behind us. Now it’s time for things that in my opinion made this book much worse than it could be.

Rails or Ruby, Ruby or Rails?

The first page of the prologue and already a funny contradiction:

One topic that was relevant here is how to grow as a Rails developer.

and a few sentences later we get this:

If you focus too much on the technology dimension,
you will become too reliant on “The Framework Way”

Shouldn’t this approach be applied to writing books as well? If so, then maybe the book should be called DD for Ruby. I’m not saying, that this book is tightly bound to the Rails framework, but it would be good not to become too reliant on “The Framework Tooling”. If you get knowledge from a framework perspective*, it will always be distorted. More or less, but it will be and you need to remember this while reading the book.

* This applies as well for a programming language layer

Read a book to understand a book

As I’ve mentioned, there is almost no pure theoretical knowledge presented in DDR. This is an advantage and disadvantage at the same time. I couldn’t help feeling like something was missing and that I could understand everything because I already understood the matter. It means, that if you’re new to the subject, you might find yourself googling for not yet explained acronyms or terminology that “we will discuss in a moment”.

The lack of a glossary at the end is also a bummer. Especially since (as mentioned), there are times when a term is not explained at its first occurrence.

Oversize images?

I’ve counted a number of pages where there would be less than 50% of text and/or some oversize images. There are 23 pages like that. It means, that more than 12% of the book is useless.

Note: this point might no longer be valid. Please visit the Updates section of this review.

It took me 1 day to read it. For me, +/- 120 pages of content is not enough to call it a book.

A limited perspective of DDD the Arkency way

If you decide to buy this book and use concepts and code snippets presented there, keep in mind, that you might become Arkency-dependent.

They present their libraries, their approach and their solutions. There would not be anything bad about that, if they explored other solutions and tools available.

They also seem to be really agnostic about their technological stack. They recommend using same database for storing events:

(…) I strongly recommend at the beginning of your journey into the world of asynchronicity, eventual consistency and async handlers: a message bus with a queue in the same SQL database that you for your app. It really simplifies error handling on many levels.

But what they forgot to mention is that:

  • You might not be able to deserialize YAML events in other technologies that easily.
  • You cut yourself from a set of flexible tools that will allow you to easily expand and replace Ruby based components with those written with Go, Elixir or any other technology you find suitable for doing certain jobs.
  • You might become narrow-minded about potential solutions and tools you can add to your stack (because they will have to play along with the Arkency gems).
  • Going for an all-in-one DB solution, tightly integrating all aspects of your software with it, makes it much harder to replace a message bus like that with a more generic one.
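The first point from the list above can be illustrated with the Ruby standard library alone. This is only a sketch of the general problem, not a reproduction of how any particular gem stores its events: the OrderPlaced class is hypothetical and defined just for this example.

```ruby
require 'yaml'
require 'json'

# Hypothetical domain event class, defined only for this illustration.
class OrderPlaced
  def initialize(order_id)
    @order_id = order_id
  end
end

yaml_payload = YAML.dump(OrderPlaced.new(42))
# The YAML document carries a Ruby-specific type tag (!ruby/object:OrderPlaced),
# which a consumer written in another language has to know how to interpret.

json_payload = JSON.generate({ type: 'order_placed', order_id: 42 })
# Plain JSON with an explicit type field carries no language-specific tag.

puts yaml_payload
puts json_payload
```

A payload that describes the event with plain data and an explicit type name is easier to consume from Go, Elixir or anything else than one that encodes a Ruby class name.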

Note: I wouldn’t mind that at all, if the book’s description stated, that most of the examples and code snippets are related to their gems and libraries.

Outdated before published

A long time ago, when Rails wasn’t that stable in terms of concepts and internal API, I used to say, that a released book about Ruby on Rails is already an outdated book. This sentence was true especially for printed versions that could not be easily updated.

But I’ve never seen a book that would use abandoned and outdated gems. I was quite surprised to notice, that there are almost 2 pages about Virtus, a library that was abandoned by its creator Solnic (great explanation on the reasons can be found here). Everyone that used to use it, knew from 2015, that it’s going to be totally dropped in favor of the dry-rb ecosystem. But hey, probably authors explore those in the book as well. But did they?

I am not super familiar with the dry-rb ecosystem but I am pretty sure that dry-types, dry-struct and dry-validations provide a nice experience in building command objects as well.

Published but incomplete

Let the image speak for itself:

I know, that the Lean approach is getting more and more popular, but should it be applied to books? I don’t think so. 

Note: This has already been explained in the Updates section and in the comments section of this review.

Eventide as a bonus

When I buy a programming book, I expect it to be written by people that deal with the subject on a daily basis. What did I get as a bonus in DD Rails? An 11-page-long description of Robert’s first Eventide encounter. Really? I don’t mind him playing with that set of libraries. I would even appreciate his point of view on them in a blog post, but spending 11 pages out of a book just to describe his first impressions of it?

Summary / TL;DR

Note: Please read the Updates section as not all of this TL;DR might be up to date.

Believe me, I’m not happy to say it, but I must. This book is mediocre at best. Its content could fit into 10-15 blog posts. Then I would say, that it is a really good starting point for people not familiar with DDD. But to sell it as a book?

I’m disappointed with what I’ve read. I know, that Arkency can do way better. I know, that they have 100% guarantee policy but even then, it doesn’t justify selling a product that is:

  • incomplete (refers to the beta version, please read the Updates section),
  • outdated,
  • without proofreading,
  • too short,
  • badly structured in terms of layout,
  • presenting only a single set of libraries and a single perspective on the subject.

There’s a huge difference between what we expect from blog posts and what we expect from books and I think, sometimes it’s worth reminding some people about it.


Copyright © 2017 Running with Ruby
