  1. Kafka on Rails: Using Kafka with Ruby on Rails – Part 1 – Kafka basics and its advantages
  2. Kafka on Rails: Using Kafka with Ruby on Rails – Part 2 – Getting started with Ruby and Kafka

Kafka Docker local setup

Before we combine Kafka with Ruby, it is good to have a working local Kafka process. Kafka requires Zookeeper and, to be honest, a local setup can be a bit tricky. The easiest way to get one is to run it in a Docker container. Here's an example script that should be enough for basic local work. It will spin up a single-node Kafka cluster that you can use out of the box:

git clone https://github.com/wurstmeister/kafka-docker.git
cd kafka-docker
git checkout 1.0.1
vim docker-compose-single-broker.yml
# Replace as followed:
# set to:
docker-compose -f docker-compose-single-broker.yml up

To check that it works, you can just telnet to it:

telnet 9092
Connected to
Escape character is '^]'.
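
If you prefer to script this check, here is a minimal Ruby sketch of the same idea. The port_open? helper is a hypothetical name, not part of any Kafka tooling, and the localhost host is an assumption based on the broker address used later in this post. It only proves that something is listening on the port, not that the listener is a healthy Kafka broker.

```ruby
require 'socket'

# Hypothetical helper: returns true when a TCP connection to host:port can be
# opened within the timeout, false otherwise.
def port_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  # Connection refused, timed out, or the host name could not be resolved.
  false
end

if __FILE__ == $PROGRAM_NAME
  # Assumed address of the local single-broker setup.
  puts port_open?('localhost', 9092) ? 'Kafka port is reachable' : 'Kafka port is not reachable'
end
```

This is handy in a setup script or a CI step where an interactive telnet session is not practical.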

Note: If you need anything fancy, you can find a more complex Dockerfile setup for running Kafka here.

Getting started with Karafka framework

Karafka is a framework that simplifies the development of Apache Kafka based Ruby and Rails applications. It provides a higher-level abstraction that lets you focus on your business logic instead of implementing the lower-level layers yourself. It gives developers a set of tools dedicated to building multi-topic applications, similar to how Rails applications are built.

As the README states:

  • You can integrate Karafka with any Ruby-based application.
  • Karafka does not require Sidekiq or any other third party software (apart from Kafka itself).
  • Karafka works with Ruby on Rails but it is a standalone framework that can work without it.
  • Karafka has a minimal set of dependencies, so adding it won't be a huge burden for your already existing applications.
  • It handles processing, using multiple threads, so it will utilize your CPU better (especially for IO-bound applications).

The way you should start with Kafka and Karafka heavily depends on your system state. I always recommend a different approach for tackling the already existing complex systems and for greenfield applications, especially those that don't use Rails at all.

When using Kafka, it's quite common to treat applications as parts of a bigger pipeline (similar to a Bash pipeline) and forward the processing results to other applications. Karafka provides two ways of dealing with that:

  • Using WaterDrop directly - as a messaging layer that can be easily introduced to any application that is already running.
  • Via responders (recommended for a more complex, complete integration; removed starting from Karafka 2.0)

Brownfield system initial integration

Note: This introduction aims to get you sending messages as fast as possible. A broad description of decomposing an already existing Rails application will be provided in one of the upcoming posts in this series.

One of the easiest ways to get started with Kafka and Karafka in an already existing (and often complex) system is by introducing a simple messaging layer that will broadcast events to the Kafka cluster. This approach has several advantages:

  • You can get familiar with the stack without bigger changes to your system.
  • It's easier.
  • It does not require much configuration and setup.
  • You won't have to change your deployment process, as messaging can happen from any Ruby process you run: a Puma process, a Sidekiq process, a Resque process, etc.

To do so, you need to install WaterDrop. It is a standalone Karafka component library for sending Kafka messages. Despite being one of the framework components, it can also work independently, which makes bootstrapping and usage from already running production systems easier. You can consider it an intermediate step between not having Karafka at all and running it at full scale.

In order to use it, you need to add this to your Gemfile:

gem 'waterdrop'

and run

bundle install

Once you're done, you also need to create a config/initializers/water_drop.rb configuration file that will contain at least a single Kafka seed broker address:

# WaterDrop 2.0
producer = WaterDrop::Producer.new do |config|
  # Set to false to skip the actual delivery (e.g. in tests)
  config.deliver = true
  config.kafka = { 'bootstrap.servers': 'localhost:9092' }
end

# WaterDrop 1.4
WaterDrop.setup do |config|
  config.kafka.seed_brokers = %w[kafka://localhost:9092]
end

After that, you should be able to send messages. To check that everything works as expected, just try to deliver a single message with a sync producer:

# WaterDrop 2.0
producer.produce_sync(topic: 'my-topic', payload: 'message')

# WaterDrop 1.4
WaterDrop::SyncProducer.call('message', topic: 'my-topic')
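
To keep producer calls from spreading across an existing codebase, the messaging layer can be wrapped in one small class. The sketch below is an assumption about how that could look, not something WaterDrop provides: EventBroadcaster, RecordingProducer and the JSON envelope (event, data, sent_at) are hypothetical names. The only call taken from above is produce_sync(topic:, payload:), so a configured WaterDrop 2.0 producer could be passed in place of the stand-in.

```ruby
require 'json'
require 'time'

# Hypothetical wrapper (not part of WaterDrop): keeps payload serialization in
# one place so the rest of the application never talks to the producer directly.
class EventBroadcaster
  # producer - any object responding to produce_sync(topic:, payload:),
  #            for example a configured WaterDrop 2.0 producer
  def initialize(producer)
    @producer = producer
  end

  # Serializes the event to JSON and delivers it synchronously.
  def broadcast(topic, event_name, data = {})
    payload = JSON.generate(
      event: event_name,
      data: data,
      sent_at: Time.now.utc.iso8601
    )
    @producer.produce_sync(topic: topic, payload: payload)
  end
end

# Stand-in producer that only records calls, so the sketch runs without Kafka.
class RecordingProducer
  attr_reader :messages

  def initialize
    @messages = []
  end

  def produce_sync(topic:, payload:)
    @messages << { topic: topic, payload: payload }
  end
end

recorder = RecordingProducer.new
broadcaster = EventBroadcaster.new(recorder)
broadcaster.broadcast('my-topic', 'user_created', { id: 1 })
puts recorder.messages.first[:topic]
```

Injecting the producer also makes the layer easy to test, because no broker is needed to verify what would have been sent.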

Note: It's a really good idea to disable topic auto-creation on a production Kafka cluster. Typos happen to everyone. You can read more about Kafka broker configuration options here.
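
Topic typos can also be caught on the application side before a message ever leaves the process. The following is a minimal sketch under the assumption that all topic names are known up front. TopicRegistry and the example topic names are hypothetical and not part of WaterDrop or Karafka. It complements the broker setting rather than replacing it.

```ruby
require 'set'

# Hypothetical guard: only topics declared up front may be used.
class TopicRegistry
  UnknownTopicError = Class.new(StandardError)

  def initialize(*topics)
    @topics = topics.map(&:to_s).to_set.freeze
  end

  # Returns the topic name as a string, or raises when it was never declared.
  def fetch(topic)
    name = topic.to_s
    return name if @topics.include?(name)

    raise UnknownTopicError,
          "Unknown topic '#{name}', known topics: #{@topics.to_a.sort.join(', ')}"
  end
end

TOPICS = TopicRegistry.new(:user_events, :orders)
puts TOPICS.fetch(:orders)
```

Passing every topic name through fetch before producing turns a misspelled topic into an immediate, loud error instead of a silently created topic.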

Note: If you want to go full-scale for both producing and processing messages, just go to the Integrating with Ruby on Rails and other frameworks section of the Karafka Wiki and follow the setup instructions.

Fresh start with a greenfield system

When you don't need integration with your current stack or you already send messages and want to consume them from a separate application, you can start easily with a clean installation:

mkdir app_dir
cd app_dir
echo "source 'https://rubygems.org'" > Gemfile
echo "gem 'karafka'" >> Gemfile

bundle install
bundle exec karafka install

The karafka install command will create all the files and directories that are required to run the Karafka server process. The most interesting one is the karafka.rb file, which holds all the configuration details as well as the routing that matches consumers with the proper Kafka topics.
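
To make the idea of routing more concrete, here is a toy model in plain Ruby of what "matching consumers with topics" means. It is deliberately not Karafka's routing DSL: ToyRouter, UserEventsConsumer and the method names are all hypothetical, and the real syntax is the one in the generated karafka.rb file.

```ruby
# Toy model of topic-to-consumer routing. All names here are hypothetical;
# Karafka's real routing lives in the generated karafka.rb file.
class ToyRouter
  def initialize
    @routes = {}
  end

  # Registers the consumer class responsible for a topic.
  def topic(name, consumer:)
    @routes[name.to_s] = consumer
  end

  # Instantiates the consumer registered for the topic and hands it the payload.
  def dispatch(topic_name, payload)
    consumer = @routes.fetch(topic_name.to_s) do
      raise KeyError, "no consumer registered for topic '#{topic_name}'"
    end
    consumer.new.consume(payload)
  end
end

# Example consumer: one class per topic, with a single entry point.
class UserEventsConsumer
  def consume(payload)
    "user event: #{payload}"
  end
end

router = ToyRouter.new
router.topic(:user_events, consumer: UserEventsConsumer)
puts router.dispatch('user_events', 'signup')
```

The point of the model is the shape of the mapping: each topic has exactly one consumer class, and the framework, not your code, decides when that class gets called.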

Summary - Getting started is easy!

This part of the series wasn't really long. Karafka is well written and adding it to the stack is not a big problem. And because Kafka messages are immutable, sending messages is a great way to start working with it.

One thing I can suggest at the end of this article is not to throw yourself in at the deep end by implementing producing and consuming at the same time (especially if you don't have experience with Kafka). Quite often, the initial concept and vision of the processing flow change after some modeling. Broadcasting without consumption gives you a really good playground to test your ideas without any risk.

Stay tuned :-)
