Karafka Web UI – Your Ruby and Rails out-of-the-box Kafka UI

I'm thrilled to announce a new and shiny addition to the Karafka ecosystem: Karafka Web.

For those who wonder what Karafka is: it's an efficient, multi-threaded Kafka processing framework for Ruby and Rails.

Karafka has always been a convenient framework, and I've abstracted or hidden many complexities related to working with Apache Kafka. However, the ecosystem needed one essential thing: a Web UI.

Until now, you would have to rely on external tooling to get visibility into your Karafka operations. While this is not problematic for big businesses, achieving solid observability is difficult for anyone just starting their adventure with Karafka and Kafka.

Today I have the pleasure of presenting the result of the last six months of my OSS work: Karafka Web. The Web UI provides a convenient way for developers to monitor and manage their Karafka-based applications without using the command line or third-party software. It does not require any additional database beyond Kafka itself.

"Hey, this looks like Sidekiq" you may say. And this is true! Mike was kind enough to allow me to utilize his well-curated and battle-tested dashboard design; honestly, I cannot thank him enough for that.

Features and capabilities

I've been working with Apache Kafka for over eight years, and I always wished we had a tool that could be easily mounted and used as a Rack application and that would provide process-centric visibility. There are many excellent Web UIs for Apache Kafka, though most of them focus on Kafka itself. Karafka Web aims to provide another layer of visibility that is Karafka consumer-centric, allowing you to understand and debug your consumption operations.

# Mounting is as simple as it can be
require 'karafka/web'

Rails.application.routes.draw do
  # Your other routes...

  mount Karafka::Web::App, at: '/karafka'
end

Below you can find a presentation of the features I consider most notable.

Note: You can find the whole list of features and capabilities described here.

Consumers monitoring and insights

Each Karafka consumer periodically reports its status and metrics to a dedicated Kafka topic. This data is then used to compute aggregated metrics and provide visibility into the current operations of each consuming process.
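
Enabling this reporting does not require anything exotic. Here is a minimal karafka.rb sketch based on the getting started guides (the broker address and the app class name are placeholders for your own setup):

# karafka.rb
require 'karafka'
require 'karafka/web'

class KarafkaApp < Karafka::App
  setup do |config|
    config.kafka = { 'bootstrap.servers': '127.0.0.1:9092' }
  end
end

# Activates the periodic reporting to the dedicated Kafka topics
Karafka::Web.enable!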

Consumers monitoring gives you both a general overview and granular insights into each of the running processes. Ever wondered whether your processes are IO or CPU bound at a given time? Or how loaded your processes are? Now you can check it with one click!

Data Explorer

Data explorer allows users to view and explore the data produced to Kafka. It understands the Karafka routing table and can deserialize data before it is displayed. It allows for quick investigation of both payload and header information.
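
Because the explorer reuses the routing, a per-topic deserializer declared there should also apply when messages are displayed. A sketch, assuming a non-JSON payload (XmlDeserializer, the consumer class, and the topic name are hypothetical; JSON works out of the box):

class KarafkaApp < Karafka::App
  routes.draw do
    topic :xml_events do
      consumer XmlEventsConsumer
      # Hypothetical custom deserializer; without it, the default JSON
      # deserializer is used both for consumption and in the explorer
      deserializer XmlDeserializer.new
    end
  end
end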

Error tracking

Karafka consumers are multi-threaded. The consumption process happens independently of data polling. There is a lot of synchronization, and not all errors propagate to the consumer threads. Karafka records all the errors, including the non-user-related ones, and presents them in the errors view.
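
The Web UI collects these errors via Karafka's instrumentation bus. If you want to forward them to your own error tracker as well, you can subscribe to the same event in karafka.rb. A minimal sketch, where ErrorTracker stands in for whatever service you use:

# Runs for every error Karafka instruments, including ones raised
# outside of user code (polling, synchronization, etc.)
Karafka.monitor.subscribe('error.occurred') do |event|
  ErrorTracker.notify(event[:error], type: event[:type])
end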

Getting Started

If you want to get started with Karafka and test the Web UI as fast as possible, then the best idea is to visit our Web UI getting started guides and the example Rails app repository.

The example Rails repository already contains the Web UI and detailed instructions on how to run it.

Support

Building and maintaining a complex OSS framework takes a lot of resources. That's why I also sell Karafka Pro subscriptions. It includes a commercial-friendly license, priority support, architecture consultations, an enhanced Web UI, and high-throughput data processing features (virtual partitions, long-running jobs, and more).

Help me provide high-quality open-source software. If your business relies on Karafka, please consider supporting me. See the Karafka homepage for more details.

Future plans

My primary Web UI-related efforts revolve around providing trend graphs for better health assessment and visibility to diagnose potential lagging and clogging issues quickly.

TL;DR

No UI: bad.

Out-of-the-box OSS Karafka Web UI: great.

No third-party dependencies, minimal supply chain fingerprint, works out of the box.

Karafka (Ruby + Kafka) framework 1.1.0 Release Notes

Note: These release notes cover only the major changes. To learn about various bug fixes and changes, please refer to the change logs or check out the list of commits in the main Karafka repository on GitHub.

Time passes by, Kafka is already 1.0 and Karafka is already 1.1.

Code quality

I will start with the same thing as with 1.0: we're constantly working on having a better and easier code base. Apart from adding several new monitoring tools to our code quality stack, we were able to maintain a pretty decent offenses distribution and trends.

It's also worth noting that our reaction time to gem updates and other upgrades has significantly improved, which means that we're almost always up to date with all the dependencies.

Features

There are several new features included in this release. Most of them focus on advanced users that need better control over the processing flow. However, this does not mean that regular "consumers" won't benefit from them. Features from this release give you space to expand your applications beyond simple consumption and allow you to process more data, faster.

Manual offset management

Most of the time you will be happy with the automatic offset management. However, there are some cases in which you might be interested in taking control over this process. It can be helpful, among other things, for:

  • In-memory DDD saga realization,
  • Buffering,
  • Simulating transactions.

In short, this is how you can use it:

Disable automatic offset management either on the app or the consumer group level:

class App < Karafka::App
  consumer_groups.draw do
    consumer_group :events do
      automatically_mark_as_consumed false

      topic :user_events do
        controller EventsController
      end
    end
  end
end

and just commit your offsets manually:

def consume
  # Do something with messages
  EventStore.store(params_batch.parsed)
  # And now mark the last message as consumed,
  # so we won't consume any of the already processed messages again
  consumer.mark_as_consumed params_batch.to_a.last
end

Note: You can read more about this feature in the Manual offset management (checkpointing) Wiki section.

WaterDrop 1.0 with sync and async support

WaterDrop is a standalone message producer that is integrated with Karafka out of the box.

We've recently redesigned both its internals and its API to make it better, less memory-consuming, easier to use, and more bulletproof.

Karafka 1.1 comes with full WaterDrop 1.0 support, including both synchronous and asynchronous producers. Karafka also integrates with it automatically, populating all the Kafka-related options that were set during the Karafka framework configuration.

In case you want to change WaterDrop configuration settings, you can do so after you set up and boot the Karafka framework in the karafka.rb file:

class App < Karafka::App
  setup do |config|
    config.kafka.seed_brokers = ::Settings.kafka.seed_brokers
    config.kafka.offset_commit_threshold = 30
    config.client_id = ::Settings.name
  end

  consumer_groups.draw do
    # consumer groups definitions go here
  end
end

App.boot!

# Overwrite default delivery setting and don't send in the test env
WaterDrop.setup do |water_config|
  water_config.deliver = !Karafka.env.test?
end

Responders are still the recommended way to generate Kafka messages; however, if you want, you can use WaterDrop directly:

# For sync
WaterDrop::SyncProducer.call('message', topic: 'my-topic')
# or for async
WaterDrop::AsyncProducer.call('message', topic: 'my-topic')

Async support per topic for responders

As mentioned above, WaterDrop now supports both synchronous and asynchronous ways of producing messages. It wouldn't make any sense if the same weren't available for responders. From now on, you can decide on a delivery method for each topic you work with:

class ExampleResponder < ApplicationResponder
  topic :regular_topic
  topic :async_topic, async: true

  def respond(user, profile)
    respond_to :regular_topic, user
    # This will be sent async
    respond_to :async_topic, user
  end
end

New set of callbacks for better flow control

Callbacks can be used to trigger actions at certain moments of the Karafka message receiving flow. You can use them for additional actions that need to take place at those points. They are not enabled by default, as we don't want to ship functionality that not every user needs.

In order to be able to use them, you need to include the Karafka::Controllers::Callbacks module into your controller class:

class ExamplesController < Karafka::BaseController
  include Karafka::Controllers::Callbacks

  after_fetched do
    # Some logic here
  end

  def consume
    # Some logic here
  end
end

Currently there are four callbacks available:

  • after_fetched - executed right after we fetch messages from Kafka but before the main logic kicks in.
  • before_stop - executed before the shutdown process kicks in. Really useful if you use manual offset management.
  • after_poll - executed after each attempt to fetch messages from Kafka (even when there is no data).
  • before_poll - executed before each attempt to fetch messages from Kafka (even when there is no data).

Please visit the Callbacks Wiki section for more details.
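
To illustrate why before_stop pairs well with manual offset management, here is a sketch extending the earlier EventStore example with buffering. It assumes the consumer reference is reachable from callbacks the same way it is from #consume, and the buffering strategy is purely illustrative:

class EventsController < Karafka::BaseController
  include Karafka::Controllers::Callbacks

  before_stop do
    if @buffer && !@buffer.empty?
      # Flush the buffered messages and commit the offset before shutdown,
      # so we won't consume already processed messages after a restart
      EventStore.store(@buffer)
      consumer.mark_as_consumed @buffer.last
    end
  end

  def consume
    # Buffer instead of storing right away
    @buffer ||= []
    @buffer += params_batch.to_a
  end
end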

Incompatibilities and breaking changes

after_received callback renamed to after_fetched

If you use the after_received callback, you will have to do two things to make it work with 1.1:

  • Rename it from after_received to after_fetched
  • Include the Karafka::Controllers::Callbacks module in your controller

class ExamplesController < Karafka::BaseController
  include Karafka::Controllers::Callbacks

  after_fetched do
    # Some logic here
  end
end

connection_pool config options are no longer needed

WaterDrop 1.0 uses an in-thread producer pool, so a connection pool is no longer required. You need to remove all connection_pool related settings.

Celluloid config options are no longer needed

Karafka no longer uses Celluloid, so all the Celluloid options are no longer needed.

#perform is now renamed to #consume

#perform has been renamed to #consume. Please update all your controllers to match this change.

class ExamplesController < Karafka::BaseController
  include Karafka::Controllers::Callbacks

  # Old name
  def perform
    # Some logic here
  end

  # New name
  def consume
    # Some logic here
  end
end

Renamed batch_consuming option to batch_fetching and batch_processing to batch_consuming

We're constantly trying to unify naming conventions. Due to some misunderstandings about what consuming is and what processing is, we've decided to rename these options. So, just to clarify:

  • fetching is the process of receiving messages from the Kafka cluster (no user business logic involved)
  • consuming is the process of applying your business logic and processing the data

So now:

  • if you want to fetch messages from Kafka in batches, you need to have batch_fetching set to true
  • if you also want to work with messages in batches, you need to have batch_consuming set to true (see the sketch below)
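
In practice, this maps to two settings. A sketch of how they could look on the app level (the values here are only for illustration):

class App < Karafka::App
  setup do |config|
    # Fetch messages from Kafka in batches (no user logic involved)
    config.batch_fetching = true
    # Hand whole batches to #consume instead of single messages
    config.batch_consuming = true
  end
end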

Other improvements

Wiki updates

Apart from code changes, we also updated Wiki pages accordingly and added the FAQ section.

Celluloid dropped in favor of native thread pool management

Mike has a great explanation related to Sidekiq on that one. Our reasons were quite similar:

  • Memory consumption,
  • Way more dependencies needed,
  • Abstraction overhead.

Also, with a bit of Ruby-Kafka patching, we don't need an extra layer to handle shutting down and other edge cases.

Unused dependencies cleanup

We're constantly working on lowering the memory footprint of Karafka. It turned out that we were loading parts of ActiveSupport that weren't required. This is now cleaned up. We're also planning to completely drop the ActiveSupport requirement, as we're not the biggest fans of this gem.

Getting started with Karafka

If you want to get started with Kafka and Karafka as fast as possible, then the best idea is to just clone our example repository:

git clone https://github.com/karafka/karafka-example-app ./example_app

then, just bundle install all the dependencies:

cd ./example_app
bundle install

and follow the instructions from the example app Wiki.
