Kafka topics as code – declarative Kafka topics management in Ruby

Kafka topics are a fundamental concept in Apache Kafka. Topics are logical names or labels representing a stream of messages that Kafka clients can produce and consume.

What makes them interesting is the variety of settings that can be applied to them. These settings include, among others:

  • Partition count: The number of partitions that a topic should be split into.
  • Replication factor: The number of replicas that should be maintained for each partition.
  • Retention period: The time that messages should be retained in the topic.
  • Minimum in-sync replicas: The minimum number of replicas that must be in sync before a producer that requires full acknowledgment (acks=all) can receive acknowledgment for a write operation.
  • Cleanup policy: The policy used for deleting old messages from the topic.
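
As a rough illustration, the settings above can be written down as a plain Ruby hash. The dotted keys are Kafka's topic-level configuration names; the values and the days_to_ms helper are hypothetical and only serve as an example:

```ruby
# Hypothetical helper: converts a number of days into milliseconds,
# the unit Kafka expects for `retention.ms`.
def days_to_ms(days)
  days * 24 * 60 * 60 * 1000
end

# Illustrative values only. The dotted keys are Kafka topic-level
# configuration names, while the partition count and the replication
# factor are provided when the topic is created.
EVENTS_TOPIC_SETTINGS = {
  partitions: 6,
  replication_factor: 3,
  'retention.ms' => days_to_ms(7),
  'min.insync.replicas' => 2,
  'cleanup.policy' => 'delete'
}.freeze
```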

When looking from a management perspective, topics are similar to database tables. They have names, a set of settings that apply to them, and their constraints. And on top of all of that, they need to be managed.

Declarative topics management

The management approach that I like and support in Karafka is called Declarative Topics Management. It allows for automatic topic creation and configuration based on predefined rules. It is a way to automate the process of managing Kafka topics by defining the desired topic properties and configurations in a declarative manner rather than manually creating and managing topics.

With Declarative Topics Management, you can define a set of rules that specify how topics should be created and configured. These rules can be based on various factors, such as the topic's name, number of partitions, replication factor, retention policy, and more.

Example of declarative repartitioning using the karafka topics migrate command

Keeping Kafka topics configuration as code has several benefits:

  • Version Control: By keeping the topic settings as code, you can track changes over time and easily understand historical changes related to the topics. This is particularly important in a production environment where changes need to be carefully managed.

  • Reproducibility: When you define Kafka topics settings as code, you can easily recreate the same topic with the same settings in multiple environments. This ensures that your development, staging, and production environments are consistent, which can help prevent unexpected issues and bugs.

  • Automation: If you use code to define Kafka topics settings, you can automate the process of creating and updating topics. This can save time and reduce the risk of human error.

  • Collaboration: When you keep Kafka topics settings as code, you can collaborate with other developers on the configuration. You can use tools like Git to manage changes and merge different configurations.

  • Documentation: Code is self-documenting, meaning anyone can look at the configuration and understand what is happening. This can make it easier for new team members to get up to speed and help troubleshoot issues.

In-app topics management

There are many ways to manage Kafka topics declaratively. For complex systems, you may want to look into tools like topicctl.

Often, however, your setup won't be overcomplicated. The primary thing that needs to happen is to ensure that all of your environments and developers use topics with the same configuration.

Partition count is a simple example where a config difference can impact the business logic and create hard-to-track issues. By default, topics created automatically by Kafka have a single partition, unless the broker's num.partitions setting says otherwise. Assume a developer is working on something that requires strong ordering. If their development and test environments operate on only one partition, problems caused by an incorrect partition key may only surface once the code hits production. Those types of race conditions can be both critical and hard to detect.
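
To see why a single partition can hide such problems, here is a minimal sketch of key-based partition selection. The partition_for method is a simplified stand-in (CRC32 of the key modulo the partition count), and the keys are made up; the partitioner your client actually uses may hash differently:

```ruby
require 'zlib'

# Simplified stand-in for a key-based partitioner: hash the key and
# take it modulo the partition count. Real clients may hash differently.
def partition_for(key, partition_count)
  Zlib.crc32(key) % partition_count
end

# Hypothetical partition keys
keys = %w[user-1 user-2 user-3 user-4 user-5 user-6 user-7 user-8]

# With one partition, every key lands on partition 0, so all messages
# are totally ordered no matter which key was chosen.
single = keys.map { |key| partition_for(key, 1) }.uniq

# With six partitions, only messages sharing a key are guaranteed to
# land on the same partition and keep their relative order.
spread = keys.to_h { |key| [key, partition_for(key, 6)] }

puts "1 partition:  #{single.inspect}"
puts "6 partitions: #{spread.inspect}"
```

With one partition, a wrongly chosen key makes no observable difference, which is exactly why the bug stays invisible until the code runs against a multi-partition topic.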

To mitigate risks of that nature, Karafka ships with a Declarative Topics feature. This feature lets you describe your topics' configuration in your routing, ensuring that the set of settings is the same across all the managed environments.

Defining topic configuration

All the configuration for a given topic needs to be defined using the topic scope #config method:

class KarafkaApp < Karafka::App
  routes.draw do
    topic :events do
      config(
        partitions: 6,
        replication_factor: Rails.env.production? ? 3 : 1,
        'retention.ms': 86_400_000, # 1 day in ms
        'cleanup.policy': 'delete'
      )

      consumer EventsConsumer
    end
  end
end

Such a configuration can then be applied by running the following command: bundle exec karafka topics migrate. This command will create the topic if it is missing, or repartition it if it does not have enough partitions.

Karafka ships with the following topics management commands:

  • karafka topics create - creates topics with appropriate settings.
  • karafka topics delete - deletes all the topics defined in the routes.
  • karafka topics repartition - adds additional partitions to topics with fewer partitions than expected.
  • karafka topics reset - deletes and re-creates all the topics.
  • karafka topics migrate - creates missing topics and repartitions existing ones to match the expected partition count.
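
The idea behind migrate can be sketched in plain Ruby. This is not Karafka's implementation; plan_migration and the topic names are hypothetical and only illustrate the "create what is missing, grow what is too small" decision described above:

```ruby
# Hypothetical planner, not Karafka's code.
# declared: { topic name => desired partition count }
# existing: { topic name => current partition count }
def plan_migration(declared, existing)
  plan = { create: [], repartition: [] }

  declared.each do |name, partitions|
    current = existing[name]

    if current.nil?
      plan[:create] << name
    elsif current < partitions
      plan[:repartition] << name
    end
    # Topics that already have enough partitions are left untouched:
    # Kafka cannot reduce a topic's partition count.
  end

  plan
end

declared = { 'events' => 6, 'visits' => 3, 'logs' => 1 }
existing = { 'events' => 2, 'logs' => 1 }

plan = plan_migration(declared, existing)
# plan[:create]      => ["visits"]
# plan[:repartition] => ["events"]
```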

The below example illustrates the usage of the migrate command to align the number of partitions and to add one additional topic:

Limitations

This API has a few limitations, which you can read about here. There are two primary things you need to keep in mind:

  • Topics management API does not provide any means of concurrency locking when CLI commands are being executed. This means it is up to you to ensure that two topic CLI commands are not running in parallel during the deployments.
  • Karafka currently does not update settings other than the partition count on existing topics. This feature is under development.
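
One possible way to guard against parallel runs on a single host is an advisory file lock. The sketch below is an assumption, not part of Karafka: with_exclusive_lock and the lock path are hypothetical, and a file lock only coordinates processes on the same machine, so multi-host deployments would need a shared lock instead:

```ruby
# Hypothetical guard: runs the block only while holding an exclusive
# advisory lock on lock_path. Returns false without yielding if the
# lock is already held elsewhere; otherwise returns the block's value.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |file|
    # LOCK_NB makes flock return false instead of waiting
    return false unless file.flock(File::LOCK_EX | File::LOCK_NB)

    begin
      yield
    ensure
      file.flock(File::LOCK_UN)
    end
  end
end
```

A deployment script could then wrap the CLI call in it, for example by running system('bundle', 'exec', 'karafka', 'topics', 'migrate') inside the block and skipping the migration when the method returns false.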

Summary

Karafka's declarative topics management API is an excellent way for low- and medium-complexity systems to keep their topics consistent across multiple environments, and it is available out of the box with the framework itself.

Getting started with Kafka and Karafka

If you want to get started with Karafka as fast as possible, then the best idea is to visit our Getting started guides and the example Rails app repository.

Karafka Web UI – Your Ruby and Rails out-of-the-box Kafka UI

I'm thrilled to announce the new and shiny addition to the Karafka ecosystem: Karafka Web.

For those who wonder what Karafka is: it is an efficient, multi-threaded Kafka processing framework for Ruby and Rails.

Karafka has always been a convenient framework, and I've abstracted or hidden many complexities related to working with Apache Kafka. However, the ecosystem needed one essential thing: a Web UI.

Until now, you would have to rely on external tooling to get visibility into your Karafka operations. While this is not problematic for big businesses, solid observability is difficult for anyone just starting their adventure with Karafka and Kafka.

Today I have the pleasure of presenting the result of the last six months of my OSS work: Karafka Web. The Web UI provides a convenient way for developers to monitor and manage their Karafka-based applications without using the command line or third-party software. It does not require any additional database beyond Kafka itself.

"Hey, this looks like Sidekiq" you may say. And this is true! Mike was kind enough to allow me to utilize his well-curated and battle-tested dashboard design; honestly, I cannot thank him enough for that.

Features and capabilities

I've been working with Apache Kafka for over eight years, and I always wished we had a tool that could be easily mounted and used as a Rack application and that would provide process-centric visibility. There are many excellent Web UIs for Apache Kafka, though most of them focus on Kafka itself. Karafka Web aims to provide another layer of visibility that is centered on Karafka consumers, allowing you to understand and debug your consumption operations.

# Mounting is as simple as it can be
require 'karafka/web'

Rails.application.routes.draw do
  # Your other routes...

  mount Karafka::Web::App, at: '/karafka'
end

Below you can find the presentation of features I consider the most notable.

Note: You can find the whole list of features and capabilities described here.

Consumers monitoring and insights

Each Karafka consumer periodically reports its status and metrics to a dedicated Kafka topic. This data is then used to compute aggregated metrics and provide visibility into the current operations of each consuming process.

Consumers monitoring gives you a general overview and granular insights into each of the running processes. Ever wondered whether your processes are IO or CPU bound at a given time? Or how loaded your processes are? Now you can check it out with one click!

Data Explorer

Data explorer allows users to view and explore the data produced to Kafka. It understands the Karafka routing table and can deserialize data before displaying it. It allows for quick investigation of both payload and header information.

Error tracking

Karafka consumers are multi-threaded. The consumption process happens independently from data polling. There is a lot of synchronization, and not all the errors propagate to the consumer threads. Karafka records all the errors, including the non-user-related ones, and presents them in the errors view.

Getting Started

If you want to get started with Karafka and test the Web UI as fast as possible, then the best idea is to visit our Web UI getting started guides and the example Rails app repository.

The example Rails repository already contains the Web UI and detailed instructions on how to run it.

Support

Building and maintaining a complex OSS framework takes a lot of resources. That's why I also sell Karafka Pro subscriptions. A subscription includes a commercial-friendly license, priority support, architecture consultations, an enhanced Web UI, and features for high-throughput data processing (virtual partitions, long-running jobs, and more).

Help me provide high-quality open-source software. If your business relies on Karafka, please consider supporting me. See the Karafka homepage for more details.

Future plans

My primary Web UI-related efforts revolve around providing trend graphs for better health assessment and visibility to diagnose potential lagging and clogging issues quickly.

TL;DR

No UI: bad.

Out-of-the-box OSS Karafka Web UI: great.

No third-party dependencies, minimal supply chain footprint, works out-of-the-box.

Copyright © 2024 Closer to Code