Karafka 2.5 and Web UI 0.11: Next-Gen Consumer Control and Operational Excellence

Introduction

Imagine pausing a problematic partition, skipping a corrupted message, and resuming processing - all within 60 seconds, without deployments or restarts. This is now a reality with Karafka 2.5 and Web UI 0.11.

This release fundamentally changes how you operate Kafka applications. It introduces live consumer management, which transforms incident response from a deployment-heavy process into direct, real-time control.

I'm excited to share these capabilities with the Karafka community, whose feedback and support have driven these innovations forward.

Sponsorship and Community Support

The progress of the Karafka ecosystem has been powered by a collaborative effort: a blend of community code contributions and financial sponsorships. I want to extend my deepest gratitude to all our supporters. Your contributions, whether through code or funding, make this project possible. Thank you for your continued trust and commitment to making Karafka a success. It would not be what it is without you!

Notable Enhancements and New Features

Below you can find detailed coverage of the major enhancements, from live consumer management capabilities to advanced performance optimizations that can improve resource utilization by up to 50%. For a complete list of all changes, improvements, and technical details, please visit the changelog page.

Live Consumer Management: A New Operational Model

This update introduces live consumer management capabilities that change how you operate Karafka applications. You can now pause partitions, adjust offsets, manage topics, and control consumer processes in real time through the web interface.

Note: While the pause and resume functionality provides immediate operational control, these controls are session-based and will reset during consumer rebalances caused by group membership changes or restarts. I am developing persistent pause capabilities that survive rebalances, though this feature is still several development cycles away. The current implementation remains highly valuable for incident response, temporary load management, and real-time debugging where quick action is needed within typical rebalance intervals.

Why This Matters

Traditional Kafka operations often require:

  • Deploying code changes to skip problematic messages
  • Restarting entire applications to reset consumer positions
  • Manual command-line interventions during incidents
  • Complex deployment coordination

This release provides an alternative approach. You now have more control over your Karafka infrastructure, which can significantly reduce incident response times.

Consider these operational scenarios:

  • Incident Response: A corrupted message is causing consumer crashes. You can pause the affected partition, adjust the offset to skip the problematic message, and resume processing - typically completed in under a minute.

  • Data Replay: After fixing a bug in your processing logic, you need to reprocess recent messages. You can navigate to a specific timestamp in the Web UI and adjust the consumer position directly.

  • Load Management: Your downstream database is experiencing a high load. You can selectively pause non-critical partitions to reduce load while keeping essential data flowing.

  • Production Debugging: A consumer appears stuck. You can trace the running process to see what threads are doing and identify bottlenecks before taking action.

This level of operational control transforms Karafka from a "deploy and monitor" system into a directly manageable platform where you maintain control during challenging situations.

Partition-Level Control

The Web UI now provides granular partition management with surgical precision over message processing:

  • Real-Time Pause/Resume: This capability addresses the rigidity of traditional consumer group management. You can temporarily halt the processing of specific data streams during maintenance, throttle consumption from high-volume partitions, or coordinate processing across multiple systems.

  • Live Offset Management: The ability to adjust consumer positions in real time eliminates the traditional "stop consumer → calculate offsets → update code → deploy → restart" cycle, which previously cost significant time during incidents.
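
The same seek can also be scripted. Here is a hedged sketch: the group and topic names are placeholders, and it assumes Karafka's `Karafka::Admin.seek_consumer_group` helper (which also accepts `Time` values for timestamp-based replay), a reachable Kafka cluster, and that the group is not actively consuming while its offsets are moved:

```ruby
require 'karafka'

# Example sketch (names are placeholders): move the consumer group past a
# poisoned record by seeking partition 0 of 'orders_states' to offset 42.
# The group should not be actively consuming while its offsets are moved.
Karafka::Admin.seek_consumer_group(
  'example_app_group',
  { 'orders_states' => { 0 => 42 } }
)

# Time-based seek for replay scenarios: rewind to messages from an hour ago
Karafka::Admin.seek_consumer_group(
  'example_app_group',
  { 'orders_states' => { 0 => Time.now - 60 * 60 } }
)
```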

Complete Topic Lifecycle Management

Pro Web UI 0.11 introduces comprehensive topic administration capabilities that transform how you manage your Kafka infrastructure. The interface now supports complete topic creation with custom partition counts and replication factors, while topic deletion includes impact assessment and confirmation workflows to prevent accidental removal of critical topics.

The live configuration management system lets you view and modify all topic settings, including retention policies, cleanup strategies, and compression settings. Dynamic partition scaling enables you to increase partition counts to scale throughput.
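
For scripted or automated workflows, the same lifecycle operations are available through Karafka's Admin API. A brief sketch, assuming a reachable Kafka cluster (the topic name and settings below are illustrative):

```ruby
require 'karafka'

# Create a topic with 6 partitions, replication factor 2 and 1-day retention
Karafka::Admin.create_topic('payments_events', 6, 2, 'retention.ms': '86400000')

# Scale throughput later by growing the partition count (it can only increase)
Karafka::Admin.create_partitions('payments_events', 12)

# Remove a topic that is no longer needed
Karafka::Admin.delete_topic('payments_events')
```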

This approach unifies Kafka administration in a single interface, eliminating the need to context-switch between multiple tools and command-line interfaces.

UI Customization and Branding

When managing multiple Karafka environments, visual distinction becomes critical for preventing costly mistakes. Web UI 0.11 introduces customization capabilities that allow you to brand different environments distinctively - think red borders for production, blue gradients for staging, or custom logos for other teams. Beyond safety, these features enable organizations to seamlessly integrate Karafka's Web UI into their existing design systems and operational workflows.

Karafka::Web.setup do |config|
  # Custom CSS for environment-specific styling
  config.ui.custom_css = '.dashboard { background: linear-gradient(45deg, #1e3c72, #2a5298); }'

  # Custom JavaScript for enhanced functionality
  config.ui.custom_js = 'document.addEventListener("DOMContentLoaded", () => { 
    console.log("Production environment detected"); 
  });'
end

The UI automatically adds controller and action-specific CSS classes for targeted styling:

/* Style only the dashboard */
body.controller-dashboard {
  background-color: #f8f9fa;
}

/* Highlight error pages */
body.controller-errors {
  border-top: 5px solid #dc3545;
}

Enhanced OSS Monitoring Capabilities

Web UI 0.11 promotes two monitoring features from Pro to the open source version: consumer lag statistics charts and consumer RSS memory statistics charts. These visual monitoring tools now provide all users with essential insights into consumer performance and resource utilization.

This change reflects my core principle: the more successful Karafka Pro becomes, the more I can give back to the OSS version. Your Pro subscriptions directly fund the research and development that benefits the entire ecosystem.

Open-source software drives innovation, and I'm committed to contributing meaningfully. By making advanced monitoring capabilities freely available, I ensure that teams of all sizes can access the tools needed to build robust Kafka applications. Pro users get cutting-edge features and support, while the broader community gains battle-tested tools for their production environments.

Performance and Reliability Improvements

While Web UI 0.11 delivers management capabilities, Karafka 2.5 focuses equally on performance optimization and advanced processing strategies. This version includes throughput improvements, enhanced error-handling mechanisms, and reliability enhancements.

Balanced Virtual Partitions Distribution

The new balanced distribution strategy for Virtual Partitions is a significant improvement, delivering up to 50% better resource utilization in high-throughput scenarios.

The Challenge:

Until now, Virtual Partitions have used consistent distribution, where messages with the same partitioner result go to the same virtual partition consumer. While predictable, this led to resource underutilization when certain keys contained significantly more messages than others, leaving some worker threads idle while others were overloaded.

The Solution:

class KarafkaApp < Karafka::App
  routes.draw do
    topic :orders_states do
      consumer OrdersStatesConsumer

      virtual_partitions(
        # Spread work by order id while preserving per-order message order
        partitioner: ->(message) { message.headers['order_id'] },
        # New balanced distribution for optimal resource utilization
        distribution: :balanced
      )
    end
  end
end

The balanced strategy dynamically distributes workloads by:

  • Grouping messages by partition key
  • Sorting key groups by size (largest first)
  • Assigning each group to the worker with the least current workload
  • Preserving message order within each key group
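
The steps above amount to a greedy longest-first assignment. As an illustrative sketch in plain Ruby (a simplification, not Karafka's internal implementation):

```ruby
# Illustrative sketch of the balanced strategy: group messages by key,
# sort key groups largest-first, then greedily assign each group to the
# least-loaded worker. A simplification, not Karafka's internal code.
def balanced_distribution(messages, workers:, key:)
  loads = Array.new(workers, 0)            # messages assigned per worker so far
  assignment = Array.new(workers) { [] }   # resulting message groups per worker

  groups = messages.group_by(&key).values
  groups.sort_by! { |group| -group.size }  # largest key groups first

  groups.each do |group|
    worker = loads.each_index.min_by { |i| loads[i] }  # least-loaded worker
    assignment[worker].concat(group)       # order within a key group is kept
    loads[worker] += group.size
  end

  assignment
end

# Example: key "a" carries far more messages than "b" or "c"
msgs = (%w[a] * 6 + %w[b] * 2 + %w[c]).map { |k| { key: k } }
buckets = balanced_distribution(msgs, workers: 2, key: ->(m) { m[:key] })
# => the heavy "a" group fills one worker; "b" and "c" share the other
```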

When to Use Each Strategy:

Use :consistent when:

  • Processing requires stable assignment across batches
  • Implementing window-based aggregations spanning multiple polls
  • Keys have relatively similar message counts
  • Predictable routing is more important than utilization

Use :balanced when:

  • Processing is stateless, or state is managed externally
  • Maximizing worker thread utilization is a priority
  • Message keys have highly variable message counts
  • Optimizing throughput with uneven workloads

The performance gains are most significant with IO-bound processing, highly variable key distributions, and when keys outnumber available worker threads.

Advanced Error Handling: Dynamic DLQ Strategies

Karafka 2.5 Pro introduces context-aware DLQ strategies with multiple target topics:

class DynamicDlqStrategy
  def call(errors_tracker, attempt)
    if errors_tracker.last.is_a?(DatabaseError)
      [:dispatch, 'dlq_database_errors']
    elsif errors_tracker.last.is_a?(ValidationError)
      [:dispatch, 'dlq_validation_errors']
    elsif attempt > 5
      [:dispatch, 'dlq_persistent_failures']
    else
      [:retry]
    end
  end
end

class KarafkaApp < Karafka::App
  routes.draw do
    topic :orders_states do
      consumer OrdersStatesConsumer

      dead_letter_queue(
        topic: :strategy,
        strategy: DynamicDlqStrategy.new
      )
    end
  end
end

This enables:

  • Error-type-specific handling pipelines
  • Escalation strategies based on retry attempts
  • Separation of transient vs permanent failures
  • Specialized recovery workflows

Enhanced Error Tracking

The errors tracker now includes a distributed correlation trace_id that gets added to both DLQ dispatched messages and errors reported by the Web UI, making it easier to track and correlate error occurrences with their DLQ dispatches.

Worker Thread Priority Control

Karafka 2.5 introduces configurable worker thread priority with intelligent defaults:

class KarafkaApp < Karafka::App
  setup do |config|
    # Workers now run at priority -1 (a 50ms time slice) by default for better
    # system responsiveness; tune this to your own requirements if needed
    config.worker_thread_priority = -2
  end
end

This prevents worker threads from monopolizing system resources, leading to more responsive overall system behavior under high load.

FIPS Compliance and Security

All internal cryptographic operations now use SHA256 instead of MD5, ensuring FIPS compliance and alignment with enterprise security requirements.

Coming Soon: Parallel Segments

Karafka 2.5 Pro also introduces Parallel Segments, a new feature for concurrent processing within the same partition when there are more processes than partitions. Once the documentation is finalized, this capability will be covered in a dedicated blog post.

Migration Notes

Karafka 2.5 and its related components introduce a few minor breaking changes necessary to advance the ecosystem. None of these changes disrupt routing or consumer group configuration. Detailed information and guidance can be found on the Karafka Upgrades 2.5 documentation page.

Migrating to Karafka 2.5 should be manageable. I have made every effort to ensure that breaking changes are justified and well-documented, minimizing potential disruptions.

Conclusion

Karafka 2.5 and Web UI 0.11 mark another step in the framework's evolution, continuing to address real-world operational needs and performance challenges in Kafka environments.

I thank the Karafka community for their ongoing feedback, contributions, and support. Your input drives the framework's continued development and improvements.


The complete changelog and upgrade instructions are available in the Karafka documentation.

From pidfd to Shimanami Kaido: My RubyKaigi 2025 Experience

Introduction

I just returned from RubyKaigi 2025, which ran from April 16th to 18th at the Ehime Prefectural Convention Hall in Matsuyama. If you're unfamiliar with it, RubyKaigi is the biggest Ruby conference, with over 1,500 people showing up this year. It's always a bit crazy (in the best way possible).

The conference had an orange theme. Ehime is famous for its oranges, and the organizers love bringing local flavor to the event.

What I love most about RubyKaigi is how it bridges the gap between the Japanese and Western Ruby worlds. Despite Ruby coming from Japan, these communities often feel separate in day-to-day work. This weird divide affects not just developers but also businesses. RubyKaigi is where these worlds collide, and you get to meet the people whose code you've used for years.

There's something special about grabbing a beer with someone whose gem you depend on or chatting with Japanese Rubyists you'd never usually interact with online. These face-to-face moments make RubyKaigi different from any other Ruby conference.

Pre-Conference (Day -1 & Day 0)

My journey to RubyKaigi was smoother than usual this time. I flew from Cracow, Poland, via Istanbul, which saved me the usual hassle of going to Warsaw first (those extra hours add up!). Instead of the typical route through Tokyo, I flew directly to Osaka - another nice time-saver. On my way to Matsuyama, I stopped in Okayama to check out the castle and the historical garden.

Day 0, for me, was all about the Andpad drinkup welcome party. I got to catch up with Hasumi Hitoshi, my good friend from Japan, along with many other Japanese Rubyists. One of the highlights was meeting the "Agentic Couple" - Justin Bowen and Rhiannon Payne, the creators of Active Agents gem. Little did I know then that I'd spend much more time with them later during some post-conference sightseeing and traveling.

These pre-conference meetups are where some of the best networking happens - everyone's fresh and excited for the days ahead.

The Conference Experience

Day 1 - Talks and Official Party

As the first English speaker in my room (rubykaigi-b), I started the day by discussing bringing pidfd to Ruby. It was exciting to present on this topic, which adds better process control functionality to Ruby - something I'm passionate about, given my work with Karafka.

You can find my presentation slides online.

Throughout the day, I attended as many talks as possible. However, people kept grabbing me for discussions (which I wasn't complaining about at all). One standout was Tagomoris's presentation on "State of Namespace." While I'm not exactly a fan of this feature (and he knows that ;) ), I greatly respect Tagomoris. We had a great follow-up discussion where I outlined my security concerns and the changes needed in Bundler and RubyGems. Ultimately, we both agreed that we must work collectively to ensure such changes bring only good to the community.

The day wrapped up with the official party at Shiroyama Park. The organizers had reserved the biggest park in Matsuyama just for us! The beers were excellent, and the atmosphere was exactly what you'd expect from RubyKaigi - relaxed, friendly, and full of interesting discussions. This is where the real magic happens - where Japanese and Western Rubyists mix over drinks and food, breaking down those invisible barriers that usually keep our communities apart.

Day 2 - ZJIT and More Connections

Day 2 was inspiring with Maxime Chevalier-Boisvert's talk about ZJIT - the successor to YJIT. If you're not familiar with Maxime's work, she's the one who won the Ruby Prize in 2021 for her work on optimizing Ruby's performance. Her new project aims to save and reuse compiled code between executions. I strongly believe that JIT for Ruby can do much more than it does now, bringing us to another level of performance.

The social aspect continued throughout the day with various company-sponsored events. What's unique about RubyKaigi is that these events aren't just corporate marketing exercises but genuine opportunities for people to connect. The many smaller sponsors this year (rather than just a few big companies) made things more interesting, with more diverse interactions possible.

Day 3 - Ractor-local GC and Hacking Day

Day 3 brought another technical highlight with Koichi Sasada's talk on Ractor-local GC. Ractors are close to my heart because I want to use them in Karafka. While they are still limited, I feel we're finally making good progress. One of the biggest limitations has been cross-ractor GC. Koichi proposed a two-stage GC where part of GC work could run independently in Ractors while some GC runs would still be locking. He sees this as a practical middle ground that's technically easier to implement than fully independent GCs - his philosophy being that we should have something rather than nothing. This approach could make Ractors much more practical for real-world applications.

After the official talks, the day continued with a hacking session. This was amazing - so many Ruby core committers were in one room. People split into groups, and everyone worked on something in their interest. I spent my time analyzing the performance of new fixes - specifically improvements to Ractors. The results looked really great, which is the best news for me.

One interesting thing I still need to investigate: parsing JSON in separate threads was about 10% faster than the baseline, despite Ruby's GVL. That's an unexpected finding that may influence my future Karafka feature development.

The combination of talks and hacking sessions on Day 3 perfectly captured what makes RubyKaigi special - deep technical discussions followed by hands-on collaboration with some of the smartest people in the Ruby community.

Post-Conference Adventures

Days 4-5 - The Unofficial Adventures Begin

The conference officially ended on Day 3, but the real adventure had just begun. Various companies organized smaller events, and I showed up at one of them. On this "unofficial" day, I attended a drink-up sponsored by codeTakt that was super fun - it's always great to talk more Ruby in casual settings.

The next morning, I started Day 5 with a relaxing session at Dogo Onsen, one of Japan's oldest hot springs. Later, I did some sightseeing around Matsuyama and found a house that looked surprisingly similar to mine - just the Japanese version! I met up with Peter Zhu, and we went to visit some shrines. He collected goshuin (temple stamps) along the way. Later that day, I connected with other RubyKaigi attendees, including Marty Haught from RubyCentral, and we explored Matsuyama Castle together.

Day 6 - The Shimanami Kaido Adventure

One of the most memorable parts of my extended trip was the Shimanami Kaido bicycle tour with Marty and Justin, whom I'd met at the Day 0 Andpad event. The Shimanami Kaido is a famous cycling route that connects several islands via bridges and is located about an hour from Matsuyama.

We covered 60km in one day, which was a lot but totally worth it. Things got interesting when we left the main track to see some temples and head to a port. That's when we discovered there were no immediate direct ferries back to our starting point from where we ended up.

Google Maps saved the day by suggesting we hop to a small island called Oge (大下島). This tiny island has maybe 500 residents, mostly elderly people. We were the only visitors and spent about 45 minutes experiencing life on such a remote Japanese island. The whole detour was one of the craziest things we did. Still, it perfectly showed the spirit of unexpected adventure that makes these post-conference trips so memorable.

The entire cycling route was amazing. The bridges, the sea views, the small island communities - everything was incredible. I highly recommend it to anyone visiting the area after RubyKaigi.

Reflections and Why RubyKaigi Matters

Reflecting on my time in Matsuyama, what I notice most about RubyKaigi isn't just the great talks - those you can watch later on YouTube. The unique atmosphere and connections make this conference stand out from any other tech event I've attended.

RubyKaigi is great at bridging what I see as an unnecessarily isolated divide between the European-American Ruby scene and the Japanese one. This isolation creates real challenges for collaboration and, to some extent, leads to Japanese businesses operating separately from the global Ruby ecosystem. Many Japanese developers use RubyKaigi as a rare opportunity to practice their English and connect with the broader community despite their excellent technical writing skills.

I particularly appreciate how the conference keeps a real, technical-friendly vibe rather than feeling commercial. Unlike some conferences dominated by a few large corporate sponsors, RubyKaigi had many smaller sponsors, creating a more diverse and balanced environment. While I noticed fewer Western companies represented at the sponsor booths (Sentry was there, and maybe two others), this actually added to the conference's unique feel.

The fact that many attendees arrive days early and leave days later makes the event more than just a conference - it becomes something more meaningful. People treat their trip to Japan as part of their vacation and part of their professional development. This extended timeframe allows for deeper connections and more relaxed sightseeing. Matsuyama's calmer atmosphere compared to Tokyo, Osaka, or Sendai adds to this appeal - despite the tourist presence, the scale feels more manageable and peaceful.

From an organizational standpoint, RubyKaigi is in a class of its own. I've never attended another conference so well-organized and thoughtfully executed. It's an amazing event that I highly recommend to anyone wanting technical knowledge and meaningful connections with the global Ruby community. This conference never fails to remind me why I fell in love with Ruby and its community in the first place.

Summary and Final Thoughts

Looking back at my RubyKaigi 2025 experience, I realize how Japan continues to be remarkably generous with opportunities for unexpected connections. Each time I visit, I meet people I would never encounter otherwise - and often, they're not even from the IT world.

In Osaka, at a sake place recommended by fellow conference attendees, I had a memorable two-hour conversation with a retired man in his 70s. Despite his age, he was incredibly sharp and actively attended English school specifically to meet more people from around the world. These encounters show what makes Japan - particularly RubyKaigi - so special.

The conference itself remains the best Ruby event worldwide, not just for its technical content but for its unique ability to bridge communities. Excellent organization, meaningful international connections, and Japan's unique hospitality create an experience far beyond a typical tech conference. Whether cycling the Shimanami Kaido, exploring tiny islands, or simply sharing a beer with developers whose code you use daily, RubyKaigi offers something truly special.

I'm already looking forward to RubyKaigi 2026. If you've never been, start planning now - this conference is worth every mile traveled.

Copyright © 2025 Closer to Code
