
One Thread to Poll Them All: How a Single Pipe Made WaterDrop 50% Faster

This is Part 2 of the "Karafka to Async Journey" series. Part 1 covered WaterDrop's integration with Ruby's async ecosystem and how fibers can yield during Kafka dispatches. This article covers another improvement in this area: migration of the producer polling engine to file descriptor-based polling.

When I released WaterDrop's async/fiber support in September 2025, the results were promising - fibers significantly outperformed multiple producer instances while consuming less memory. But something kept nagging me.

Every WaterDrop producer spawns a dedicated background thread for polling librdkafka's event queue. For one or two producers, nobody cares. But Karafka runs in hundreds of thousands of production processes. Some deployments use transactional producers, where each worker thread needs its own producer instance. Ten worker threads means ten producers and ten background polling threads - each competing for Ruby's GVL, each consuming memory, each doing the same repetitive work. Things will get even more intense once the Karafka consumer becomes async-friendly - work that is currently underway.

The Thread Problem

Every time you create a WaterDrop producer, rdkafka-ruby spins up a background thread (rdkafka.native_kafka#<n>) that calls rd_kafka_poll(timeout) in a loop. Its job is to check whether librdkafka has delivery reports ready and to invoke the appropriate callbacks.

With one producer, you get one extra thread. With 25, you get 25. Each consumes roughly 1MB of stack space. Each competes with your application threads for the GVL. And most of the time, they're doing nothing - sleeping inside poll(timeout), waiting for events that may arrive once every few milliseconds.

I wanted one thread that could monitor all producers simultaneously, reacting only when there's actual work to do.

How librdkafka Polling Works (and Why It's Wasteful)

librdkafka is inherently asynchronous. When you produce a message, it gets buffered internally and dispatched by librdkafka's own I/O threads. When the broker acknowledges delivery, librdkafka places a delivery report on an internal event queue. rd_kafka_poll() drains that queue and invokes your callbacks.

The problem is how rd_kafka_poll(timeout) waits. Calling rd_kafka_poll(250) blocks for up to 250 milliseconds. From Ruby's perspective, this is a blocking C function call. The rdkafka-ruby FFI binding releases the GVL during this call so other threads can run, but the calling thread is stuck until either an event arrives or the timeout expires.

Every rd_kafka_poll(timeout) call must release the GVL before entering C and reacquire it afterward. This cycle happens continuously, even when the queue is empty. With 25 producers, that's 25 threads constantly cycling through GVL release/reacquire. And there's no way to say "watch these 25 queues and wake me when any of them has events."
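Conceptually, thread mode boils down to one loop like the following per producer. This is a schematic sketch, not rdkafka-ruby's actual code: `polling_thread` is a hypothetical name, and `handle.poll` stands in for the FFI call to rd_kafka_poll.

```ruby
# Schematic of thread mode: each producer gets its own loop like this.
# `handle` is a stand-in for the native producer; its poll blocks for up
# to timeout_ms and returns the number of events served (0 on timeout).
def polling_thread(handle, timeout_ms: 250)
  Thread.new do
    loop do
      # In the real binding this blocks in C with the GVL released and
      # must reacquire the GVL on every return, event or not
      events = handle.poll(timeout_ms)
      break if events == :closed # stand-in shutdown signal
    end
  end
end
```

Multiply this loop by 25 producers and the GVL release/reacquire churn is constant, even when every queue is empty.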

The File Descriptor Alternative

Luckily for me, librdkafka has a lesser-known API that solves both problems: rd_kafka_queue_io_event_enable().

You can create an OS pipe and hand the write end to librdkafka:

int pipefd[2];
pipe(pipefd);
rd_kafka_queue_io_event_enable(queue, pipefd[1], "1", 1);

Whenever the queue transitions from empty to non-empty, librdkafka writes a single byte to the pipe. The actual events are still on librdkafka's internal queue - the pipe is purely a wake-up signal. This is edge-triggered: it only fires on the empty-to-non-empty transition, not per-event.
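The wake-up-signal pattern itself is easy to demonstrate in plain Ruby, with no librdkafka involved (a standalone sketch of the mechanism):

```ruby
r, w = IO.pipe

# Nothing has been signaled yet, so a zero-timeout select returns nil
ready, = IO.select([r], nil, nil, 0)
puts ready.nil? # => true

# Simulate librdkafka's empty->non-empty signal: a single byte
w.write("1")

# Now the read end reports readiness without any blocking poll
ready, = IO.select([r], nil, nil, 0)
puts ready == [r] # => true

# Drain the signal; the events themselves would still sit on
# librdkafka's internal queue, waiting for a non-blocking poll
r.read_nonblock(1)
```

The byte carries no information - it exists only to make the file descriptor readable so a multiplexer like IO.select can wake up.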

The read end of the pipe is a regular file descriptor that works with Ruby's IO.select. The Poller thread spends most of its time in IO.select, which handles GVL release natively. When a pipe signals readiness, we call poll_nb(0) - a non-blocking variant that skips GVL release entirely:

100,000 iterations:
  rd_kafka_poll:    ~19ms (5.1M calls/s) - releases GVL
  rd_kafka_poll_nb: ~12ms (8.1M calls/s) - keeps GVL
  poll_nb is ~1.6x faster

Instead of 25 threads each paying the GVL tax on every iteration, one thread pays it once in IO.select and then drains events across all producers without GVL overhead.

One Thread to Poll Them All

By default, a singleton Poller manages all FD-mode producers in a single thread.

When a producer is created with config.polling.mode = :fd, it registers with the global Poller instead of spawning its own thread. The Poller creates a pipe for each producer and tells librdkafka to signal through it.

The polling loop calls IO.select on all registered pipes. When any pipe becomes readable, the Poller drains it and runs a tight loop that processes events until the queue is empty or a configurable time limit is hit:

def poll_drain_nb(max_time_ms)
  deadline = monotonic_now + max_time_ms
  loop do
    events = rd_kafka_poll_nb(0)
    return true if events.zero?       # fully drained
    return false if monotonic_now >= deadline  # hit time limit
  end
end

When IO.select times out (~1 second by default), the Poller does a periodic poll on all producers regardless of pipe activity - a safety net for edge cases like OAuth token refresh that may not trigger a queue write. Regular events, including statistics.emitted callbacks, do write to the pipe and wake the Poller immediately.
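Putting the pieces together, one cycle of the Poller loop looks roughly like this. It is an illustrative sketch, not WaterDrop's actual implementation: `poll_cycle` is a hypothetical name, and `poll_drain_nb` / `periodic_poll` are assumed methods on the registered producers.

```ruby
# One iteration: wait for any pipe to signal, drain the producers behind
# the ready pipes; on timeout, fall back to the safety-net poll.
def poll_cycle(pipes, select_timeout: 1.0)
  # pipes maps each pipe's read IO to its producer
  ready, = IO.select(pipes.keys, nil, nil, select_timeout)

  if ready
    ready.each do |pipe|
      pipe.read_nonblock(16, exception: false) # consume the wake-up byte(s)
      pipes[pipe].poll_drain_nb(100)           # non-blocking drain, no GVL churn
    end
  else
    # Timeout: periodic poll covers events that never write to the pipe,
    # such as OAuth token refreshes
    pipes.each_value(&:periodic_poll)
  end
end
```

The key property is that the only blocking call is IO.select itself, and it watches every producer's pipe at once.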

The Numbers

Benchmarked on Ruby 4.0.1 with a local Kafka broker, 1,000 messages per producer, 100-byte payloads:

Producers | Thread Mode  | FD Mode      | Improvement
----------|--------------|--------------|------------
1         | 27,300 msg/s | 41,900 msg/s | +54%
2         | 29,260 msg/s | 40,740 msg/s | +39%
5         | 27,850 msg/s | 40,080 msg/s | +44%
10        | 26,170 msg/s | 39,590 msg/s | +51%
25        | 24,140 msg/s | 36,110 msg/s | +50%

39-54% faster across the board. The improvement comes from three things: immediate event notification via the pipe, the 1.6x faster poll_nb that skips GVL overhead, and consolidating all producers into a single polling thread that eliminates GVL contention.

The Trade-offs

Callbacks execute on the Poller thread. In thread mode, each producer's callbacks ran on its own polling thread. In FD mode with the default singleton Poller, all callbacks share the single Poller thread. Don't perform expensive or blocking operations inside message.acknowledged or statistics.emitted. This was never recommended in thread mode either, but FD mode makes it worse - if your callback takes 500ms, it delays polling for all producers on that Poller, not just one.

Don't close a producer from within its own callback when using FD mode. Callbacks execute on the Poller thread, and closing from within would cause synchronization issues. Close producers from your application threads.
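One way to keep callbacks cheap is to hand events off to an application-owned thread. The sketch below uses a stub monitor so it runs without Kafka - with a real producer you would subscribe on `producer.monitor` the same way; the stub class and its methods are otherwise assumptions for illustration.

```ruby
# Minimal stand-in for an instrumentation monitor, so the pattern is
# runnable without Kafka
class StubMonitor
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(event, &block)
    @subscribers[event] << block
  end

  def instrument(event, payload)
    @subscribers[event].each { |block| block.call(payload) }
  end
end

monitor = StubMonitor.new
ack_queue = Queue.new
processed = []

# The callback only enqueues and returns immediately, so the (shared)
# polling thread is never blocked by application work
monitor.subscribe("message.acknowledged") { |event| ack_queue << event }

# Expensive handling happens on a thread the application owns
worker = Thread.new do
  loop do
    event = ack_queue.pop
    break if event == :stop
    processed << event # e.g. update metrics, write audit rows...
  end
end
```

With this shape, even a 500ms piece of work costs the Poller thread only a Queue push.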

How to Use It

producer = WaterDrop::Producer.new do |config|
  config.kafka = { 'bootstrap.servers': 'localhost:9092' }
  config.polling.mode = :fd
end

Pipe creation, Poller registration, lifecycle management - all handled internally.

You can give different producers different polling priorities:

high = WaterDrop::Producer.new do |config|
  config.polling.mode = :fd
  config.polling.fd.max_time = 200  # more polling time
end

low = WaterDrop::Producer.new do |config|
  config.polling.mode = :fd
  config.polling.fd.max_time = 50   # less polling time
end

max_time controls how long the Poller spends draining events for each producer per cycle. Higher values mean more events processed per wake-up but less fair scheduling across producers.

Dedicated Pollers for Callback Isolation

By default, all FD-mode producers share a single global Poller. If a slow callback in one producer risks starving others, you can assign a dedicated Poller via config.polling.poller:

dedicated_poller = WaterDrop::Polling::Poller.new

producer = WaterDrop::Producer.new do |config|
  config.kafka = { 'bootstrap.servers': 'localhost:9092' }
  config.polling.mode = :fd
  config.polling.poller = dedicated_poller
end

Each dedicated Poller runs its own thread (waterdrop.poller#0, waterdrop.poller#1, etc.). You can also share a dedicated Poller between a subset of producers to group them - for example, giving critical producers their own shared Poller while background producers use the global singleton. The dedicated Poller shuts down automatically when its last producer closes.

When config.polling.poller is nil (the default), the global singleton is used. Setting a custom Poller is only valid with config.polling.mode = :fd.

The Rollout Plan

I'm being deliberately cautious. Karafka runs in too many production environments to rush this.

Phase 1 (WaterDrop 2.8, now): FD mode is opt-in. Thread mode stays the default.

Phase 2 (WaterDrop 2.9): FD mode becomes the default. Thread mode remains available with a deprecation warning.

Phase 3 (WaterDrop 2.10): Thread mode is removed. Every producer uses FD-based polling.

A full major version cycle to test before it becomes mandatory.

What's Next: The Consumer Side

The producer was the easier target - simpler event loop, more straightforward queue management. I'm working on similar improvements for Karafka's consumer, where the gains could be even more significant. Consumer polling has additional complexity around max.poll.interval.ms and consumer group membership, but the core idea is the same: replace per-thread blocking polls with file descriptor notifications and efficient multiplexing.


Find WaterDrop on GitHub and check PR #780 for the full implementation details.

The 60-Second Wait: How I Spent Months Solving Ruby's Most Annoying Gem Installation Problem

Notice: While native extensions for rdkafka have been extensively tested and are no longer experimental, they may not work in all environments or configurations. If you find any issues with the precompiled extensions, please report them immediately and they will be resolved.

Every Ruby developer knows this excruciating feeling: you're setting up a project, running bundle install, and then... you wait. And wait. And wait some more as rdkafka compiles for what feels like eternity. Sixty to ninety seconds of pure frustration, staring at a seemingly frozen terminal that gives no indication of progress while your coffee is getting cold.

I've been there countless times. As the maintainer of the Karafka ecosystem, I've watched developers struggle with this for years. The rdkafka gem - essential for Apache Kafka integration in Ruby - was notorious for its painfully slow installation. Docker builds took forever. CI pipelines crawled. New developers gave up before they even started. Not to mention the countless compilation crashes that were nearly impossible to debug.

Something had to change.

The Moment of Truth

It was during a particularly frustrating debugging session that I realized the real scope of the problem. I was helping a developer who couldn't get rdkafka to install on his macOS dev machine. Build tools missing. Compilation failing. Dependencies conflicting. The usual nightmare.

As I walked him through the solution for the hundredth time, I did some quick math. The rdkafka gem gets downloaded over a million times per month. Each installation takes about 60 seconds to compile. That's 60 million seconds of CPU time every month - nearly two years of continuous processing power wasted on compilation alone.

But the real kicker? All those installations were essentially building the same thing over and over again. The same librdkafka library. The same OpenSSL. The same compression libraries. Millions of identical compilations happening across the world, burning through CPU cycles and developer patience.

That's when I decided to solve this once and for all.

Why Nobody Had Done This Before

You might wonder: if this was such an obvious problem, why hadn't anyone solved it already? The answer lies in what I call "compatibility hell."

Unlike many other Ruby gems that might need basic compilation, rdkafka is a complex beast. It wraps librdkafka, a sophisticated C library that depends on a web of other libraries:

  • OpenSSL for encryption
  • Cyrus SASL for authentication
  • MIT Kerberos for enterprise security
  • Multiple compression libraries (zlib, zstd, lz4, snappy)
  • System libraries that vary wildly across platforms

Every Linux distribution has slightly different versions of these libraries. Ubuntu uses one version of OpenSSL, CentOS uses another. Alpine Linux uses musl instead of glibc. macOS has its own quirks. Creating a single binary that works everywhere seemed impossible.

My previous attempts had failed because they tried to link against system libraries dynamically. This works great... until you deploy to a system with different library versions. Then everything breaks spectacularly.

The Deep Dive Begins

I started by studying how other Ruby gems had tackled similar problems. The nokogiri gem had become the gold standard for this approach - they'd successfully shipped precompiled binaries that eliminated the notorious compilation headaches that had plagued XML processing in Ruby for years. Their success proved that it was possible.

Other ecosystems had figured this out years ago. Python has wheels. Go has static binaries. Rust has excellent cross-compilation. While Ruby has improved with precompiled gems for many platforms, the ecosystem still feels inconsistent - you never know if you'll get a precompiled gem or need to compile from source. The solution, I realized, was static linking. Instead of depending on system libraries, I would bundle everything into self-contained binaries.

Every dependency would be compiled from source and linked statically into the final library.

Sounds simple, right? It wasn't.

The First Breakthrough

My first success came with Linux x86_64 GNU systems - your typical Ubuntu or CentOS server. After days of tweaking compiler flags and build scripts, I had a working prototype. The binary was larger than the dynamically linked version, but it worked anywhere.

The installation time dropped from 60+ seconds to under 5 seconds!

But then I tried it on Alpine Linux. Complete failure. Alpine uses musl libc instead of glibc, and my carefully crafted build didn't work at all.

Platform-Specific Nightmares

Each platform brought its own specific challenges:

  • Alpine Linux (musl): Different system calls, different library conventions, different compiler behavior. I had to rebuild the entire toolchain with musl-specific flags. The Cyrus SASL library was particularly troublesome - it kept trying to use glibc-specific functions that don't exist in musl.

  • macOS ARM64: Apple Silicon Macs use a completely different architecture. The build system had to use different SDK paths and handle Apple's unique library linking requirements. Plus, macOS has its own ideas about where libraries should live.

  • Security concerns: Precompiled binaries are inherently less trustworthy than source code. How do you prove that a binary contains exactly what it claims to contain? I implemented SHA256 verification for every dependency, but that was just the beginning.

The Compilation Ballet

Building a single native extension became an intricate dance of dependencies. Each library had to be compiled in the correct order, with the right flags, targeting the right architecture. One mistake and the entire build would fail.

I developed a common build system that could handle all platforms:

# Download and verify every dependency
secure_download "$(get_openssl_url)" "$OPENSSL_TARBALL"
verify_checksum "$OPENSSL_TARBALL"

# Build in the correct order
build_openssl_for_platform "$PLATFORM"
build_kerberos_for_platform "$PLATFORM"
build_sasl_for_platform "$PLATFORM"
# ... and so on

The build scripts grew to over 2,000 lines of carefully crafted shell code. Every platform had its own nuances, its own gotchas, its own way of making my life difficult.

The Security Rabbit Hole

Precompiled binaries introduce a fundamental security challenge: trust. When you compile from source, you can theoretically inspect every line of code. With precompiled binaries, you're trusting that the binary contains exactly what it claims to contain.

I spent weeks implementing a comprehensive security model consisting of:

  • SHA256 verification for every downloaded dependency
  • Cryptographic attestation through RubyGems Trusted Publishing
  • Reproducible builds with pinned dependency versions
  • Supply chain protection against malicious dependencies
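The checksum step itself boils down to comparing a freshly computed digest against a pinned value. Here is a simplified Ruby sketch of that idea - the real builds do this in shell, and `verify_checksum!` is a hypothetical name:

```ruby
require "digest"

# Fails loudly if a downloaded tarball does not match its pinned digest
def verify_checksum!(path, expected_sha256)
  actual = Digest::SHA256.file(path).hexdigest
  unless actual == expected_sha256
    raise "Checksum mismatch for #{File.basename(path)}: " \
          "expected #{expected_sha256}, got #{actual}"
  end
  true
end
```

The crucial detail is that the expected digests are pinned in the repository, so a tampered download fails the build rather than silently shipping.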

The build logs became a security audit trail:

[SECURITY] Verifying checksum for openssl-3.0.16.tar.gz...
[SECURITY] ✅ Checksum verified for openssl-3.0.16.tar.gz
[SECURITY] 🔒 SECURITY VERIFICATION COMPLETE

The CI/CD Nightmare

Testing native extensions across multiple platforms and Ruby versions created a combinatorial explosion of complexity. My GitHub Actions configuration grew from a simple test matrix to a multi-stage pipeline with separate build and test phases.

Each platform needed its own runners:

  • Linux builds ran on Ubuntu with Docker containers
  • macOS builds ran on actual macOS runners
  • Each build had to be tested across Ruby 3.1, 3.2, 3.3, 3.4, and 3.5

The CI pipeline became a carefully choreographed dance of builds, tests, and releases. One failure anywhere would cascade through the entire system.

Each platform requires 10 separate CI actions - from compilation to testing across Ruby versions. Multiplied across all supported platforms, this creates a complex 30-action pipeline.

The Release Process Revolution

Publishing native extensions isn't just gem push. Each release now involves building on multiple platforms simultaneously, testing each binary across Ruby versions, and coordinating the release of multiple platform-specific gems.

I implemented RubyGems Trusted Publishing, which uses cryptographic tokens instead of API keys. This meant rebuilding the entire release process from scratch, but it provided better security and audit trails.

The First Success

After months of work, I finally had working native extensions for all three major platforms. The moment of truth came when I installed the gem for the first time using the precompiled binary:

$ gem install rdkafka
Successfully installed rdkafka-0.22.0-x86_64-linux-gnu
1 gem installed

I sat there staring at my terminal, hardly believing it had worked. Months of frustration, debugging, and near misses had led to this moment. Three seconds instead of sixty!

The Ripple Effect

The impact went beyond just faster installations. Docker builds that had previously taken several minutes now completed much faster. CI pipelines that developers had learned to ignore suddenly became responsive. New contributors could set up development environments without fighting compiler errors.

But the numbers that really struck me were the environmental ones. With over a million downloads per month, those 60 seconds of compilation time added up to 60 million seconds of CPU usage monthly. That's 16,667 hours of processing power - and all the associated energy consumption and CO2 emissions.

The Unexpected Challenges

Just when I thought I was done, new challenges emerged. Some users needed to stick with source compilation for custom configurations. Others wanted to verify that the precompiled binaries were truly equivalent to the source builds.

I added fallback mechanisms and comprehensive documentation. You can still force source compilation if needed:

gem 'rdkafka', force_ruby_platform: true

But for most users, the native extensions work transparently. Install the gem, and you automatically get the fastest experience possible.

Looking Back

This project taught me that sometimes the most valuable improvements are the ones users never notice. Nobody celebrates faster gem installation. There are no awards for reducing compilation time. But those small improvements compound into something much larger.

Every developer who doesn't wait for rdkafka to compile can focus on building something amazing instead. Every CI pipeline that completes faster means more iterations, more experiments, more innovation.

The 60-second problem is solved. It took months of engineering effort, thousands of lines of code, and more debugging sessions than I care to count. But now, when you run gem install rdkafka or bundle install, it just works.

Fast.


The rdkafka and karafka-rdkafka gems with native extensions are available now. Your next bundle install will be faster than ever. For complete documentation, visit the Karafka documentation.

Copyright © 2026 Closer to Code
