
Benchmarking Karafka – how does it handle multiple TCP connections

Recently I released a Ruby Apache Kafka microframework; however, I don't expect anyone to use it without at least a bit of information on what it can do. Here are some measurements that I took.

How Karafka handles multiple TCP connections

Since listening to multiple topics requires multiple TCP connections, we need threads to get decent performance (a process clustering feature is in progress). In theory, each controller that you create could have its own thread and listen all the time. However, in a bigger application that many threads could slow everything down. That's why we introduced topic clustering. When you configure your Karafka application, you should specify the concurrency parameter:

class App < Karafka::App
  setup do |config|
    # Other config options
    config.max_concurrency = 10 # 10 threads max
  end
end

This is the maximum number of threads that will be used to listen for incoming messages. It is pretty simple when you have fewer controllers (topics) than threads: Karafka will just use a single thread per topic. However, if you have more controllers than threads, several connections will be packed into a single thread (wrapped with Karafka::Connection::ThreadCluster). This is how it works when you have 2 threads and 4 controllers:

[Figure: clusters]

In general, it will distribute TCP connections across threads evenly. So, if you have 20 controllers and 5 threads, each thread will be responsible for checking 4 sockets, one after another. Since it won't do this simultaneously, Karafka will slow down. How much? It depends - if there's something on each of the topics - you will get around 24% (per controller) of the base performance out of each connection.
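
To make the even distribution concrete, here is a minimal sketch that spreads controllers across a fixed number of threads in round-robin fashion. The `distribute` helper and the controller names are hypothetical and exist only for illustration; this is not the actual Karafka::Connection::ThreadCluster code, just one plausible way to get the "20 controllers, 5 threads, 4 sockets each" split described above.

```ruby
# Hypothetical helper (not part of Karafka's API): splits controllers into
# at most max_concurrency groups, one group per listening thread.
def distribute(controllers, max_concurrency)
  # Never create more groups than there are controllers
  groups = Array.new([max_concurrency, controllers.size].min) { [] }

  # Round-robin assignment keeps the group sizes as even as possible
  controllers.each_with_index do |controller, index|
    groups[index % groups.size] << controller
  end

  groups
end

controllers = (1..20).map { |i| "Controller#{i}" }
clusters = distribute(controllers, 5)

clusters.each_with_index do |cluster, thread_no|
  puts "Thread #{thread_no}: #{cluster.size} controllers"
end
```

With fewer controllers than threads, the same helper simply returns one group per controller, which matches the "single thread per topic" case.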

Other things that have impact on the performance

When considering this framework's performance, you need to keep in mind that:

  • It is strongly dependent on what you do in your code
  • It also depends on Apache Kafka performance
  • The connection between Karafka and Redis (for Sidekiq) is a factor as well
  • All the benchmarks show the performance without any business logic
  • All the benchmarks show the performance without enqueuing to Sidekiq
  • It also depends on the type of infrastructure you run the benchmarks on
  • Message size is a factor as well (since messages get parsed as JSON by default)
  • Ruby version: I've been testing it on MRI (CRuby) 2.2.3. Karafka does not yet work with other Ruby distributions (JRuby or Rubinius), but that should change when some of the dependencies stop using refinements

Benchmarking

Methodology

For each of the benchmarks I measured the time taken to consume all the messages that were stored in Kafka. There was no business logic involved (just message processing by the framework). My local Kafka setup was a default setup (no settings were changed) running in these Docker containers.
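
The measurement itself boils down to timing a consume loop and dividing the number of messages by the elapsed time. Below is a rough sketch of that idea using Ruby's standard Benchmark module. The `consume_all` method and the in-memory `messages` array are assumptions standing in for a real Kafka consumer; they are not the code that produced the numbers in this post.

```ruby
require 'benchmark'

# Hypothetical stand-in for the framework: walks over every message
# without running any business logic and returns how many it has seen.
def consume_all(messages)
  count = 0
  messages.each { |_message| count += 1 }
  count
end

# In-memory stand-in for messages that would normally be stored in Kafka
messages = Array.new(100_000) { |i| "{\"id\":#{i}}" }

consumed = 0
elapsed = Benchmark.realtime { consumed = consume_all(messages) }

puts format('%d messages in %.3f s (%.0f msg/s)', consumed, elapsed, consumed / elapsed)
```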

I've tested up to 5 topics, each loaded with 1 000 000 messages. Since Karafka has lazy loading for params, the benchmark does not include the time needed to parse the messages. Parsing performance strongly depends on the parser you pick (it defaults to JSON) and on message size. These benchmarks measure the maximum throughput that we can get while receiving messages.
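
Lazy loading here means that the raw payload is only parsed when something actually reads it. The sketch below shows the general idea with a hypothetical `LazyParams` class; it is an assumption made for illustration and not Karafka's real params implementation.

```ruby
require 'json'

# Hypothetical illustration of lazily parsed params (not Karafka's class).
class LazyParams
  attr_reader :parse_count

  def initialize(raw, parser = JSON)
    @raw = raw
    @parser = parser
    @parse_count = 0
  end

  # The first access triggers parsing; later accesses reuse the result
  def [](key)
    data[key]
  end

  private

  def data
    @data ||= begin
      @parse_count += 1
      @parser.parse(@raw)
    end
  end
end

params = LazyParams.new('{"user_id": 7}')
puts params.parse_count # nothing has been parsed at this point
puts params['user_id']  # first access parses the payload
puts params.parse_count
```

A benchmark that never reads the params therefore never pays the parsing cost, which is why parsing time is excluded from the results.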

Note: all the benchmarking was performed on my Linux laptop with 16 GB of RAM and a 4-core i7 processor. During the benchmarking I was performing other tasks that might have had a small impact on the overall results (although no heavy stuff).

1 thread

With a single thread it is pretty straightforward: the more controllers we have, the less we can process per controller. There's also the overhead of context switching between controllers, which eats some of the processing power and lets us consume less and less. Switching between controllers seems to consume around 11% of a single controller's performance when we use more than 1 controller in a single-threaded application.

[Screenshot from 2015-11-02 17:50:46]
Context switching between controllers in a single thread will cost us around 1% of the general performance per additional controller (if you're eager to know what we're planning to do about it, scroll down to the summary). On the one hand it is a lot; on the other, with a bigger application you should probably run Karafka in multithreaded mode. That way context switching won't be as painful.

2 threads

[Screenshot from 2015-11-02 18:12:37]
General performance with 2 threads and 2 controllers proves that we're able to lower the impact of switching on overall performance, gaining around 1.5-2k requests per second (overall).

3 threads

[Screenshot from 2015-11-02 18:23:13]
5 controllers with 3 threads vs 5 controllers with 1 thread: 7% better performance.

4 threads

[Screenshot from 2015-11-02 18:32:40]

5 threads

[Screenshot from 2015-11-02 18:33:33]

Benchmarks results

Summary

The overall performance of a single Karafka process is highly dependent on the way it is being used. Because of the GIL, when we receive data from sockets, we can only process incoming messages from a single socket at a time. So in general we're limited to around 30-33k requests per second per process. It means that the bigger the application gets, the slower it works (when we consider total performance per single controller). However, this only holds when we assume that all the topics are always full of messages. Otherwise, a thread with nothing to process just waits on IO, and Ruby can process incoming messages in other threads in the meantime. That's why it is worth starting Karafka with a decent concurrency level.
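
The IO part is easy to see in isolation. In the rough sketch below, `sleep` stands in for waiting on a socket (an assumption made to keep the example self-contained); MRI releases the GIL while a thread is blocked like this, so four waiting threads finish in roughly the time of one.

```ruby
require 'benchmark'

# sleep simulates a thread blocked on an empty socket; MRI releases
# the GIL while a thread waits, so other threads can run in the meantime.
def wait_for_messages(seconds)
  sleep(seconds)
end

# Four waits one after another in a single thread
sequential = Benchmark.realtime do
  4.times { wait_for_messages(0.1) }
end

# The same four waits, each in its own thread
threaded = Benchmark.realtime do
  4.times.map { Thread.new { wait_for_messages(0.1) } }.each(&:join)
end

puts format('sequential: %.2f s, threaded: %.2f s', sequential, threaded)
```

The gain disappears once every thread has CPU-bound work to do all the time, which is the "all topics always full" case described above.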

How can we increase throughput for Karafka applications? Well, for now we can create multiple partitions for a single topic and spin up multiple Karafka processes. They will then load balance between partitions automatically. This solution has one downside: if only a few topics have multiple partitions and the rest have a single one, then some of the threads in Karafka won't perform any work. This will be fixed soon (we're already working on it), when we introduce Karafka process clustering. It will allow you to spin up multiple Karafka processes (in a single cluster), each listening only to a given subset of controllers. That way the overall performance will increase significantly. Still, being able to perform 30k rq/s is not that bad, right? ;)

EuRuKo 2015 Review – conference in a nutshell

EuRuKo has just ended. After 7 hours of driving we're finally home. I guess it’s a good time to summarize and review the last few days at one of the best Ruby conferences in Europe.

Arriving in Salzburg

Before we jump into talks/lightning talks, let me start with Salzburg: what an amazing town! No wonder Humboldt considered it one of the most beautiful in the world, next to such pearls as Naples. I wish we had arrived 1-2 days earlier. We would have spent a lot more time sightseeing and enjoying really good Austrian cuisine.

For those who don't know, Salzburg is the fourth-largest city in Austria (population 148,420) and the capital of the federal state of Salzburg.

It takes about 7-8 hours to get there from Cracow (by car). Since Austria has really good highways, driving there was a pretty nice experience. Salzburg's old town is internationally renowned for its baroque architecture and is one of the best-preserved city centers north of the Alps. The only thing that I would change about it was the temperature and humidity ;). If you weren't there during EuRuKo it is still worth visiting.


Salzburg Congress building and surroundings

EuRuKo took place at the Salzburg Congress building at Auerspergstraße 6. This congress center is located near the old town, so during the lunch break there was enough time to try out some local cuisine (and to do a bit of sightseeing). I can't say a bad thing about the location and general organization. Everything was almost on time, the staff was friendly, and there was enough coffee and sweets, so there's almost nothing I can complain about. But there are still two "buts":

  • Not enough Internet - the internet connection was really, really bad (pings up to 5000 ms). I can imagine that it is not super easy to prepare infrastructure for so many IT people; however, the organizers should just assume 2 Wi-Fi devices per person and plan everything with that in the back of their heads.
  • Not enough power cords - same as the one above. I know that we all have laptops that run 6-10 hours on a battery; however, there are clumsy people like me who forget to charge them beforehand ;)


Day 1 Speeches

Matz - Keynote

What can you say about Matz's presentations if not "really good"? He spoke about some interesting stuff that the Ruby Core team is planning to add to Ruby. He also said that although everyone knows we need better concurrency and parallelism solutions, it is still unclear what kind of approach the Ruby Core team will take. @cyberpoet summarized this speech really well:

Tl;dr #matzkeynote Ruby concurrency needs to get better. We have looked at how others do it. We have no plan yet. #euruko

Joseph Wilk - Programming as a Performance

A totally different point of view on programming. Joseph showed (and proved by developing music live) that programming can become a key part of an art performance. Did I think about programming by dancing before I saw this? Never. Do I consider it now? Definitely!

Bradley Grzesiak - Simplify Challenging Software Problems with Rocket Science

I often hear from my programming friends this sentence:

Oh come on, it can be done. It's not rocket science.

But this time it was. It turns out that "doing" rocket science is not that different from "doing" programs and programming. Bradley showed many techniques and tools that have been developed for aerospace engineers to help them solve problems. It was really inspiring to learn that their tools and way of working do not differ that much from what software developers use to simplify and automate their work.

Satoshi Tagomori - Data Analytics Service Company and Its Ruby Usage

IMHO the most interesting speech of the whole conference. I would just change its title or remove "Ruby" from it, as "Ruby" wasn't the main topic. It was more about the infrastructure, open source software and other solutions that Satoshi uses at Treasure Data. Sure, he mentioned that they use MRI (CRuby) and JRuby here and there, but many people that I talked to during the after-party had expected more Ruby vs Data Analytics things.

[Screenshot from 2015-10-19 14:13:21]

It was really, really interesting to hear from Satoshi how they structured (on a high level) their whole system to handle 50 billion new records per day, how they are able to sustain the whole infrastructure and what the key components of their software stack are. It was also really inspiring to see a company that is based mostly on OSS solutions. Big plus for Big Data ;)

Lydia Krupp-Hunter - Ruby Game Building Throwdown

I won't lie: I didn't attend this one - Salzburg won.

Hanneli Tavante - Humanising Math and Physics on Computer Science

TL;DR: "I will teach you how to teach". And this is the core of the topic. Hanneli talked about making boring things interesting by combining them with way more interesting things (like programming!). This is a must-see for anyone who wants to teach other people. Luckily, she also gave this speech @ eurucamp 2015, so you can see it here:

René Föhring - One Inch at a Time - How to get people excited about inline docs

Another subject that I strongly identify with because of PolishGeeks Dev Tools. René found a more humane way to encourage programmers to document their code. If you want to see how to do this, just visit the Inch CI website and play with it.

Lightning talks

I don't remember the speakers' names or subjects, but I had one general feeling about the lightning talks: not enough Ruby there. Some advertisements, some funny things, but where were you, Ruby lang?

Day 2

Koichi Sasada - Lightweight Method Dispatch on MRI

A really interesting talk! It seems that in terms of topics, Japanese speakers totally ruled @ EuRuKo. Speeding up method dispatch by introducing a caching layer can be a really good move. The same goes for the improvements to dispatching methods with keyword arguments. The only thing that worries me is the lack of statistical research on how the whole community uses Ruby. I feel that RDoc and internal Ruby usage inside the Ruby development team might not be enough. Maybe if we looked deeper into how people use Ruby, the Ruby Core team could discover more places that could benefit from extra optimization.

Richard Huang - Refactor ruby code based on AST

A super useful talk; too bad that I knew nothing about this during the Rails 2 -> Rails 3 migration. I will definitely use automatic refactoring based on the AST (abstract syntax tree) for any bigger API changes in gems/libraries.

Michał Taszycki - Learn to program Commodore 64 this year!

I didn't attend it.

Tworit Kumar Dash - RFID Technology with Internet of Things

Well, to be honest, RFID technology is not exactly what makes me excited, but I must say that Tworit gave a really interesting speech. He showed how he uses Ruby together with RFID to collect really interesting data and connect the low-level hardware world with higher-level abstractions.

Amy Wibowo - Fold, paper, scissors - an exploration of origami's cut and fold problem

Did you know about the fold-and-cut theorem? It states that it is possible, given a piece of paper and any polygonal shape, to find a series of folds of that paper such that the given shape can be generated with a single cut. Some say that whatever you can imagine, there's probably already someone doing it. And for the fold-and-cut theorem there's Amy! She's not only doing that, but she's also doing it using Ruby. Even though it is a scientific theorem, when Amy demonstrated it, it still felt like a kind of magic to me ;)

Simon Eskildsen - Super-Reliable Software

For me, the main idea of this presentation is: Everything can and will fail. It is just a matter of time. However, in software development it is more about trying to figure out all the points of failure and making sure that when something bad happens, it won't have a critical impact on the whole infrastructure. Simon told us how they deal with serious problems with parts of their infrastructure and how we can try to predict (and test) various scenarios including those with the key components failing.

 

[Screenshot from 2015-10-19 16:18:41]

Christopher Rigor - Cryptography for Rails Developers

I didn't attend it - had to get back to Poland :(

Summary

More than decent but not perfect. It's another Ruby conference that leaves me with mixed feelings because of the many speeches not related to Ruby. Does it mean that there's nothing really interesting going on that is directly related to Ruby? I don't think so. Does it mean that people expect less technical talks? Maybe. But is that a good enough reason to pick so many (good) talks that are only slightly related to Ruby at a Ruby conference?

Copyright © 2025 Closer to Code