OSO has deployed its fair share of Kafka Streams applications, and rebalances and assignments are part and parcel of this architecture. In this post we will explain the reasons behind rebalances, the process of rebalancing, and how to optimise rebalances for better performance. So let’s get started!
When a rebalance occurs in Kafka Streams, it is usually triggered by a specific event or condition. One common reason for rebalancing is a scheduled probing rebalance. This type of rebalance is initiated at regular intervals to ensure the cluster remains balanced and optimised. During a probing rebalance, the assignment of tasks may not change significantly, but it helps maintain the overall balance of the cluster.
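The interval between probing rebalances is driven by the `probing.rebalance.interval.ms` Kafka Streams config (default 600000 ms, i.e. 10 minutes). As a rough sketch of the timer logic, which in reality lives inside Kafka Streams’ partition assignor:

```python
# Illustrative sketch of the probing-rebalance timer. The real logic lives
# inside Kafka Streams' StreamsPartitionAssignor; only the config key
# "probing.rebalance.interval.ms" and its default (600000 ms) are real.
PROBING_REBALANCE_INTERVAL_MS = 600_000

def probing_rebalance_due(last_rebalance_ms: int, now_ms: int) -> bool:
    """Return True when enough time has passed since the last rebalance
    for the leader to schedule another probing rebalance."""
    return now_ms - last_rebalance_ms >= PROBING_REBALANCE_INTERVAL_MS
```

Lowering the interval makes the cluster converge faster after scaling events, at the cost of more frequent rebalance activity.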
Another reason for rebalancing is when a consumer instance, let’s call it B, falls behind in processing tasks. In this case, a rebalance is scheduled to allow B to catch up and resume processing. Once B is ready, the rebalance process begins.
During a rebalance, tasks may change ownership from one consumer instance to another. In the past, rebalances would cause all processing to stop until the rebalance was complete. However, with the incremental cooperative rebalancing protocol, processing can continue during a rebalance unless a specific task needs to be swapped between instances.
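Kafka Streams opts into cooperative rebalancing automatically through its built-in StreamsPartitionAssignor. For a plain Kafka consumer, the analogous behaviour comes from the CooperativeStickyAssignor, which is selected via configuration. A minimal sketch, where the config key and assignor class name are real but the broker address and group id are placeholders:

```python
# For a plain Kafka consumer (Kafka Streams handles this automatically via
# StreamsPartitionAssignor), cooperative rebalancing is opted into with the
# partition.assignment.strategy setting. The key and class name are real;
# the bootstrap address and group id below are hypothetical.
consumer_config = {
    "bootstrap.servers": "localhost:9092",   # placeholder address
    "group.id": "example-group",             # placeholder group id
    "partition.assignment.strategy":
        "org.apache.kafka.clients.consumer.CooperativeStickyAssignor",
}
```

With the older eager protocol, every member revoked all of its partitions at the start of each rebalance; the cooperative protocol only revokes the partitions that actually need to move.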
When a rebalance occurs, the instance that needs to give up a task will flush its state stores, flush any buffers, and commit the task. This triggers a follow-up rebalance, allowing the other instance to start processing the task. This handoff between instances ensures a smooth transition and minimal downtime.
The rebalance process may involve multiple rebalances, depending on the number of tasks and the time it takes for instances to close out tasks. However, once a stable assignment is produced with no follow-up rebalances, it indicates that the rebalance process is complete and the cluster has converged.
While rebalances are necessary for maintaining a balanced and optimised cluster, it’s important to minimise unnecessary and unwanted rebalances. One way to achieve this is by tuning the consumer configurations. By setting parameters such as max.poll.interval.ms, heartbeat.interval.ms, and session.timeout.ms to larger values, the group becomes less sensitive to slow or briefly unresponsive consumers, thereby preventing unnecessary rebalances.
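As a concrete starting point, the settings above might look like the following. The config keys and quoted defaults are real Kafka consumer configs; the values shown are illustrative, not recommendations for every workload:

```python
# Consumer-level settings that control how quickly a consumer is declared
# dead and a rebalance is triggered. Keys are real consumer configs; the
# values are illustrative starting points only.
tuning = {
    # Maximum time between poll() calls before the consumer is considered
    # failed and its tasks are reassigned (default 300000 ms).
    "max.poll.interval.ms": "600000",
    # How long the group coordinator waits without a heartbeat before
    # evicting the consumer (default 45000 ms in recent clients).
    "session.timeout.ms": "90000",
    # Heartbeat frequency; keep this well below session.timeout.ms
    # (roughly one third is a common rule of thumb).
    "heartbeat.interval.ms": "30000",
}
```

The trade-off is detection latency: larger values mean a genuinely crashed instance holds on to its tasks for longer before the group notices.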
In scenarios where a large number of tasks need to be migrated, such as during a probing rebalance phase, the time to converge can be reduced by increasing the number of warm-up replicas and tuning restoration so each task warms up faster. This allows higher restore throughput from the broker and decreases the time required for each task to catch up.
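In Kafka Streams terms, the knobs involved are `max.warmup.replicas`, `acceptable.recovery.lag`, and `probing.rebalance.interval.ms`. The keys and quoted defaults below are real Kafka Streams configs; the values are illustrative:

```python
# Kafka Streams settings that shape how task migration is staged during
# probing rebalances. Keys are real Kafka Streams configs; the values
# chosen here are illustrative.
warmup_config = {
    # Extra warm-up replicas used to restore state on the destination
    # instance before task ownership moves (default 2).
    "max.warmup.replicas": "4",
    # A task counts as caught up once its state-store lag drops below this
    # many records (default 10000).
    "acceptable.recovery.lag": "10000",
    # Shorter interval -> more frequent probing rebalances -> faster
    # convergence, at the cost of more rebalance churn (default 600000 ms).
    "probing.rebalance.interval.ms": "300000",
}
```

More warm-up replicas let several tasks restore state in parallel, so each probing rebalance can hand over more tasks at once.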
To understand what’s happening during a rebalance, there are several key metrics to monitor. These metrics include:
- last-rebalance-seconds-ago — how long ago the group last rebalanced; a small value means a rebalance just happened
- rebalance-latency-avg and rebalance-latency-max — how long rebalances are taking
- rebalance-total and rebalance-rate-per-hour — how often rebalances occur
- failed-rebalance-total and failed-rebalance-rate-per-hour — rebalances that did not complete successfully
- the KafkaStreams client state — whether the application is in REBALANCING or RUNNING
By monitoring these metrics, it becomes easier to identify rebalance events and understand the state of the cluster during rebalancing.
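As an illustration, these metrics can feed a simple alerting check. The metric names below are real consumer coordinator metrics, but the helper function and threshold values are hypothetical, and the input dict stands in for values scraped from your monitoring system:

```python
# Hypothetical helper showing how rebalance metrics might drive a simple
# health check. The metric names are real consumer coordinator metrics;
# the thresholds and the function itself are illustrative only.
def rebalance_health(metrics: dict) -> list:
    """Return a list of warnings derived from rebalance-related metrics."""
    warnings = []
    if metrics.get("failed-rebalance-rate-per-hour", 0) > 0:
        warnings.append("rebalances are failing - check consumer logs")
    if metrics.get("rebalance-latency-avg", 0) > 60_000:
        warnings.append("rebalances are taking over a minute on average")
    if metrics.get("last-rebalance-seconds-ago", float("inf")) < 60:
        warnings.append("a rebalance happened in the last minute")
    return warnings
```

A healthy, converged cluster shows a large last-rebalance-seconds-ago, zero failed rebalances, and a client state of RUNNING.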
Rebalances are an essential part of maintaining a balanced and optimised Kafka Streams cluster. By understanding the reasons for rebalances, optimising rebalances, and monitoring key metrics, it becomes easier to manage rebalances and ensure smooth operation of the cluster.
Remember, rebalances are not something to be feared. They are a mechanism for maintaining the health and efficiency of the cluster. By understanding and managing rebalances effectively, you can ensure the smooth operation of your Kafka Streams applications.
For more content:
How to take your Kafka projects to the next level with a Confluent preferred partner
Event driven Architecture: A Simple Guide
Watch Our Kafka Summit Talk: Offering Kafka as a Service in Your Organisation
Successfully Reduce AWS Costs: 4 Powerful Ways
Kafka performance best practices for monitoring and alerting
How to build a custom Kafka Streams Statestore
How to avoid configuration drift across multiple Kafka environments using GitOps
Have a conversation with a Kafka expert to discover how we can help you adopt Apache Kafka in your business.
Contact Us