OSO has deployed its fair share of Kafka Streams applications, and rebalances and assignments are part and parcel of this architecture. We will explain the reasons behind rebalances, the process of rebalancing, and how to optimise rebalances for better performance. So let’s get started!
Understanding why Kafka Streams rebalances
When a rebalance occurs in Kafka Streams, it is usually triggered by a specific event or condition. One common reason for rebalancing is a scheduled probing rebalance. This type of rebalance is initiated at regular intervals to ensure the cluster remains balanced and optimised. During a probing rebalance, the assignment of tasks may not change significantly, but it helps maintain the overall balance of the cluster.
Another reason for rebalancing is when a consumer instance, let’s call it B, falls behind in processing tasks. In this case, a rebalance is scheduled to allow B to catch up and resume processing. Once B is ready, the rebalance process begins.
The Kafka Streams rebalance process
During a rebalance, tasks may change ownership from one consumer instance to another. In the past, rebalances would cause all processing to stop until the rebalance was complete. However, with the incremental cooperative rebalancing protocol, processing can continue during a rebalance unless a specific task needs to be swapped between instances.
When a rebalance occurs, the instance that needs to give up a task will flush its state stores, flush any buffers, and commit the task. This triggers a follow-up rebalance, allowing the other instance to start processing the task. This handoff between instances ensures a smooth transition and minimal downtime.
The rebalance process may involve multiple rebalances, depending on the number of tasks and the time it takes for instances to close out tasks. However, once a stable assignment is produced with no follow-up rebalances, it indicates that the rebalance process is complete and the cluster has converged.
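The two-step handoff described above can be pictured with a small toy model. This is only a sketch and not Kafka's actual assignor: the class name, the task and instance labels, and the simplified rule "a task that must move is revoked in one round and assigned in the next" are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only (hypothetical names): illustrates why moving a task
// takes a follow-up rebalance under cooperative rebalancing.
final class CooperativeHandoffSketch {

    // One rebalance round. A task is assigned immediately if it is unowned
    // or already on its target instance. A task owned by the wrong instance
    // is only revoked in this round (the owner flushes and commits), so it
    // is left unowned until the follow-up round.
    static Map<String, String> rebalanceOnce(Map<String, String> current,
                                             Map<String, String> target) {
        Map<String, String> next = new TreeMap<>();
        for (Map.Entry<String, String> entry : target.entrySet()) {
            String task = entry.getKey();
            String wantedOwner = entry.getValue();
            String currentOwner = current.get(task);
            if (currentOwner == null || currentOwner.equals(wantedOwner)) {
                next.put(task, wantedOwner);
            }
        }
        return next;
    }

    // Counts rounds until the assignment is stable, i.e. equal to the target
    // with no follow-up rebalance needed.
    static int roundsToConverge(Map<String, String> current,
                                Map<String, String> target) {
        Map<String, String> state = new TreeMap<>(current);
        int rounds = 0;
        while (!state.equals(target)) {
            state = rebalanceOnce(state, target);
            rounds++;
        }
        return rounds;
    }
}
```

In this model, moving task t1 from instance A to instance B takes two rounds: the first revokes t1 from A while t0 keeps processing, and the second hands t1 to B. Real clusters can need more rounds, since closing out tasks takes time.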
How to optimise Kafka Streams rebalances
While rebalances are necessary for maintaining a balanced and optimised cluster, it’s important to minimise unnecessary rebalances. One way to achieve this is by tuning the consumer configurations. Setting max.poll.interval.ms and session.timeout.ms to larger values makes the group less sensitive when detecting crashed or stalled consumers, which prevents unnecessary rebalances at the cost of slower detection of genuine failures. heartbeat.interval.ms should be scaled together with session.timeout.ms and kept well below it; the Kafka documentation suggests no more than one third of the session timeout.
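As a rough illustration of this tuning, the sketch below collects the three consumer settings in a plain java.util.Properties object, which is the form usually passed to a Kafka Streams application. The class name, the helper methods, and the specific values are assumptions chosen for the example, not recommendations; suitable values depend on how long your processing takes per poll and how quickly you need failures detected.

```java
import java.util.Properties;

// Sketch only: class name, helpers, and values are illustrative assumptions.
final class RebalanceTuningSketch {

    // Builds consumer timeout settings that tolerate longer pauses
    // before an instance is considered failed.
    static Properties consumerTimeouts() {
        Properties props = new Properties();
        // Maximum gap between poll() calls before the consumer leaves the group.
        props.setProperty("max.poll.interval.ms", "600000");
        // How long the group waits without heartbeats before dropping a member.
        props.setProperty("session.timeout.ms", "60000");
        // How often heartbeats are sent; kept at one third of the session timeout.
        props.setProperty("heartbeat.interval.ms", "20000");
        return props;
    }

    // Checks the rule of thumb that the heartbeat interval is
    // at most one third of the session timeout.
    static boolean heartbeatFitsSession(Properties props) {
        long heartbeatMs = Long.parseLong(props.getProperty("heartbeat.interval.ms"));
        long sessionMs = Long.parseLong(props.getProperty("session.timeout.ms"));
        return heartbeatMs * 3 <= sessionMs;
    }
}
```

These entries would typically be merged into the same Properties object that carries the rest of the Streams configuration, such as the application id and bootstrap servers.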
In scenarios where a large number of tasks need to be migrated, such as during a probing rebalance phase, the duration of rebalances can be reduced by increasing the number of warm-up replicas and reducing the warm-up time for each task. This allows for higher throughput from the broker and decreases the time required for each task to warm up.
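The Streams settings most closely related to this are, to our understanding, max.warmup.replicas and probing.rebalance.interval.ms. The sketch below shows one way they might be set; the class name, the helper, and the example values are assumptions, and the lower bounds enforced here (at least one warm-up replica, at least one minute between probing rebalances) reflect our reading of the Kafka Streams configuration reference, so check them against the version you run.

```java
import java.util.Properties;

// Sketch only: class name, helper, and chosen values are illustrative assumptions.
final class WarmupTuningSketch {

    // Assumed minimum for probing.rebalance.interval.ms (one minute).
    static final long MIN_PROBING_INTERVAL_MS = 60_000L;

    // Builds settings that let more tasks warm up in parallel and that
    // check more often whether warmed-up tasks can be moved.
    static Properties warmupSettings(int maxWarmupReplicas, long probingIntervalMs) {
        if (maxWarmupReplicas < 1) {
            throw new IllegalArgumentException("max.warmup.replicas must be at least 1");
        }
        if (probingIntervalMs < MIN_PROBING_INTERVAL_MS) {
            throw new IllegalArgumentException("probing interval below assumed minimum");
        }
        Properties props = new Properties();
        // Extra replicas that may be restoring state at the same time.
        props.setProperty("max.warmup.replicas", Integer.toString(maxWarmupReplicas));
        // How often a probing rebalance checks whether warm-ups have caught up.
        props.setProperty("probing.rebalance.interval.ms", Long.toString(probingIntervalMs));
        return props;
    }
}
```

For example, warmupSettings(4, 300000L) allows four warm-up replicas and probes every five minutes. More warm-up replicas means more restore traffic against the brokers, so the trade-off is faster convergence against heavier load while tasks migrate.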
Monitoring rebalances
To understand what’s happening during a rebalance, there are several key metrics to monitor. These metrics include:
Last poll seconds ago: This metric indicates the time since the consumer last called poll. Comparing this value to the max poll interval shows whether the consumer is polling in a timely manner. A beginner’s guide to this config can be found here
Heartbeat interval and session timeout: By monitoring the heartbeat interval and session timeout, it can be determined if all heartbeats are sent on schedule and if a heartbeat is close to the session timeout or partition alignment. This can indicate if a rebalance is happening simultaneously. More information here
Last rebalance seconds ago: This metric shows the time since the last rebalance. By comparing this value to the probing rebalance interval, it can be determined if there have been recent rebalances. It is expected that this value continuously increases, indicating a stable assignment.
By monitoring these metrics, it becomes easier to identify rebalance events and understand the state of the cluster during rebalancing.
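The comparisons described above can be sketched as two small checks. This is a minimal illustration that assumes you have already read the metric values (exposed by the consumer as last-poll-seconds-ago and last-rebalance-seconds-ago) from your monitoring system; the class, the enum, and the warning fraction are hypothetical names introduced for the example.

```java
// Sketch only: class, enum, and thresholds are illustrative assumptions.
final class RebalanceMetricsSketch {

    enum PollHealth { OK, AT_RISK, EXCEEDED }

    // Compares the time since the last poll with max.poll.interval.ms.
    // warnFraction is a hypothetical alerting threshold, e.g. 0.8 for 80%.
    static PollHealth classifyPoll(double lastPollSecondsAgo,
                                   long maxPollIntervalMs,
                                   double warnFraction) {
        double elapsedMs = lastPollSecondsAgo * 1000.0;
        if (elapsedMs >= maxPollIntervalMs) {
            return PollHealth.EXCEEDED;
        }
        if (elapsedMs >= warnFraction * maxPollIntervalMs) {
            return PollHealth.AT_RISK;
        }
        return PollHealth.OK;
    }

    // True if a rebalance finished within the last probing interval.
    // A negative value is treated here as "no rebalance observed yet".
    static boolean rebalancedRecently(double lastRebalanceSecondsAgo,
                                      long probingRebalanceIntervalMs) {
        if (lastRebalanceSecondsAgo < 0) {
            return false;
        }
        return lastRebalanceSecondsAgo * 1000.0 < probingRebalanceIntervalMs;
    }
}
```

With a max poll interval of five minutes and a warning fraction of 0.8, a consumer that last polled 250 seconds ago would be classified as AT_RISK, which is a reasonable point to alert before the consumer is removed from the group.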
Important takeaways
Rebalances are an essential part of maintaining a balanced and optimised Kafka Streams cluster. By understanding the reasons for rebalances, optimising rebalances, and monitoring key metrics, it becomes easier to manage rebalances and ensure smooth operation of the cluster.
Remember, rebalances are not something to be feared. They are a mechanism for maintaining the health and efficiency of the cluster. By understanding and managing rebalances effectively, you can ensure the smooth operation of your Kafka Streams applications.