blog by OSO

The new consumer rebalance protocol KIP-848

Sion Smith 17 July 2023

The consumer rebalance protocol is an important part of Apache Kafka: it ensures the fair distribution of partitions among consumer group members. Group membership and rebalancing in Apache Kafka have been around for over 7 years, and KIP-848 aims to improve them further. In this article, we explore what the next generation of the consumer rebalance protocol is and the key changes it brings.

Changes proposed by KIP-848

Consumer Rebalance Protocol: Session Timeout and Heartbeat Interval

In the new protocol, the session timeout is defined on the server side. This means that the server determines the duration of the session timeout, which is the period of time a member can be inactive before being kicked out of the group. The heartbeat interval, which is provided by the server, is used by members to send periodic heartbeats to the coordinator and indicate their liveness.
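To make the division of responsibility concrete, here is a minimal sketch of a coordinator that owns both timing values. It is a toy model, not Kafka code: the `Coordinator` class, its method names and the millisecond values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Toy group coordinator that owns both timing values (hypothetical)."""
    session_timeout_ms: int = 45_000     # illustrative value, set on the server
    heartbeat_interval_ms: int = 5_000   # illustrative value, returned to members
    last_heartbeat_ms: dict = field(default_factory=dict)

    def heartbeat(self, member_id: str, now_ms: int) -> int:
        """Record liveness and tell the member how often to heartbeat."""
        self.last_heartbeat_ms[member_id] = now_ms
        return self.heartbeat_interval_ms

    def expire_members(self, now_ms: int) -> list:
        """Remove and return members silent for longer than the session timeout."""
        expired = sorted(
            m for m, seen in self.last_heartbeat_ms.items()
            if now_ms - seen > self.session_timeout_ms
        )
        for m in expired:
            del self.last_heartbeat_ms[m]
        return expired


coordinator = Coordinator()
interval = coordinator.heartbeat("A", now_ms=0)
coordinator.heartbeat("B", now_ms=0)
coordinator.heartbeat("A", now_ms=40_000)  # A keeps heartbeating, B goes quiet
expired = coordinator.expire_members(now_ms=50_000)
```

In this run only B is removed, because A's last heartbeat is still inside the session timeout. The member never chooses either value; it simply follows the interval the server hands back.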

Heartbeat API

See the API documentation here for more information

Consumer Rebalance Protocol: Assignment and Wildcard Subscription

The assignment, which specifies the partitions that each member should consume, is now also defined on the server side. The server informs each member about the partitions it should consume at any given moment. Additionally, the server is responsible for resolving wildcard subscriptions and regular expressions. To keep user-provided regular expressions safe to evaluate, the new protocol standardises on a single regular expression library (Google RE2/J in the KIP), which guarantees linear-time matching.
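Server-side resolution of a subscription pattern can be pictured roughly as below. This is only a sketch: `resolve_subscription` and the topic names are made up, and Python's `re` module stands in for the server's regular expression library without offering the same safety guarantees.

```python
import re


def resolve_subscription(pattern: str, cluster_topics: list) -> list:
    """Return the topics whose full name matches the member's pattern."""
    compiled = re.compile(pattern)
    return sorted(t for t in cluster_topics if compiled.fullmatch(t))


# Hypothetical topic list known to the server.
cluster_topics = ["orders.eu", "orders.us", "payments.eu", "orders-archive"]
resolved = resolve_subscription(r"orders\..*", cluster_topics)
```

Because the server holds the topic metadata, the member only sends the pattern and receives partitions for whatever it resolves to, here `orders.eu` and `orders.us`.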

Moving from Incremental to Declarative Assignment

The new protocol introduces a shift from incremental assignment computation to declarative assignment on the server side. In this model, the server is fully responsible for defining the desired state of the group and instructing members on how to achieve it. The group coordinator plays a crucial role in reconciling the members towards the desired state.


Reconciliation Process

The reconciliation process is a key concept in the new protocol. The group coordinator constructs a reconciliation loop where it compares the current state of each member with the desired state. It takes steps to converge the members towards the desired state and ensures synchronisation between them. This process involves a three-step approach for each member: revoking partitions, transitioning to the new epoch, and receiving new partitions.

Partition Assignment Strategies

The new protocol provides two built-in partition assignment strategies: the uniform assignor and the range assignor. The uniform assignor spreads partitions as evenly as possible among consumer group members. The range assignor, on the other hand, co-partitions topics: it keeps the same-numbered partitions of different topics together on the same member. These strategies cater to different use cases, but users also have the flexibility to implement their own custom logic using the public API.
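The difference between the two strategies is easier to see on a small input. The functions below are simplified stand-ins written for this article, not the built-in assignors: the uniform sketch deals partitions out round-robin, and the range sketch gives each member one contiguous range per topic.

```python
def uniform_assign(members, topics):
    """Deal every topic-partition out round-robin across the members."""
    assignment = {m: [] for m in members}
    all_partitions = [(t, p) for t in sorted(topics) for p in range(topics[t])]
    for i, tp in enumerate(all_partitions):
        assignment[members[i % len(members)]].append(tp)
    return assignment


def range_assign(members, topics):
    """Give each member one contiguous range of partitions per topic."""
    assignment = {m: [] for m in members}
    for t in sorted(topics):
        base, extra = divmod(topics[t], len(members))
        start = 0
        for i, m in enumerate(members):
            size = base + (1 if i < extra else 0)
            assignment[m].extend((t, p) for p in range(start, start + size))
            start += size
    return assignment


members = ["A", "B"]
topics = {"clicks": 3, "views": 3}  # hypothetical topic name -> partition count
uniform = uniform_assign(members, topics)
ranged = range_assign(members, topics)
```

With these inputs the uniform sketch gives each member three partitions, while the range sketch gives A partitions 0 and 1 of both topics and B partition 2 of both. The range result is less even, but equal partition numbers always land on the same member, which matters when two topics are joined by key.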

Reconciliation and Rebalance Timeout

During the reconciliation process, partitions may need to be moved from one member to another. To avoid waiting indefinitely for a member to give up its partitions, the protocol relies on a rebalance timeout. The coordinator sets a deadline for a member to revoke the partitions it has been asked to release, and if the member fails to do so in time, it is kicked out of the group.
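The deadline described above can be sketched as a periodic check on the coordinator. The function name, the data layout and the timings are assumptions for illustration only.

```python
def expire_slow_revocations(revocation_deadlines, owned, now_ms):
    """Remove members that missed their revocation deadline.

    revocation_deadlines : member id -> time (ms) by which revocation must be acknowledged
    owned                : member id -> set of partitions the member currently holds
    Returns the partitions freed by removing the late members.
    """
    freed = set()
    for member, deadline_ms in list(revocation_deadlines.items()):
        if now_ms > deadline_ms:
            # The member leaves the group, so everything it held becomes available.
            freed |= owned.pop(member, set())
            del revocation_deadlines[member]
    return freed


owned = {"A": {0, 1}, "B": {3, 4, 5}}
revocation_deadlines = {"B": 30_000}  # B was asked to give up a partition
freed = expire_slow_revocations(revocation_deadlines, owned, now_ms=45_000)
```

Once B is removed, all of its partitions are free again and can be handed to the remaining members in the next assignment.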

Example of Reconciliation Process

Let’s walk through an example to understand how the reconciliation process works. Suppose we have a group with two members, A and B, each assigned three partitions. The group coordinator keeps track of the state of the members and the desired assignment for the group in this example. When a new member, C, joins the group, it triggers a new assignment. The new member is assigned a partition that is currently owned by another member and cannot be immediately assigned to it. The member transitions to a waiting state until the partition becomes available.

During the reconciliation process, the group coordinator compares the desired assignment with the current state of each member. It identifies partitions that need to be revoked and notifies the members to release those partitions. The members then commit offsets and start the revocation process. Once a member acknowledges the revocation of a partition, the coordinator reassigns it to another member.

The reconciliation process continues until all members have fully reconciled and received their assigned partitions. Unlike the previous protocol, the new protocol does not wait for slow members before assigning partitions that are already free. The impact of slow or faulty members is confined to the partitions owned by those members.
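The walkthrough above can be replayed with a few lines of code. This is a simplified model under stated assumptions: the partition numbers and the target assignment are invented, epochs are left out, and `reconcile` is a hypothetical helper rather than anything from the Kafka code base.

```python
def reconcile(member, owned, target):
    """One coordinator step for `member`: return (to_revoke, to_assign).

    owned  : member id -> set of partitions the member currently holds
    target : member id -> set of partitions the member should hold
    """
    to_revoke = owned[member] - target[member]
    if to_revoke:
        # The member must give up partitions before it receives anything new.
        return to_revoke, set()
    held_elsewhere = set().union(*(p for m, p in owned.items() if m != member))
    # Hand over only those target partitions that nobody else still holds.
    return set(), (target[member] - owned[member]) - held_elsewhere


owned = {"A": {0, 1, 2}, "B": {3, 4, 5}, "C": set()}
target = {"A": {0, 1}, "B": {3, 4}, "C": {2, 5}}

# C joins: both of its target partitions are still owned, so it has to wait.
_, c_first = reconcile("C", owned, target)

# A acknowledges the revocation of partition 2; B is slow and keeps partition 5.
a_revoke, _ = reconcile("A", owned, target)
owned["A"] -= a_revoke

# C can now pick up partition 2 without waiting for B.
_, c_second = reconcile("C", owned, target)
owned["C"] |= c_second
```

C starts consuming partition 2 as soon as A lets go of it, while partition 5 stays with B until B acknowledges its own revocation. Only the partition held by the slow member is delayed.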

Extended Protocol and Delegated Assignment Computation

The extended protocol introduces a mechanism to delegate the assignment computation to one of the members in the group. The group coordinator selects a member and sets a flag to indicate that the assignment should be computed by that member. The selected member then calls a new API, the prepare assignment API (ConsumerGroupPrepareAssignment), to retrieve the group state and metadata needed for the computation. Once the computation is done, the member calls the install assignment API (ConsumerGroupInstallAssignment) to install the new assignment on the server side.

How to upgrade to the KIP-848 New Protocol

To upgrade to the new protocol, the brokers first need to be upgraded to a software version that supports it. Once all brokers are upgraded, the metadata version can be bumped to enable the new features. There are also new broker configurations, such as the number of group coordinator threads (group.coordinator.threads), which lets users size the thread pool for the new event-loop style group coordinator.


Takeaways of KIP-848

The next generation of the consumer rebalance protocol brings significant changes to the way partition assignments are handled in Apache Kafka. The shift from incremental to declarative assignment computation on the server side, the introduction of the reconciliation process, and the ability to delegate assignment computation are key features of the new protocol. These changes ensure a fair distribution of partitions among consumer group members and improve the overall efficiency and reliability of the consumer rebalance process.

The new protocol also provides built-in partition assignment strategies, such as the uniform assignor and the range assignor, and allows users to implement their own custom logic using the public API. This flexibility caters to different use cases and ensures that users can tailor the partition assignment process to their specific needs.

In conclusion, the next generation of the consumer rebalance protocol brings significant improvements to the partition assignment process in Apache Kafka. The shift to declarative assignment computation, the introduction of the reconciliation process, and the ability to delegate assignment computation enhance the fairness, efficiency, and reliability of the consumer rebalance process. These changes ensure that Kafka consumers can handle large-scale deployments and meet the demands of modern data processing applications.
