Load balancing is a critical aspect of optimising performance and cost efficiency in Kafka clusters. It ensures that all nodes are utilised effectively and data is evenly distributed across the cluster. In this blog article, we will focus on load balancing specifically for data injection use cases in Kafka clusters. By leveraging a partition replica placement strategy, you can achieve load balancing with minimal overhead.
Understanding Load Balancing in Kafka Clusters
Load balancing in Kafka clusters depends on two factors: partition placement and the partition access pattern. How partition replicas are placed on brokers affects the balance, because different topics may have different retention requirements. The traffic volume written by producers and read by consumers also affects the balance, as some partitions may receive more data than others at different times.
The Common Solution: Continuous Rebalancing
Currently, the most common approach to load balancing in Kafka clusters is continuous rebalancing, or what Confluent calls auto-balancing. This method balances the cluster based on load metrics: it collects load metrics from the Kafka brokers, computes a load model of the cluster, generates an optimisation proposal, and executes it. However, this approach comes with overheads such as data movement, longer execution times, and increased infrastructure costs.
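To make the source of that overhead concrete, here is a minimal sketch of the collect, model, propose, execute loop described above. It is a toy greedy heuristic written for illustration, not the algorithm used by Cruise Control or Confluent; the function name `propose_moves`, the `tolerance` parameter, and the single-number "load" per partition are all assumptions.

```python
def propose_moves(brokers, partition_load, placement, tolerance=0.10):
    """Toy rebalancer (illustrative only).

    brokers:        list of broker ids
    partition_load: {partition: load}, e.g. bytes in per second (assumed metric)
    placement:      {partition: broker currently hosting it}
    Returns (moves, broker_load) where moves is a list of (partition, src, dst).
    """
    placement = dict(placement)  # work on a copy
    # Step 1 and 2: turn the collected metrics into a per-broker load model.
    load = {b: 0.0 for b in brokers}
    for p, b in placement.items():
        load[b] += partition_load[p]
    mean = sum(load.values()) / len(brokers)

    # Step 3: generate a proposal by moving partitions from hot to cold brokers.
    moves = []
    while True:
        hot = max(load, key=load.get)
        cold = min(load, key=load.get)
        gap = load[hot] - load[cold]
        if gap <= tolerance * mean:
            break  # balanced enough
        # Only moves that strictly shrink the gap are considered.
        candidates = [p for p, b in placement.items()
                      if b == hot and 0 < partition_load[p] < gap]
        if not candidates:
            break  # no single move would help
        p = min(candidates, key=lambda q: abs(partition_load[q] - gap / 2))
        placement[p] = cold
        load[hot] -= partition_load[p]
        load[cold] += partition_load[p]
        moves.append((p, hot, cold))
    return moves, load


# Step 4 (execution) is where the cost lies: every move copies a partition's data.
moves, load = propose_moves(
    brokers=[1, 2],
    partition_load={0: 10, 1: 10, 2: 10, 3: 10},
    placement={0: 1, 1: 1, 2: 1, 3: 1},
)
print(moves)
print(load)
```

Every entry in `moves` stands for a partition whose data would have to be copied between brokers, and the loop has to run again whenever the traffic pattern shifts.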
Load Balancing in Data Injection Use Cases
Data injection is a popular use case for Kafka clusters, involving the injection of log data from servers into the cluster for later analysis in a data warehouse. In analysing this specific use case category, we have observed some interesting workload patterns:
- Random assignment of events to partitions within a single topic.
- Constant evaluation of partition data by consumers.
- No strict requirements for the partition count by producers and consumers.
Based on these observations, we propose a new partition replica placement strategy that significantly improves load balancing in data injection use cases while minimising overhead.
The Partition Replica Placement Strategy
The partition replica placement strategy for data injection use cases is as follows:
- Assign a SKU number to each topic. The SKU number, along with the number of brokers in the cluster, determines the number of partitions for the topic.
- Ensure that each broker has an equal number of partition replicas and leader replicas for each topic, distributing the load evenly among the brokers.
- When all topics are aggregated, each broker naturally carries a load equivalent to that of its peers, resulting in balanced hardware utilisation for CPU, storage, and network usage.
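The strategy above can be sketched in a few lines. The article does not give the exact formula linking the SKU number to the partition count, so this sketch assumes the simplest one, partitions = SKU × number of brokers, and a round-robin rotation of replicas. The names `build_assignment` and `count_per_broker` are hypothetical.

```python
from collections import Counter


def build_assignment(broker_ids, sku, replication_factor):
    """Return {partition: [replica broker ids]}; the first replica is the preferred leader.

    Assumption: partition count = sku * number of brokers.
    """
    n = len(broker_ids)
    if sku < 1:
        raise ValueError("sku must be at least 1")
    if not 1 <= replication_factor <= n:
        raise ValueError("replication factor must be between 1 and the broker count")
    assignment = {}
    for p in range(sku * n):
        # Replica r of partition p lands on broker (p + r) mod n, so leaders
        # and followers both rotate evenly around the cluster.
        assignment[p] = [broker_ids[(p + r) % n] for r in range(replication_factor)]
    return assignment


def count_per_broker(assignment):
    """Count preferred leaders and total replicas hosted by each broker."""
    leaders = Counter(replicas[0] for replicas in assignment.values())
    replicas = Counter(b for reps in assignment.values() for b in reps)
    return leaders, replicas


assignment = build_assignment([1, 2, 3], sku=2, replication_factor=2)
leaders, replicas = count_per_broker(assignment)
print(dict(leaders))   # leaders per broker
print(dict(replicas))  # replicas per broker
```

Under these assumptions every broker leads exactly `sku` partitions and hosts `sku * replication_factor` replicas of the topic, which is the per-topic evenness the strategy relies on.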
Scenarios for Maintaining Load Balancing in Data Injection Use Cases
To ensure load balancing in production, you need to address various scenarios:
- New topic creation: When onboarding a new topic, generate a partition and replica assignment that ensures an even distribution across all brokers.
- Increased partition count: If consumers require more partitions, simply change the SKU number to determine the new partition count and create a new partition assignment for the topic.
- Adding more brokers: If the cluster load increases over time, add more brokers and create new partitions on them to distribute the load evenly.
- Injection traffic volume or retention changes: These changes do not affect load balancing because the load is already evenly distributed among all brokers for each topic.
- Removing brokers: Removing brokers requires data movement, which can be managed by either migrating to a different Kafka cluster or creating new partitions on the remaining brokers to maintain load balancing.
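Two of the scenarios above, raising the partition count and adding brokers, both come down to creating new partitions on a chosen set of brokers. The following is a minimal sketch under the same assumption as before (each broker hosts `sku` leader partitions per topic); `added_partitions` is a hypothetical helper, and the broker ids and partition numbers are made up for illustration.

```python
def added_partitions(first_new_partition, target_brokers, sku, replication_factor):
    """Plan `sku` new partitions per target broker, numbered from first_new_partition.

    Returns {partition: [replica broker ids]}; the first replica is the preferred leader.
    """
    n = len(target_brokers)
    if sku < 1:
        raise ValueError("sku must be at least 1")
    if not 1 <= replication_factor <= n:
        raise ValueError("need at least as many target brokers as replicas")
    plan = {}
    for i in range(sku * n):
        plan[first_new_partition + i] = [
            target_brokers[(i + r) % n] for r in range(replication_factor)
        ]
    return plan


# Increased partition count: SKU goes from 2 to 3 on brokers 1-3, so one extra
# partition per broker is added after the existing partitions 0..5.
print(added_partitions(6, [1, 2, 3], sku=1, replication_factor=2))

# Adding brokers: brokers 4 and 5 join a cluster whose topic has SKU 3
# (partitions 0..8 already exist), so they receive 3 leader partitions each.
print(added_partitions(9, [4, 5], sku=3, replication_factor=2))
```

In both cases only new partitions are created and no existing replica is moved. Note that in this sketch the replicas of the new partitions are placed only on the newly added brokers, so at least as many brokers as the replication factor must be added at once.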
Achieving Load Balancing in Production
In a production environment, implementing the partition replica placement strategy without relying on additional tools like Cruise Control means supporting operations such as topic onboarding, partition count increases, retention changes, and adding brokers as the cluster scales up, all while maintaining a balanced cluster.
Load balancing in Kafka clusters is a complex challenge, but by focusing on specific use case categories like data injection, we can leverage partition placement strategies to greatly improve load balancing with minimal overhead. With this approach, we have achieved a balanced cluster, supported various operations, and scaled up our Kafka infrastructure. Optimal load balancing is crucial for achieving peak performance and cost efficiency in Kafka clusters, and our findings can assist other Kafka users in attaining the same level of balance in their clusters.
Consult the experts
Changing configuration in production can be daunting. If you are unsure about how to achieve load balancing in production, reach out to us and we will be happy to help.