
How to auto-scale Apache Kafka with Tiered Storage in Production

Sion Smith 4 July 2025

In the distributed systems world, Apache Kafka has earned its reputation as the gold standard for scalability. Companies routinely run Kafka clusters with tens or even hundreds of brokers, handling massive throughput with remarkable stability. Yet for all its scalability prowess, Kafka has always struggled with one critical aspect of modern infrastructure: elasticity.

The OSO engineers have been exploring this fundamental challenge, and recent developments in tiered storage are finally making Kafka truly elastic. This isn’t just a theoretical improvement—it’s a practical breakthrough that changes the economics and feasibility of reactive scaling in production environments.

The distinction matters more than you might think. Scalability allows Kafka to grow gracefully as your business expands over months and years. Elasticity enables it to respond to traffic spikes within minutes, then scale back down when demand subsides. For an infrastructure component that often represents one of the largest compute costs in a data platform, this difference translates to significant operational and financial impact.

Example Code

All code for this blog post can be found here: https://github.com/osodevops/kafka-auto-scaling-example

Understanding the Scalability vs Elasticity Paradox

Why Kafka Scales Out But Struggles to Scale Down

Kafka’s scaling success story is well-documented. A startup can begin with three brokers and grow organically to enterprise scale without architectural changes. The fundamental design—topics split into partitions, partitions replicated across brokers—handles this growth beautifully. When you need more capacity, you add brokers, create new topics with more partitions, and distribute the load.

This organic growth model works because it’s planned and gradual. Teams add capacity before they need it, new topics naturally spread across the expanded cluster, and the system grows in harmony with business requirements.

Elasticity presents an entirely different challenge. Consider the hype-driven traffic spike—your product hits the top of Reddit, or a marketing campaign exceeds expectations by 10x. You need capacity immediately, not next week. Traditional scaling approaches fail here because of Kafka’s fundamental architectural constraint: partition replicas have fixed assignments to brokers.

When you scale up a three-broker cluster to four brokers, the new broker starts completely empty. All existing partition replicas remain on the original three brokers, creating no performance benefit. The new broker is essentially idle, consuming resources whilst contributing nothing to throughput.

Scale-down scenarios are even more problematic. If you need to remove a broker that’s actively hosting partition replicas, you face an impossible choice: lose data and availability, or postpone the scaling operation until you can safely migrate the data elsewhere.

The Fixed Partition Assignment Problem

This constraint distinguishes Kafka from stateless applications where scaling simply means adding or removing identical compute instances. Each Kafka broker maintains unique state—specific partition replicas with their committed offsets, consumer group metadata, and transaction logs. You can’t simply terminate a broker like you would terminate a web server instance.

The solution requires data movement, and that’s where the complexity begins. To make a new broker useful, you must rebalance partition replicas across the cluster. To safely remove a broker, you must first migrate its data to remaining brokers. Both operations are resource-intensive and time-consuming.

The Hidden Costs of Traditional Kafka Scaling

When Auto-scaling Becomes Auto-expensive

The OSO engineers have observed the resource implications of traditional Kafka scaling across numerous enterprise deployments. The numbers are sobering. Consider a modest example: scaling from three brokers to four, where each original broker hosts 4TB of data. The rebalancing operation must move approximately 3TB from existing brokers to the new broker to achieve balance.

This data movement consumes substantial network bandwidth. In cloud environments, inter-instance network transfer might be free, but the bandwidth itself becomes a bottleneck. The operation requires significant disk I/O as source brokers read data from storage whilst the destination broker writes it. CPU utilisation spikes due to compression, decompression, and TLS encryption overhead.

The resource consumption creates a performance paradox: scaling up initially degrades cluster performance as resources are diverted to data movement. Instead of immediately gaining capacity, you temporarily lose capacity whilst the rebalancing completes.

The Performance Paradox

This temporary performance degradation extends beyond resource consumption. The rebalancing process cannot monopolise cluster resources—production traffic must continue flowing. This means rebalancing operations typically run at reduced speed to avoid impacting live workloads, extending the time required for scaling operations.

In practice, moving multiple terabytes whilst maintaining production performance can take hours or even days. This timeline makes reactive scaling impractical for traffic spikes that might last minutes or hours.

Time Complexity

The mathematics are unforgiving. Even with high-performance networking and storage, transferring terabytes of data whilst maintaining production service levels requires significant time. The OSO engineers have seen scaling operations that took 6-12 hours to complete, rendering them useless for reactive scaling scenarios.

This temporal constraint forces teams into uncomfortable trade-offs: either maintain significant over-capacity to handle spikes, or accept that scaling cannot respond to immediate demand changes.

Tiered Storage as the Game Changer

The 10x Improvement

Kafka’s tiered storage feature (KIP-405), which reached general availability in Apache Kafka 3.9, fundamentally alters this equation. By separating hot data (recent, frequently accessed) from cold data (older, archival), tiered storage dramatically reduces the amount of data stored locally on each broker.

Consider the same scaling scenario with tiered storage enabled. Instead of 4TB per broker stored locally, you might have 400GB of hot data per broker, with 3.6TB of cold data stored in object storage. When scaling up, you only need to rebalance the 400GB of hot data—a 10x reduction in data movement.

This improvement cascades through every aspect of the scaling operation. Network bandwidth requirements drop by 90%. Disk I/O drops proportionally. The time required for rebalancing drops from hours to minutes. Suddenly, reactive scaling becomes practically feasible.
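
As an illustration of what that hot/cold split can look like in configuration, here is a minimal sketch of a Strimzi-managed topic with tiering enabled. It assumes remote storage is already configured at the broker level (Kafka’s remote.log.storage.system.enable setting plus a RemoteStorageManager plugin); the topic name and retention windows are illustrative, the point is the same “small hot tier, large cold tier” split as the 400GB/4TB example above.

```yaml
# Hypothetical Strimzi-managed topic with tiering enabled: only recent segments
# stay on local broker disks, older segments are offloaded to the remote
# (object storage) tier.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: orders                      # illustrative topic name
  labels:
    strimzi.io/cluster: my-cluster  # illustrative cluster name
spec:
  partitions: 12
  replicas: 3
  config:
    remote.storage.enable: true     # tier this topic's older segments to remote storage
    retention.ms: 2592000000        # keep 30 days in total (hot + cold), illustrative
    local.retention.ms: 86400000    # keep only the last 24 hours on broker disks
```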

Object Storage Economics

The choice of object storage for the cold tier creates additional benefits beyond performance. Object storage services like Amazon S3 offer virtually unlimited capacity at lower per-GB costs than block storage. This removes capacity planning constraints that often force over-provisioning of local storage.

The pricing model also aligns better with actual usage patterns. You pay for the storage you use, plus request-based charges for access. For archival data that’s rarely accessed, this typically costs significantly less than maintaining equivalent capacity on local block storage.

However, the economics become more complex for frequently accessed cold data. Applications that repeatedly read archived data might generate substantial request charges. This makes tiered storage most effective when cold data access patterns are genuinely archival rather than regular.

Implementation Considerations

Tiered storage isn’t universally applicable. Some use cases require all data to remain locally accessible with minimal latency. Real-time analytics applications, fraud detection systems, and certain types of stream processing might not tolerate the higher latency of cold data retrieval from object storage.

The OSO engineers recommend careful analysis of data access patterns before enabling tiered storage. Applications that benefit most have clear hot/cold data boundaries—recent data accessed frequently, older data accessed rarely or never.

Kubernetes-Native Auto-scaling Implementation

The Operator Pattern Advantage

Modern Kafka deployments increasingly run on Kubernetes, and the operator pattern provides sophisticated orchestration capabilities beyond simple pod scaling. Projects like Strimzi implement Kubernetes-native Kafka operators that understand the complexities of Kafka cluster management.

These operators implement the scale sub-resource pattern, enabling Kubernetes’ Horizontal Pod Autoscaler to trigger scaling operations whilst the operator handles the Kafka-specific complexities. When HPA decides to scale up, the operator doesn’t simply create new pods—it coordinates broker startup, cluster joining, and data rebalancing in the correct sequence.

This orchestration is crucial because the order of operations matters immensely. New brokers must fully join the cluster before rebalancing begins. Rebalancing must complete before the cluster is considered successfully scaled. Scale-down operations require even more careful coordination to ensure data safety.
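
To make the wiring concrete, here is a minimal sketch of pointing the Horizontal Pod Autoscaler at an operator-managed broker resource. It assumes a Strimzi-style KafkaNodePool that exposes the scale subresource; the resource names, replica bounds and CPU target are illustrative rather than recommendations.

```yaml
# Hypothetical HPA driving broker count through the operator's scale subresource.
# The HPA only adjusts the desired replica count; the operator sequences broker
# startup, cluster joining and the subsequent rebalance.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kafka-broker-autoscaler
spec:
  scaleTargetRef:
    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaNodePool
    name: brokers                  # illustrative node pool name
  minReplicas: 3
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # placeholder threshold
```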

Cruise Control Integration

LinkedIn’s Cruise Control project provides the sophisticated rebalancing algorithms necessary for production auto-scaling. Rather than naive round-robin data distribution, Cruise Control considers multiple metrics: CPU utilisation, disk utilisation, network I/O, and partition leader distribution.

Modern Kafka operators integrate directly with Cruise Control, automatically triggering rebalancing operations after scaling events. This integration removes the manual intervention traditionally required for safe Kafka scaling, making automated scaling practically feasible.

The integration also enables more sophisticated rebalancing strategies. Instead of simply achieving equal data distribution, Cruise Control can optimise for specific performance characteristics based on your workload patterns.
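
As a sketch of what that integration can look like with Strimzi, assuming Cruise Control is enabled on the cluster (spec.cruiseControl: {} in the Kafka resource), a KafkaRebalance in add-brokers mode moves data onto newly added brokers only; the broker ID below is illustrative.

```yaml
# Hypothetical KafkaRebalance that only moves partition replicas onto brokers
# that have just been added, rather than rebalancing the whole cluster.
# Requires Cruise Control to be enabled in the Kafka resource.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  name: rebalance-after-scale-up
  labels:
    strimzi.io/cluster: my-cluster   # illustrative cluster name
spec:
  mode: add-brokers                  # move data onto the new brokers only
  brokers: [3]                       # illustrative ID of the newly added broker
```

The operator asks Cruise Control to generate an optimisation proposal for this resource; depending on how approval is configured, the proposal is then executed automatically or after a manual sign-off.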

Metric Selection Strategy

CPU utilisation is the simplest auto-scaling metric, but Kafka-specific metrics often lead to better scaling decisions. Thread pool utilisation, request queue depth, and partition lag can indicate capacity constraints before CPU becomes a bottleneck.

The OSO engineers have observed that the optimal scaling metric depends heavily on workload characteristics. Producer-heavy workloads might benefit from scaling on network thread utilisation. Consumer-heavy workloads might scale better based on fetch request latency. Storage-intensive workloads might use disk utilisation as the primary metric.

Prometheus-based monitoring stacks can expose these Kafka metrics to Kubernetes’ HPA, enabling sophisticated scaling strategies tailored to specific use cases.
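
As an example, assuming a Prometheus adapter (or similar) exposes per-pod broker metrics to the HPA, the resource metric in the earlier sketch could be swapped for a Kafka-specific one. The metric name and target below are placeholders that depend entirely on how your monitoring stack maps Kafka’s JMX metrics.

```yaml
# The same hypothetical HPA as before, but scaling on a broker metric exposed
# per pod by a Prometheus adapter instead of raw CPU. The metric name and
# target value are placeholders for whatever your adapter actually exports.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kafka-broker-autoscaler
spec:
  scaleTargetRef:
    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaNodePool
    name: brokers
  minReplicas: 3
  maxReplicas: 6
  metrics:
    - type: Pods
      pods:
        metric:
          name: kafka_request_queue_size   # placeholder: request queue depth per broker
        target:
          type: AverageValue
          averageValue: "10"
```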

Production Deployment Strategies and Limitations

Planned vs Reactive Scaling

Despite these improvements, Kafka auto-scaling works best for predictable patterns rather than emergency capacity expansion. The OSO engineers have found auto-scaling particularly effective for:

  • Daily traffic patterns (scaling up during business hours, down overnight)
  • Weekly patterns (scaling down for weekends)
  • Seasonal adjustments (gradual scaling for anticipated growth periods)

These scenarios provide sufficient time for scaling operations to complete and stabilise before peak demand arrives. The automation removes the operational burden of manual scaling whilst maintaining the planning-based approach that works well with Kafka’s architecture.
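
For the daily pattern in particular, one simple approach is to raise the autoscaler’s floor ahead of the morning peak and lower it again in the evening. A minimal sketch using a CronJob follows; the names and schedule are illustrative, a mirror job for the evening is not shown, and the required RBAC is omitted.

```yaml
# Hypothetical CronJob raising the HPA floor ahead of the weekday morning peak.
# Assumes a ServiceAccount bound to a Role that is allowed to patch the HPA.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: kafka-business-hours-scale-up
spec:
  schedule: "0 7 * * 1-5"               # 07:00, Monday to Friday
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: kafka-scaler   # illustrative service account
          restartPolicy: OnFailure
          containers:
            - name: patch-hpa
              image: bitnami/kubectl:latest
              command:
                - kubectl
                - patch
                - hpa
                - kafka-broker-autoscaler
                - --type=merge
                - -p
                - '{"spec":{"minReplicas":6}}'
```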

For unpredictable traffic spikes—viral content, unexpected marketing success, or breaking news events—manual scaling ahead of anticipated demand remains the most reliable approach. Auto-scaling cannot react quickly enough to provide immediate capacity for sudden spikes.

Infrastructure Capacity Planning

Kubernetes environments add another layer of complexity. When HPA triggers Kafka cluster scaling, new broker pods require compute resources. If the Kubernetes cluster lacks available capacity, the pods remain pending whilst cluster auto-scaling provisions new nodes.

Cloud provider node provisioning typically takes 2-5 minutes, plus additional time for node joining, pod scheduling, and broker startup. This delay compounds the time required for Kafka scaling operations, making reactive scaling even less feasible for immediate spikes.

The OSO engineers recommend maintaining some buffer capacity in Kubernetes clusters to ensure immediate pod scheduling for auto-scaling operations. This trades off some resource efficiency for improved scaling responsiveness.
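
One common way to keep that buffer without running idle brokers is the overprovisioning pattern used with the Kubernetes cluster autoscaler: low-priority placeholder pods reserve headroom that new broker pods can preempt immediately. A minimal sketch, with illustrative sizes:

```yaml
# Negative-priority placeholder pods: they reserve node capacity but are the
# first to be evicted when real workloads (such as new broker pods) need it.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10
globalDefault: false
description: Placeholder priority for capacity headroom
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-capacity-buffer
spec:
  replicas: 2                          # illustrative amount of headroom
  selector:
    matchLabels:
      app: kafka-capacity-buffer
  template:
    metadata:
      labels:
        app: kafka-capacity-buffer
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "4"                 # roughly one broker's worth of CPU
              memory: 16Gi             # adjust to your broker sizing
```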

Stabilisation Windows

Auto-scaling configurations must prevent oscillation—scaling up and down repeatedly due to temporary metric fluctuations. Kafka’s scaling characteristics make this particularly important because scaling operations themselves affect the metrics used for scaling decisions.

During scale-up rebalancing, CPU and network utilisation spike as data moves between brokers. Without appropriate stabilisation windows, this temporary spike could trigger additional scale-up operations, creating unnecessary resource consumption and potential instability.

The OSO engineers typically configure stabilisation windows of 10-15 minutes for Kafka auto-scaling, allowing metrics to settle after scaling operations complete. This prevents scaling loops whilst maintaining reasonable responsiveness to sustained load changes.
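
In HPA terms, those windows map onto the behaviour section of the autoscaler. The sketch below applies the 10-15 minute guidance to the same hypothetical HPA used earlier; treat the values as a starting point to tune, not a recommendation.

```yaml
# Stabilisation settings applied to the earlier HPA sketch. Metrics are omitted
# for brevity; they would be the same as in the previous examples.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kafka-broker-autoscaler
spec:
  scaleTargetRef:
    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaNodePool
    name: brokers
  minReplicas: 3
  maxReplicas: 6
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 600  # require ~10 minutes of sustained load before adding a broker
      policies:
        - type: Pods
          value: 1                     # add at most one broker per period
          periodSeconds: 600
    scaleDown:
      stabilizationWindowSeconds: 900  # wait ~15 minutes before removing capacity
      policies:
        - type: Pods
          value: 1
          periodSeconds: 900
```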

Implementation Guidelines for Kafka Auto-scaling

Based on extensive production experience, the OSO engineers recommend this implementation approach:

Start with tiered storage assessment. Analyse your data access patterns to understand hot vs cold data ratios. Applications with clear temporal access boundaries benefit most from tiered storage. Measure the potential reduction in local data storage to estimate scaling performance improvements.

Configure conservative HPA settings. Begin with longer stabilisation windows (15+ minutes) and gradually reduce based on observed behaviour. Scale-up policies can be more aggressive than scale-down policies since adding capacity is safer than removing it.

Use auto-scaling for predictable patterns. Implement auto-scaling for daily, weekly, and seasonal traffic patterns where scaling operations can complete before peak demand. Continue using manual scaling for unpredictable spikes and critical events.

Monitor data movement costs. Track network utilisation, storage I/O, and time required for rebalancing operations. These metrics help optimise rebalancing configuration and identify when tiered storage configuration needs adjustment.

Plan manual interventions for high-impact events. Marketing campaigns, product launches, and other planned events should trigger manual scaling ahead of anticipated demand. Auto-scaling provides operational efficiency, not emergency capacity.

The Future of Elastic Kafka

The combination of tiered storage and Kubernetes-native operators represents a significant step toward truly elastic Kafka infrastructure. However, we’re still in the early stages of this evolution. Emerging proposals for fully disaggregated storage architectures promise even greater elasticity by removing local storage entirely.

These future architectures would separate compute and storage completely, enabling true stateless broker scaling. Scaling operations would require no data movement because all data would reside in shared storage accessible by any broker. Such architectures would make Kafka as elastic as stateless web applications.

For current implementations, the OSO engineers see auto-scaling as a powerful operational efficiency tool rather than a complete solution to capacity planning challenges. Organisations can achieve significant cost savings and reduced operational overhead through thoughtful implementation of auto-scaling for predictable workload patterns.

The key insight from our extensive production experience: Kafka auto-scaling isn’t about eliminating capacity planning—it’s about automating the routine scaling decisions so engineers can focus on the exceptional cases that require human judgment. When implemented thoughtfully with tiered storage and appropriate stabilisation controls, auto-scaling transforms Kafka from a statically-provisioned infrastructure component into a dynamically-responsive platform that adapts to workload demands whilst maintaining the reliability and performance characteristics that make Kafka indispensable for modern data architectures.

The elasticity barrier isn’t completely broken, but it’s certainly been significantly lowered. For teams running Kafka on Kubernetes with predictable traffic patterns, auto-scaling now offers a practical path to improved resource efficiency without sacrificing the operational reliability that production systems demand.

Getting started with Tiered Storage in Apache Kafka

Have a conversation with one of our experts to discover how we can help configure your cluster to use tiered storage.
