blog by OSO

Kafka Streams Memory Best Practices

Sion Smith 18 July 2023

Joining data streams is complex, and there are many techniques and best practices for doing it well. This article focuses on Kafka Streams memory: we will explore how to run Kafka Streams on Kubernetes and how to diagnose and fix a common problem with memory usage.

When running real-time processing applications, it’s crucial to avoid unnecessary delays caused by application restarts. However, if your Kafka Streams application exceeds its resource limits, it can get killed, leading to restarts and delays. In this article, I will discuss how to diagnose and fix this issue in Kafka Streams applications.

Kafka Streams Memory: Diagnosing the problem

To determine if your application was killed due to memory issues, run kubectl describe pod against the affected pod. Look for the “Reason” field in the “Last State” section. If it says “OOMKilled” (out of memory killed), then this article will help you address the problem.

Kafka Streams Memory: Understanding pod memory usage in Kafka Streams

A common mistake when investigating Kafka Streams memory problems is to assume the issue is a memory leak. Start by introducing resource limits and analysing memory usage in Grafana. What you might discover is that the problem is not with the heap memory. Instead, it could be related to how Kafka Streams uses memory internally.

Kafka Streams keeps some of its data, such as its record caches, on the Java heap. However, much of the actual memory usage comes from RocksDB, the embedded key-value store written in C++ that Kafka Streams uses by default for its state stores. RocksDB allocates native (off-heap) memory for in-memory data structures such as the block cache and the memtables (write buffers). These can consume a significant amount of memory that never appears in the JVM heap metrics.

Fixing Kafka Streams RocksDB memory usage issue

To address the memory usage issue in Kafka Streams, we can set a bounded memory limit for RocksDB. Kafka Streams provides a convenient way to configure this limit using the rocksdb.config.setter parameter.

Here’s an example of how to set the memory limit for RocksDB:

Properties props = new Properties();
props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, BoundedMemoryRocksDBConfig.class.getName());

BoundedMemoryRocksDBConfig is not shipped with Kafka Streams. It is a class you write yourself that implements the RocksDBConfigSetter interface, and it is where you define the memory limit for RocksDB. The exact value depends on factors like the number of state stores and partitions in your application, so it’s important to compute an appropriate memory limit for your specific application, as outlined in the next section.
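As a rough illustration, such a class could look like the sketch below. It follows the bounded-memory pattern described in the Apache Kafka Streams memory management documentation and needs the kafka-streams and rocksdbjni libraries on the classpath. The three size constants are placeholder assumptions, not recommendations; replace them with values derived from your own memory budget.

```java
import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.WriteBufferManager;

public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {

    // Placeholder values for illustration only (assumptions, tune for your application).
    private static final long TOTAL_OFF_HEAP_MEMORY = 512L * 1024 * 1024;  // overall RocksDB bound
    private static final long TOTAL_MEMTABLE_MEMORY = 128L * 1024 * 1024;  // share used by write buffers
    private static final double INDEX_FILTER_BLOCK_RATIO = 0.1;            // high-priority pool share

    // Static so that one cache and one write buffer manager are shared by
    // every state store instance in this JVM; that sharing is what bounds the total.
    private static final Cache CACHE =
            new LRUCache(TOTAL_OFF_HEAP_MEMORY, -1, false, INDEX_FILTER_BLOCK_RATIO);
    private static final WriteBufferManager WRITE_BUFFER_MANAGER =
            new WriteBufferManager(TOTAL_MEMTABLE_MEMORY, CACHE);

    @Override
    public void setConfig(final String storeName,
                          final Options options,
                          final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig =
                (BlockBasedTableConfig) options.tableFormatConfig();

        // Count data blocks, index/filter blocks and memtables against the shared cache.
        tableConfig.setBlockCache(CACHE);
        tableConfig.setCacheIndexAndFilterBlocks(true);
        options.setWriteBufferManager(WRITE_BUFFER_MANAGER);

        // Keep index and filter blocks from being evicted too eagerly.
        tableConfig.setCacheIndexAndFilterBlocksWithHighPriority(true);
        tableConfig.setPinTopLevelIndexAndFilter(true);

        options.setTableFormatConfig(tableConfig);
    }

    @Override
    public void close(final String storeName, final Options options) {
        // Do not close CACHE or WRITE_BUFFER_MANAGER here: they are shared by all stores.
    }
}
```

Because the cache and write buffer manager are static, the limit applies to all state stores in one application instance together rather than to each store separately.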

How to calculate your Kafka Streams memory limit

To compute the memory limit for RocksDB, you can use a formula based on various factors. Here’s an example of how to compute the memory limit:

  • Get the total memory available to your container (its memory limit) using the kubectl command.
  • Determine the percentage of memory to reserve for the operating system and other processes in the container. This value can be set based on experimentation.
  • Subtract the memory the JVM uses outside the heap, which includes the code cache, class metadata, and thread stacks.
  • Subtract the heap usage of your application. If you suspect a memory leak, you can use profiling tools to analyse the heap usage.
  • The remaining memory can be allocated to RocksDB.
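The steps above amount to a simple subtraction, which can be sketched as follows. The class name, method name, and all the numbers in main are hypothetical and only illustrate the arithmetic; measure the real inputs for your own application.

```java
public final class RocksDbMemoryBudget {

    private RocksDbMemoryBudget() {
    }

    /**
     * Returns the bytes left for RocksDB after reserving memory for the
     * operating system, the JVM non-heap areas and the JVM heap.
     */
    public static long compute(long containerLimitBytes,
                               double osReserveFraction,
                               long jvmNonHeapBytes,
                               long jvmHeapBytes) {
        if (osReserveFraction < 0.0 || osReserveFraction >= 1.0) {
            throw new IllegalArgumentException("osReserveFraction must be in [0, 1)");
        }
        // Share of the container limit kept free for the OS and other processes.
        long osReserveBytes = (long) (containerLimitBytes * osReserveFraction);
        long remaining = containerLimitBytes - osReserveBytes - jvmNonHeapBytes - jvmHeapBytes;
        if (remaining <= 0) {
            throw new IllegalStateException("No memory left for RocksDB; raise the limit or shrink the heap");
        }
        return remaining;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // Assumed example: 4 GiB container limit, 10% OS reserve,
        // 512 MiB JVM non-heap, 2 GiB heap.
        long budget = compute(4096 * mib, 0.10, 512 * mib, 2048 * mib);
        System.out.println("RocksDB budget (MiB): " + budget / mib);
    }
}
```

The result is the value you would use as the overall RocksDB bound in your config setter class.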

Remember to adjust these values based on your application’s specific needs and performance requirements. It’s important to test and profile your application to determine the optimal memory usage.

Monitoring memory usage

After implementing the fixes mentioned above, you should monitor the memory usage of your Kafka Streams application. By using tools like Grafana, you can visualise the memory consumption of RocksDB and other components. A sample Grafana dashboard can be found here, which you can use as a starting point for monitoring your own application.
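Alongside a dashboard, it can help to log what the JVM itself reports. The sketch below uses the standard java.lang.management API to read heap and non-heap usage; the class name is hypothetical. Note that RocksDB allocates native memory outside the JVM's accounting, so its usage does not appear in either figure and has to be read from container-level or RocksDB metrics instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public final class JvmMemorySnapshot {

    private JvmMemorySnapshot() {
    }

    /** Bytes currently used on the Java heap. */
    public static long heapUsedBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** Bytes currently used by JVM-managed non-heap areas (for example code cache and class metadata). */
    public static long nonHeapUsedBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getNonHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        System.out.println("Heap used (MiB): " + heapUsedBytes() / mib);
        System.out.println("Non-heap used (MiB): " + nonHeapUsedBytes() / mib);
    }
}
```

If the container's total memory keeps growing while both of these numbers stay flat, that points towards native memory such as RocksDB rather than a heap leak.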

Kafka Streams Memory: Takeaways

Running a large number of Kafka Streams applications can be operationally complex. In this article, I discussed how to diagnose and fix memory usage issues in Kafka Streams applications running on Kubernetes. By setting a bounded memory limit for RocksDB and optimising other memory components, you can ensure that your application runs smoothly without unnecessary restarts. Remember to test and profile your application to determine the appropriate memory limits for your specific use case.
