Building Kafka Streams state stores? We have built our fair share of Kafka Streams applications, and there are a number of challenges and opportunities in building your own state store. Nitrite Database is a feasible alternative to the internal state stores: it integrates easily with existing systems and is straightforward for developers to understand. Let's dig into some actionable design decisions!
Using Nitrite Database as a Kafka Streams State Store
Managing Kafka Streams state stores on a distributed platform like Kubernetes is complex, yet many companies demand this level of high availability across multiple running pods. To achieve it, there has been a paradigm shift towards moving data into embedded databases instead of using the internal state stores. When restarting or deploying a new pod, querying the standard KeyValueStore by anything other than the key results in a full scan of the state store, filtering out the records that do not match the search criteria. For small state stores this is not an issue; for larger ones it becomes quite a performance bottleneck.
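To make the bottleneck concrete, here is a minimal sketch of the full-scan pattern described above. It uses a plain `TreeMap` as a stand-in for a Kafka Streams `KeyValueStore` (the record type and field names are hypothetical, not from any real project): querying by a non-key field forces every entry to be visited and filtered in application code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FullScanDemo {
    // Hypothetical value type; a real store would hold serialized values.
    record Order(String orderId, String customerId, double total) {}

    public static void main(String[] args) {
        // TreeMap stands in for a KeyValueStore keyed by orderId.
        TreeMap<String, Order> store = new TreeMap<>();
        store.put("o-1", new Order("o-1", "cust-42", 10.0));
        store.put("o-2", new Order("o-2", "cust-7", 25.0));
        store.put("o-3", new Order("o-3", "cust-42", 5.5));

        // Querying by a non-key field (customerId) means visiting and
        // filtering every entry: O(n) work per query.
        List<Order> matches = new ArrayList<>();
        for (Map.Entry<String, Order> entry : store.entrySet()) {
            if (entry.getValue().customerId().equals("cust-42")) {
                matches.add(entry.getValue());
            }
        }
        System.out.println(matches.size());
    }
}
```

With millions of records and a restart storm across pods, this per-query scan is exactly the cost an indexed embedded database avoids.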
The benefit of using something like H2, Lucene or NitriteDB backed by PVCs (Persistent Volume Claims) is the ability to build state stores on existing, well-understood technologies. By choosing a document store as the base technology, we can push querying down to the underlying database instead of having to handle it ourselves. You still have the flexibility to easily reset and restore from the changelog topic if any issues arise. A shared state store also makes it possible to run multiple instances of the same Kafka Streams application, something you cannot do when everything lives in a single pod. However, do not make the mistake of letting other applications write directly to the data store, as that makes it difficult to maintain and guarantee data integrity.
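As an illustration of pushing the query down, here is a minimal sketch using Nitrite's collection API (the collection name, field names, and in-memory setup are hypothetical, and this is not taken from our example project). The field we query on is indexed, so lookups avoid the full scan shown earlier; for a persistent store you would point `filePath(...)` at a PVC-backed mount.

```java
import org.dizitart.no2.Cursor;
import org.dizitart.no2.Document;
import org.dizitart.no2.IndexOptions;
import org.dizitart.no2.IndexType;
import org.dizitart.no2.Nitrite;
import org.dizitart.no2.NitriteCollection;
import org.dizitart.no2.filters.Filters;

public class NitriteStoreSketch {
    public static void main(String[] args) {
        // In-memory instance for the sketch; in a pod, add
        // .filePath("/data/statestore.db") on a PVC-backed volume.
        Nitrite db = Nitrite.builder().openOrCreate();

        NitriteCollection orders = db.getCollection("orders");
        // Index the field we query on so lookups use the index
        // instead of scanning the whole collection.
        orders.createIndex("customerId",
                IndexOptions.indexOptions(IndexType.NonUnique));

        orders.insert(Document.createDocument("orderId", "o-1")
                .put("customerId", "cust-42"));
        orders.insert(Document.createDocument("orderId", "o-2")
                .put("customerId", "cust-7"));

        // The filter is evaluated by Nitrite, not in application code.
        Cursor matches = orders.find(Filters.eq("customerId", "cust-42"));
        System.out.println(matches.size());

        db.close();
    }
}
```

In a real deployment, the Kafka Streams processor would write into the collection from `process()` and rebuild it from the changelog topic on restore.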
The important factor here is to offload this processing to a robust technology, leveraging existing infrastructure patterns that can handle a large volume of transactions per second.
We have open sourced an example project for anyone wishing to adopt this approach; please reach out to us for more information.
For more content:
How to take your Kafka projects to the next level with a Confluent preferred partner
Event driven Architecture: A Simple Guide
Watch Our Kafka Summit Talk: Offering Kafka as a Service in Your Organisation
Successfully Reduce AWS Costs: 4 Powerful Ways
Protecting Kafka Cluster
Apache Kafka Common Mistakes
Kafka Cruise Control 101
Kafka performance best practices for monitoring and alerting