blog by OSO

Apache Kafka Common Mistakes and Bad Practices

Sion Smith 13 July 2023

At OSO we have configured, deployed and administered hundreds of Kafka clusters. In this post we outline some of the common Apache Kafka mistakes and misconceptions we see, and how to avoid them. We also touch on when not to use Kafka and explore the different use cases for Kafka, Kafka Streams, and KSQL.

Apache Kafka Common Mistakes: Delaying messages in Real-Time streaming

A common mistake is trying to delay messages in real-time streaming. This is not recommended because it can cause problems and degrade the overall performance of your Kafka cluster. Instead of delaying messages, it is better to use completion criteria (a condition that tells you when everything needed has arrived) to determine when a message should be processed.
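As a rough illustration of completion criteria, here is a minimal sketch in plain Python with no Kafka client involved. It assumes a hypothetical rule that an order can be processed only once both a `payment` and a `stock` event have arrived for the same order id. The class name `CompletionBuffer` and the event names are ours, not part of any Kafka API.

```python
from collections import defaultdict

# Hypothetical rule: an order is complete once both event types have arrived.
REQUIRED_EVENTS = frozenset({"payment", "stock"})


class CompletionBuffer:
    """Holds partial state per key and releases it when the criteria are met."""

    def __init__(self, required=REQUIRED_EVENTS):
        self.required = required
        self.pending = defaultdict(dict)  # key -> {event_type: payload}

    def on_event(self, key, event_type, payload):
        """Record an event; return the complete set for `key`, or None."""
        self.pending[key][event_type] = payload
        if self.required.issubset(self.pending[key]):
            # Criteria met: hand the collected events over and forget the key.
            return self.pending.pop(key)
        return None


buffer = CompletionBuffer()
first = buffer.on_event("order-1", "payment", {"amount": 42})
second = buffer.on_event("order-1", "stock", {"reserved": True})
```

Nothing is held back on a timer here: the first call returns `None` because the order is incomplete, and the second call releases both events as soon as the condition is satisfied.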

Apache Kafka Common Mistakes: Using Kafka as a Key-Value store

Another frequent mistake is using Kafka as a key-value store. Kafka is not designed to be one, and using it that way can lead to unexpected behaviour. If you use a compacted topic, it is important to understand that compaction happens eventually, not immediately. Until it runs, the topic can still hold several records for the same key, so your consumer must be able to handle these duplicates appropriately.
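One way a consumer can tolerate duplicates is to fold records into a "latest value per key" view. The sketch below is plain Python over a list of `(key, value)` pairs rather than a real consumer loop; the function name `materialize` and the sample records are illustrative assumptions.

```python
def materialize(records):
    """Fold (key, value) records, in log order, into the latest value per key.

    A value of None is treated as a tombstone (delete), which is how
    compacted topics mark a key for removal.
    """
    state = {}
    for key, value in records:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


# Before compaction has run, the log can still hold several records per key.
uncompacted = [
    ("user-1", "alice@old.example"),
    ("user-2", "bob@example.com"),
    ("user-1", "alice@new.example"),
    ("user-2", None),  # tombstone for user-2
]
view = materialize(uncompacted)
```

Because later records simply overwrite earlier ones, the resulting view is the same whether or not compaction has already removed the older entries.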

Understanding Use Cases for Kafka, Databases, and Analytics

It is crucial to understand the different use cases for Kafka, databases, and analytics. While Kafka is great for replaying historical data and guaranteeing order within a partition, it may not be the best choice for complex queries and analytics. In such cases, a database like Oracle or MongoDB, or a tool like Snowflake for reporting, would be more suitable.

Additionally, it is important to recognise when not to use Kafka. For example, when building a connected car infrastructure, the last-mile integration may not be possible with Kafka. In such scenarios, use something like MQTT instead. It is essential to understand the limitations of the technology and consult with Kafka experts to ensure success.

Differentiating Queues and Logs

Another common mistake is failing to distinguish between queues and logs, which are often confused when it comes to Kafka. While Kafka can be used for integrating different systems, it is important to note that messages in Kafka are stored in an immutable log. This allows for easy integration and for building new use cases without disrupting existing ones. In contrast, traditional queue-based systems may require starting a new integration trajectory for each system, leading to additional complexity.
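To make the difference concrete, here is a toy in-memory log in plain Python. It is only a sketch of the idea, not how Kafka is implemented: the `Log` class and the group names `billing` and `analytics` are hypothetical. The point is that reading does not remove anything, and each consumer group tracks its own position.

```python
class Log:
    """Append-only log; reading never removes records."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer group name -> next offset to read

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def poll(self, group, max_records=10):
        """Return the next records for `group` and advance only its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = Log()
for event in ["created", "paid", "shipped"]:
    log.append(event)

billing = log.poll("billing")
# A consumer group added later still sees the full history.
analytics = log.poll("analytics")
```

With a classic queue, the records read by `billing` would be gone; here a new use case can start from the beginning without touching the existing consumer.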

The roadmap for Kafka, Flink, and KSQL

The roadmap for Kafka, Flink, and KSQL is focused on their respective sweet spots. Kafka Streams is perfect for building microservices and will continue to be a valuable tool. Flink, on the other hand, is ideal for big data analytics and offers capabilities for both batch and real-time processing. KSQL, positioned in the middle, is great for streaming ETL and provides easy integration with Kafka. It is important to understand the strengths and limitations of each tool to make the right choice for your use case.

Managing high throughput and low latency requirements

Managing high throughput and low latency requirements can be challenging, especially in the financial industry. It is important to understand your data and use cases to determine the best approach for managing these requirements. Partitioning strategy and key selection play a crucial role in prioritising important information for processing. Additionally, leveraging tiered storage and implementing strong multi-tenancy controls can help optimise performance and ensure efficient resource allocation.
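As a small illustration of why key selection matters, the sketch below maps keys to partitions with a stable hash. It is an assumption-laden toy: `zlib.crc32` is a stand-in hash (Kafka's Java client uses murmur2 for keyed records), and the account keys and partition count are made up.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash.

    crc32 is a stand-in here; it will not match the partitions chosen by a
    real Kafka client.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical keys: keying by account keeps each account's events together.
events = [("acct-7", "debit"), ("acct-9", "credit"), ("acct-7", "credit")]
placement = [(key, partition_for(key, 6)) for key, _ in events]
```

Records that share a key always land on the same partition, so their relative order is preserved. A key with too few distinct values, on the other hand, would concentrate traffic on a handful of partitions.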

Challenging cloud providers for performance optimisation

When it comes to high throughput and low latency requirements, it is recommended to challenge cloud providers to solve these problems. With managed Kafka services like Confluent Cloud, the cloud provider takes responsibility for optimising performance and addressing specific use case needs. While there may be limitations for extreme cases, leveraging the expertise of the cloud provider can greatly simplify the management of Kafka clusters and ensure reliable performance.

Considerations for cloud adoption in the financial industry

Cloud adoption in the financial industry can be a contentious topic, and one we at OSO have a lot of experience with. However, it is important to recognise that the landscape is evolving, and many financial institutions are already embracing cloud technologies for core banking and reporting purposes. Security concerns have been addressed, and there are success stories of financial institutions leveraging cloud solutions like Confluent Cloud alongside their existing on-premises infrastructure. It is crucial to evaluate the specific requirements and consult with experts to make informed decisions.

Implementing Apache Kafka best practices: Say goodbye to Apache Kafka Common Mistakes

Even if you choose not to go with a managed solution, there are still valuable lessons to be learned from the architecture and controls implemented in managed Kafka services. Understanding the client-side controls and architecture decisions can help optimise performance and ensure efficient resource utilisation. There is a wealth of content available on this topic, providing insights and best practices for managing Kafka clusters effectively.

If you want to offer Kafka as a service on a multi-tenanted platform, check our whitepaper, because supporting varied use cases with high throughput and low latency requirements can be challenging. The whitepaper takes you through understanding your data and use cases to determine the best approach for managing these requirements. This includes defining a partitioning strategy and selecting appropriate keys to prioritise important information for processing.

Contact us to learn more about avoiding Apache Kafka common mistakes and implementing best practices.
