
Understanding Apache Kafka: When NOT to Use Apache Kafka?

Sion Smith 4 July 2023

Apache Kafka is a powerful tool that is widely used for data streaming and has become the de facto standard in the industry. It offers a range of capabilities, including message queuing, distributed storage, data processing, and stream processing. However, despite its versatility, there are certain scenarios where Kafka may not be the best choice. Understanding Apache Kafka means knowing not only when to use it, but also when not to. In this article, we will explore some of these cases and discuss when it is not advisable to use Apache Kafka.

Understanding Apache Kafka

Before diving into the limitations of Kafka, let’s briefly recap what it is and what it can do. Kafka is a large-scale message queue that can process millions of messages per second for both transactional and analytical workloads. It serves as a distributed storage system and automatically handles backpressure, allowing you to handle slow consumers and replay data.

Kafka also enables you to decouple systems and provides connectivity with Kafka Connect, allowing you to connect to any legacy system or cloud-native platform without the need for additional tools. Additionally, Kafka supports stream processing or streaming analytics, allowing you to continuously process data in motion at any scale reliably.

In the cloud, Kafka is fully managed, including integration and processing, so you can focus on your business without worrying about the underlying infrastructure. Furthermore, it is common to have multiple Kafka clusters for disaster recovery, aggregations, migrations, and other scenarios.
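The replay and slow-consumer behaviour mentioned above comes from Kafka’s append-only log and per-consumer offsets. As a conceptual illustration only (this is a toy in-memory sketch, not the Kafka client API), the core idea looks like this:

```python
class MiniLog:
    """Toy append-only log illustrating Kafka-style offsets and replay.
    A conceptual sketch only, not the actual Kafka client API."""

    def __init__(self):
        self.records = []

    def append(self, value):
        # Records are only ever appended; each gets a stable offset.
        self.records.append(value)
        return len(self.records) - 1

    def read_from(self, offset):
        # Each consumer tracks its own offset, so a slow consumer can
        # catch up later, and replay is simply re-reading from offset 0.
        return self.records[offset:]

log = MiniLog()
for event in ["order-created", "order-paid", "order-shipped"]:
    log.append(event)

print(log.read_from(1))  # a consumer resuming mid-stream
print(log.read_from(0))  # a full replay from the beginning
```

Because the broker never deletes a record when it is read, many independent consumers can process the same stream at their own pace, which is what makes the decoupling described above possible.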

Kafka’s Versatility and Use Cases

With its wide range of capabilities, Kafka is used in various industries and for many different scenarios. It is suitable for transactional low-latency workloads as well as handling really big data workloads. This broad spectrum of use cases makes Kafka a valuable tool that provides a lot of business value.

When Not to Use Apache Kafka

While Kafka is a powerful tool, it is not suitable for every use case. Here are some scenarios where it is not advisable to use Apache Kafka:

  1. Kafka is not a replacement for complex analytics databases: While Kafka offers durable, persistent storage, interactive queries, and exactly-once semantics, it is not designed to replace databases specifically built for complex analytics use cases. Systems like Oracle databases, time series databases, document databases, and MapReduce storage systems are better suited for these scenarios. However, Kafka can be used in conjunction with these databases as a real-time data integration layer.
  2. Kafka is not suitable as a proxy for millions of clients: While Kafka can connect to millions of clients, it is not recommended to use it as a proxy for such large numbers. In scenarios like video gaming companies, connected car infrastructures, and mobility services, where there are millions of clients, it is more appropriate to use other tools like MQTT gateways or REST proxies as a proxy layer in front of Kafka.
  3. Kafka is not an API management platform: While Kafka is a powerful streaming platform, it is not designed to be an API management platform. Other tools, such as MuleSoft, IBM API Connect, and more, specialize in API management. However, Kafka can work together with these tools and be used as a streaming layer in conjunction with an API management gateway.
  4. Kafka is not the right tool for processing large messages: In most cases, Kafka is not suitable for processing large files. The recommended approach is to use the claim check enterprise integration pattern, where Kafka is used for data orchestration while the actual file processing is done outside of Kafka. However, there are some specific use cases where Kafka can be used for processing large files, such as splitting large legacy files or uploading large files into data lakes or data warehouses.
  5. Kafka is not an IoT platform: While Kafka is commonly used in IoT projects, it is not an IoT platform itself. It is best used in combination with other IoT platforms to connect to various systems, process data, and integrate with CRMs, data lakes, and analytics platforms. Depending on the specific IoT project, Kafka Connect, HTTP, or existing OT middleware may be more suitable for integration with IoT technologies.
  6. Kafka is not designed for hard real-time or deterministic systems: Kafka is not suitable for hard real-time or deterministic systems, such as safety-critical systems or self-driving cars. For these types of systems, languages and runtimes like C or Rust are more appropriate. However, Kafka can be used for data integration with low latency and critical use cases within the enterprise.
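The claim check pattern mentioned for large messages can be sketched as follows. This is a hedged, in-memory illustration: the `blob_store` dict and `topic` deque are hypothetical stand-ins for a real object store (such as S3) and a real Kafka topic; only a small reference travels through the messaging layer.

```python
import hashlib
from collections import deque

# Hypothetical stand-ins for an object store and a Kafka topic.
# In production these would be, e.g., S3 and an actual topic.
blob_store = {}
topic = deque()

def produce_with_claim_check(payload: bytes) -> None:
    """Store the large payload externally; publish only a small reference."""
    key = hashlib.sha256(payload).hexdigest()
    blob_store[key] = payload
    # The message on the topic stays tiny regardless of payload size.
    topic.append({"claim_check": key, "size": len(payload)})

def consume() -> bytes:
    """Read the reference from the topic, then fetch the payload."""
    msg = topic.popleft()
    return blob_store[msg["claim_check"]]

produce_with_claim_check(b"x" * 10_000_000)  # e.g. a 10 MB file
received = consume()
print(len(received))
```

The design point is that the broker only ever carries the small "claim check" record, so Kafka handles ordering and orchestration while bulk data moves through storage built for it.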

OSO Summary of when NOT to use Kafka

Apache Kafka is a powerful tool that is widely used for data streaming and offers a range of capabilities. It is the de facto standard for data streaming, with over 100,000 organizations using it today. However, it is important to understand when not to use Kafka and when to combine it with other technologies. While Kafka is great for many things, there are certain scenarios where other tools or technologies may be more suitable. By understanding the limitations of Kafka and its complementary nature with other tools, you can make informed decisions and leverage its capabilities effectively.

If you have any feedback or questions, feel free to reach out to the Kafka Experts at OSO.

For more content:

How to take your Kafka projects to the next level with a Confluent preferred partner

Event driven Architecture: A Simple Guide

Watch Our Kafka Summit Talk: Offering Kafka as a Service in Your Organisation

Successfully Reduce AWS Costs: 4 Powerful Ways

Protecting Kafka Cluster

Get started with OSO professional services for Apache Kafka

Have a conversation with a Kafka expert to discover how we can help you adopt Apache Kafka in your business.

Contact Us