When your Kafka pipeline must handle dozens of message types from diverse systems, the conventional “one message type per topic” approach quickly breaks down. This is especially true in complex domains like travel, where data comes in various formats and structures, from room reservations to spa bookings and flight bookings.
At OSO, we recently worked with a travel company serving tens of thousands of hotels globally, and we faced exactly this challenge. What made the engagement particularly challenging was the diversity of the data sources. We had to connect to hotel systems ranging from modern cloud-based APIs to legacy DOS-based platforms, with everything in between. These systems send data through various methods: webhooks, APIs, FTP transfers, and even email. Some data arrives in real time, while other data comes in batches.
In this post, I’ll share how OSO solved one of our biggest architectural challenges: creating a schema approach that could handle multiple message types while maintaining order, simplifying development, and ensuring consistent processing.
The Limitations of Traditional Topic Design
The conventional wisdom in Kafka suggests that each topic should contain just one message type. This approach functions as a form of “strong typing” for Kafka topics, which can prevent errors and simplify processing logic. Initially, we followed this guidance: room reservation messages went into a room reservation topic, restaurant reservation messages into a restaurant reservation topic, and so on.
However, as our schema expanded and grew more complex, we encountered several significant challenges:
Cross-Topic Ordering Problems
When related messages exist in different topics, enforcing message ordering becomes difficult. For example, if a spa reservation (in the spa reservation topic) can only be processed after a specific room reservation (in the room reservation topic), coordinating this dependency becomes unnecessarily complex.
Variations Within Message Types
Even within a single conceptual message type, variations exist. A room reservation update event might contain dozens of fields (reservation ID, account information, guest name, check-in/out dates), while a room reservation deletion event might contain only the reservation ID. Both are technically “room reservation events,” but they have different structures and processing needs.
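To make the contrast concrete, here is a sketch of the two shapes as Protocol Buffers messages (the schema language we ultimately adopted, introduced below). The field names here are illustrative, not our production schema:

    message RoomReservationUpdated {
      string reservation_id = 1;
      string account_id = 2;
      string guest_name = 3;
      string check_in_date = 4;
      string check_out_date = 5;
      // ... dozens more fields in practice
    }

    message RoomReservationDeleted {
      string reservation_id = 1; // the only field a deletion needs
    }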
Nested Message Structures
In hospitality data, complex relationships exist between entities. Room reservations and golf reservations can both contain guest profiles. These profiles are valid messages in their own right and might need separate processing, yet they’re nested within other message types. With the one-message-type-per-topic approach, should these embedded profiles be extracted and placed in a profile topic? This quickly creates confusion and complexity.
Service Configuration Overhead
Services often need to consume multiple message types. When adding a new message type in a traditional setup, you’d need to:
- Create a new topic for the message type
- Update your service to consume from this new topic
- Implement new serialisers/deserialisers
- Add logic to handle the new message type
This overhead becomes significant when your domain contains dozens of related message types, especially when many services could process new message types without code changes if only they were receiving them.
Designing a Unified Schema Approach
To address these challenges, we designed a generalised schema that could be used across all topics. We implemented this using Protocol Buffers (protobuf), which provided strong typing while allowing for flexibility.
The Hotel Event Pattern
At the core of our solution is the concept of a “Hotel Event” – a parent message type that can contain any of our specialised message types. Within protobuf, we used the “oneof” field type to implement this pattern:
    syntax = "proto3";

    message HotelEvent {
      oneof event_type {
        RoomReservation room_reservation = 1;
        RestaurantReservation restaurant_reservation = 2;
        GolfReservation golf_reservation = 3;
        Profile profile = 4;
        // ... other event types
      }
    }

    message RoomReservation {
      oneof event_action {
        RoomReservationUpdated updated = 1;
        RoomReservationDeleted deleted = 2;
        // ... other actions
      }
    }

    // Define other message types similarly
This approach allows us to maintain strong typing (each message is a specific type with a defined structure) while unifying all messages under a common parent type. Every message on any topic is a HotelEvent, regardless of its specific type.
Advantages of the Unified Schema
This approach offered several immediate benefits:
- Simplified Topic Structure: Since every message is a HotelEvent, we could be more flexible with our topic organisation, grouping messages by function or processing requirements rather than rigidly by type.
- Preserved Ordering Across Types: When message ordering matters across different types (like ensuring a deletion event processes after its creation event), we can place these messages in the same topic and partition, guaranteeing ordered processing (a producer sketch follows this list).
- Consistent Processing: Components within messages (like profiles embedded in reservations) are processed consistently regardless of which container message they arrive in, because they’re always the same structured type.
- Simplified Service Development: Services can process any message by checking its type at runtime, rather than requiring separate connection logic for each topic and message type.
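To illustrate the ordering point, here is a minimal sketch of keyed production. The topic name, the HotelEventSerializer class, and the choice of reservation ID as the record key are our illustrative assumptions; the underlying guarantee is Kafka's: ordering is preserved within a partition, and a consistent key routes related events to the same partition.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    void sendOrderedEvents(String reservationId, HotelEvent creation, HotelEvent deletion) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Hypothetical serialiser that wraps HotelEvent.toByteArray()
        props.put("value.serializer", "com.example.HotelEventSerializer");

        try (KafkaProducer<String, HotelEvent> producer = new KafkaProducer<>(props)) {
            // Same key => same partition => the deletion is consumed after the creation
            producer.send(new ProducerRecord<>("hotel-events", reservationId, creation));
            producer.send(new ProducerRecord<>("hotel-events", reservationId, deletion));
        }
    }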
Implementation Examples
While the unified schema approach solved many problems, it introduced some implementation challenges we had to address.
Handling Switch Statement Complexity
When using a unified schema with the “oneof” pattern, your processing code will contain switch statements to handle different message types:
    public void process(HotelEvent event) {
        switch (event.getEventTypeCase()) {
            case ROOM_RESERVATION:
                RoomReservation reservation = event.getRoomReservation();
                switch (reservation.getEventActionCase()) {
                    case UPDATED:
                        handleRoomReservationUpdate(reservation.getUpdated());
                        break;
                    case DELETED:
                        handleRoomReservationDelete(reservation.getDeleted());
                        break;
                    // ... other cases
                }
                break;
            case RESTAURANT_RESERVATION:
                // Similar nested switch
                break;
            // ... other event types
        }
    }
This can lead to deep nesting and verbose code. We mitigated this by:
- Using the visitor pattern where appropriate (sketched after this list)
- Creating utility functions to handle common processing paths
- Breaking down large switch statements into smaller, focused methods
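As an illustration of the visitor approach, here is a minimal sketch. The HotelEventVisitor interface and its method names are our own illustrative additions, not protobuf-generated code:

    public interface HotelEventVisitor {
        void onRoomReservationUpdated(RoomReservationUpdated updated);
        void onRoomReservationDeleted(RoomReservationDeleted deleted);
        void onRestaurantReservation(RestaurantReservation reservation);
        // ... one method per leaf event type
    }

    public final class HotelEventDispatcher {
        // Routes a HotelEvent to the matching visitor method, so the nested
        // switches live in exactly one place instead of in every service
        public static void dispatch(HotelEvent event, HotelEventVisitor visitor) {
            switch (event.getEventTypeCase()) {
                case ROOM_RESERVATION:
                    RoomReservation reservation = event.getRoomReservation();
                    switch (reservation.getEventActionCase()) {
                        case UPDATED:
                            visitor.onRoomReservationUpdated(reservation.getUpdated());
                            break;
                        case DELETED:
                            visitor.onRoomReservationDeleted(reservation.getDeleted());
                            break;
                        default:
                            break; // unknown action: log or ignore per service
                    }
                    break;
                case RESTAURANT_RESERVATION:
                    visitor.onRestaurantReservation(event.getRestaurantReservation());
                    break;
                default:
                    break; // ... other event types
            }
        }
    }

Each service then implements only the visitor methods it cares about, and new event types require changes in just the dispatcher rather than in every consumer.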
Serialisation/Deserialisation Strategies
With a unified schema, serialisation and deserialisation become more consistent. Every service can use the same serialiser/deserialiser for HotelEvent, rather than needing specialised ones for each message type. This significantly reduced boilerplate code.
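For example, a single shared deserialiser can be as small as this; the class name is ours, and it assumes the protobuf-generated HotelEvent type:

    import org.apache.kafka.common.errors.SerializationException;
    import org.apache.kafka.common.serialization.Deserializer;
    import com.google.protobuf.InvalidProtocolBufferException;

    public class HotelEventDeserializer implements Deserializer<HotelEvent> {
        @Override
        public HotelEvent deserialize(String topic, byte[] data) {
            if (data == null) {
                return null;
            }
            try {
                // Every topic carries HotelEvent, so one deserialiser serves all consumers
                return HotelEvent.parseFrom(data);
            } catch (InvalidProtocolBufferException e) {
                throw new SerializationException("Failed to parse HotelEvent from topic " + topic, e);
            }
        }
    }

The matching serialiser is just as small, delegating to HotelEvent.toByteArray().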
Schema Evolution Management
We used the buf CLI tool for protobuf linting, formatting, and detecting breaking changes. This was crucial for maintaining schema compatibility as our system evolved. The tool helped us:
- Ensure consistent naming conventions
- Detect accidental breaking changes before deployment (an example follows this list)
- Auto-format protobuf files for readability
- Generate code consistently across services
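As an example of what the breaking-change checks catch, consider a hypothetical edit that reuses a field number inside the oneof. Old consumers would decode the bytes as the wrong type, which is why this class of change must be flagged before deployment:

    message HotelEvent {
      oneof event_type {
        RoomReservation room_reservation = 1;
        // BREAKING: field 2 previously meant RestaurantReservation; reusing
        // the number for a different type corrupts decoding for old consumers
        SpaReservation spa_reservation = 2;
      }
    }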
Real-world Results and Lessons Learned
After implementing the unified schema approach across our Kafka ecosystem, we observed several positive outcomes:
Development Velocity Improvements
Adding new message types became significantly easier. Instead of creating new topics and updating multiple services, we simply added new types to our protobuf schema and deployed the updated schema package. Services that needed to process the new types could update their logic, while others continued to function without changes.
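In practice, adding a type was a one-line schema change. A sketch, assuming 5 is the next free field number (the number here is purely illustrative):

    message HotelEvent {
      oneof event_type {
        RoomReservation room_reservation = 1;
        RestaurantReservation restaurant_reservation = 2;
        GolfReservation golf_reservation = 3;
        Profile profile = 4;
        SpaReservation spa_reservation = 5; // new type added here
      }
    }

Consumers compiled against the older schema see the new entry as an unknown field, and their event-type case comes back unset, so they can log and skip it without any code changes.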
Onboarding Experience
New developers found the system more intuitive. With a consistent message structure across the platform, they could understand the data flow more quickly and focus on business logic rather than the intricacies of Kafka topic configuration.
Performance Considerations
We did observe minor overheads from using a more complex schema structure, primarily in serialisation/deserialisation time and message size. However, these were minimal compared to the architectural benefits gained.
When This Approach Might Not Be Right
The unified schema approach isn’t suitable for every use case:
- If your message types are fundamentally different with no conceptual relationship
- If you have strong security requirements that necessitate physical separation of message types
- If different message types have vastly different retention or processing requirements
- If your organisation has strict team boundaries where different teams own different message types
Practical Takeaways for Your Kafka Architecture
If you’re considering implementing a unified schema approach, here are some practical steps to get started:
Questions to Ask Before Implementation
- Do your message types share conceptual relationships?
- Do you need to maintain order across different message types?
- Are you currently managing a complex web of topics that’s becoming difficult to maintain?
- Would your services benefit from more flexible consumption patterns?
Transitioning from Separate Topics
- Start with a schema audit: Document all your current message types and their relationships
- Design your unified schema: Create a common parent type with specialized subtypes
- Build a proof-of-concept: Implement the new schema in a limited scope
- Develop a migration strategy: Consider running dual systems temporarily
- Roll out incrementally: Convert one message flow at a time
Configuration Recommendations
- Increase the default message size limit if your unified messages are larger (see the configuration sketch after this list)
- Adjust consumer configurations to account for potentially more complex processing
- Configure serialisers to handle the unified schema efficiently
- Consider compression if message size becomes an issue
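Here is a sketch of client settings touching the size and compression recommendations; the values are starting points to test against your own message sizes, not prescriptions:

    import java.util.Properties;

    Properties producerProps = new Properties();
    // The producer's per-request cap defaults to ~1 MB; raise it if unified
    // messages run larger (the broker's message.max.bytes must be raised to match)
    producerProps.put("max.request.size", "2097152"); // 2 MB, illustrative
    // Compression trades CPU for smaller payloads on the wire
    producerProps.put("compression.type", "zstd");

    Properties consumerProps = new Properties();
    // Allow per-partition fetches to carry the larger messages
    consumerProps.put("max.partition.fetch.bytes", "2097152");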
Managing Schema Evolution
- Establish clear guidelines for adding new message types
- Use tooling to detect breaking changes
- Create a versioning strategy for your unified schema
- Consider backward and forward compatibility requirements (one common protobuf idiom is sketched below)
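One protobuf idiom worth adopting as part of that strategy: when a type is retired, reserve its field number and name so they can never be reused. This is standard protobuf practice rather than anything specific to our schema; suppose, hypothetically, golf reservations moved to another system:

    message HotelEvent {
      // GolfReservation was retired; reserving the slot prevents accidental reuse
      reserved 3;
      reserved "golf_reservation";
      oneof event_type {
        RoomReservation room_reservation = 1;
        RestaurantReservation restaurant_reservation = 2;
        Profile profile = 4;
      }
    }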
Beyond One-Topic-Per-Type
The journey from a traditional one-topic-per-type Kafka architecture to a unified schema approach taught us that conventional wisdom doesn’t always apply to complex, real-world systems. By challenging the standard pattern and designing a solution tailored to our domain, we created a more flexible, maintainable, and developer-friendly system.
The key insight is that your Kafka architecture should reflect the natural structure of your domain. In hospitality, where data entities are richly interconnected and have complex relationships, a unified schema approach allowed us to model these relationships more accurately and process them more consistently.
As you evaluate your own Kafka architecture, consider whether your current topic structure is serving your domain needs or creating unnecessary complexity. Sometimes, the simplest solution is to embrace the complexity of your domain within your schema, rather than trying to flatten it into disconnected topics.
By thinking beyond the one-topic-per-type pattern, you can create event-driven architectures that are both powerful and manageable, even as they scale to handle complex, real-world domains.