When your Kafka pipeline must handle dozens of message types from diverse systems, the conventional “one message type per topic” approach quickly breaks down. This is especially true in complex domains like travel, where data comes in various formats and structures, from room reservations to spa bookings and flight bookings.
At OSO, we recently worked with a travel company serving tens of thousands of hotels globally, and we faced exactly this challenge. What made the engagement particularly challenging was the diversity of the data sources. We had to connect to hotel systems ranging from modern cloud-based APIs to legacy DOS-based platforms, with everything in between. These systems send data through various methods: webhooks, APIs, FTP transfers, and even email. Some data arrives in real time, while other data comes in batches.
In this post, I’ll share how OSO solved one of our biggest architectural challenges: creating a schema approach that could handle multiple message types while maintaining order, simplifying development, and ensuring consistent processing.
The Limitations of Traditional Topic Design
The conventional wisdom in Kafka suggests that each topic should contain just one message type. This approach functions as a form of “strong typing” for Kafka topics, which can prevent errors and simplify processing logic. Initially, we followed this guidance: room reservation messages went into a room reservation topic, restaurant reservation messages into a restaurant reservation topic, and so on.
However, as our schema expanded and grew more complex, we encountered several significant challenges:
Cross-Topic Ordering Problems
When related messages exist in different topics, enforcing message ordering becomes difficult. For example, if a spa reservation (in the spa reservation topic) can only be processed after a specific room reservation (in the room reservation topic), coordinating this dependency becomes unnecessarily complex.
Variations Within Message Types
Even within a single conceptual message type, variations exist. A room reservation update event might contain dozens of fields (reservation ID, account information, guest name, check-in/out dates), while a room reservation deletion event might contain only the reservation ID. Both are technically “room reservation events,” but they have different structures and processing needs.
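To make the contrast concrete, here is a sketch of the two shapes as Protocol Buffers messages (the schema language we ultimately adopted, introduced below). The field names here are illustrative, not our production schema:

    message RoomReservationUpdated {
      string reservation_id = 1;
      string account_id = 2;
      string guest_name = 3;
      string check_in_date = 4;
      string check_out_date = 5;
      // ... dozens more fields in practice
    }

    message RoomReservationDeleted {
      string reservation_id = 1; // the only field a deletion needs
    }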
Nested Message Structures
In hospitality data, complex relationships exist between entities. Room reservations and golf reservations can both contain guest profiles. These profiles are valid messages in their own right and might need separate processing, yet they’re nested within other message types. With the one-message-type-per-topic approach, should these embedded profiles be extracted and placed in a profile topic? This quickly creates confusion and complexity.
Service Configuration Overhead
Services often need to consume multiple message types. When adding a new message type in a traditional setup, you’d need to:
- Create a new topic for the message type
- Update your service to consume from this new topic
- Implement new serialisers/deserialisers
- Add logic to handle the new message type
This overhead becomes significant when your domain contains dozens of related message types, especially when many services could process new message types without code changes if only they were receiving them.
Designing a Unified Schema Approach
To address these challenges, we designed a generalised schema that could be used across all topics. We implemented this using Protocol Buffers (protobuf), which provided strong typing while allowing for flexibility.
The Hotel Event Pattern
At the core of our solution is the concept of a “Hotel Event” – a parent message type that can contain any of our specialised message types. Within protobuf, we used the “oneof” field type to implement this pattern:
    syntax = "proto3";

    message HotelEvent {
      oneof event_type {
        RoomReservation room_reservation = 1;
        RestaurantReservation restaurant_reservation = 2;
        GolfReservation golf_reservation = 3;
        Profile profile = 4;
        // ... other event types
      }
    }

    message RoomReservation {
      oneof event_action {
        RoomReservationUpdated updated = 1;
        RoomReservationDeleted deleted = 2;
        // ... other actions
      }
    }

    // Define other message types similarly
This approach allows us to maintain strong typing (each message is a specific type with a defined structure) while unifying all messages under a common parent type. Every message on any topic is a HotelEvent, regardless of its specific type.
Advantages of the Unified Schema
This approach offered several immediate benefits:
- Simplified Topic Structure: Since every message is a HotelEvent, we could be more flexible with our topic organisation, grouping messages by function or processing requirements rather than rigidly by type.
- Preserved Ordering Across Types: When message ordering matters across different types (like ensuring a deletion event processes after its creation event), we can place these messages in the same topic and partition, guaranteeing ordered processing (a producer sketch follows this list).
- Consistent Processing: Components within messages (like profiles embedded in reservations) are processed consistently regardless of which container message they arrive in, because they’re always the same structured type.
- Simplified Service Development: Services can process any message by checking its type at runtime, rather than requiring separate connection logic for each topic and message type.
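To illustrate the ordering point, here is a minimal sketch of keyed production. The topic name, the HotelEventSerializer class, and the choice of reservation ID as the record key are our illustrative assumptions; the underlying guarantee is Kafka's: ordering is preserved within a partition, and a consistent key routes related events to the same partition.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    void sendOrderedEvents(String reservationId, HotelEvent creation, HotelEvent deletion) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Hypothetical serialiser that wraps HotelEvent.toByteArray()
        props.put("value.serializer", "com.example.HotelEventSerializer");

        try (KafkaProducer<String, HotelEvent> producer = new KafkaProducer<>(props)) {
            // Same key => same partition => the deletion is consumed after the creation
            producer.send(new ProducerRecord<>("hotel-events", reservationId, creation));
            producer.send(new ProducerRecord<>("hotel-events", reservationId, deletion));
        }
    }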
Implementation Examples
While the unified schema approach solved many problems, it introduced some implementation challenges we had to address.
Handling Switch Statement Complexity
When using a unified schema with the “oneof” pattern, your processing code will contain switch statements to handle different message types:
    public void process(HotelEvent event) {
        switch (event.getEventTypeCase()) {
            case ROOM_RESERVATION:
                RoomReservation reservation = event.getRoomReservation();
                switch (reservation.getEventActionCase()) {
                    case UPDATED:
                        handleRoomReservationUpdate(reservation.getUpdated());
                        break;
                    case DELETED:
                        handleRoomReservationDelete(reservation.getDeleted());
                        break;
                    // ... other cases
                }
                break;
            case RESTAURANT_RESERVATION:
                // Similar nested switch
                break;
            // ... other event types
        }
    }
This can lead to deep nesting and verbose code. We mitigated this by:
- Using the visitor pattern where appropriate (sketched after this list)
- Creating utility functions to handle common processing paths
- Breaking down large switch statements into smaller, focused methods
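As an illustration of the visitor approach, here is a minimal sketch. The HotelEventVisitor interface and its method names are our own illustrative additions, not protobuf-generated code:

    public interface HotelEventVisitor {
        void onRoomReservationUpdated(RoomReservationUpdated updated);
        void onRoomReservationDeleted(RoomReservationDeleted deleted);
        void onRestaurantReservation(RestaurantReservation reservation);
        // ... one method per leaf event type
    }

    public final class HotelEventDispatcher {
        // Routes a HotelEvent to the matching visitor method, so the nested
        // switches live in exactly one place instead of in every service
        public static void dispatch(HotelEvent event, HotelEventVisitor visitor) {
            switch (event.getEventTypeCase()) {
                case ROOM_RESERVATION:
                    RoomReservation reservation = event.getRoomReservation();
                    switch (reservation.getEventActionCase()) {
                        case UPDATED:
                            visitor.onRoomReservationUpdated(reservation.getUpdated());
                            break;
                        case DELETED:
                            visitor.onRoomReservationDeleted(reservation.getDeleted());
                            break;
                        default:
                            break; // unknown action: log or ignore per service
                    }
                    break;
                case RESTAURANT_RESERVATION:
                    visitor.onRestaurantReservation(event.getRestaurantReservation());
                    break;
                default:
                    break; // ... other event types
            }
        }
    }

Each service then implements only the visitor methods it cares about, and new event types require changes in just the dispatcher rather than in every consumer.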
Serialisation/Deserialisation Strategies
With a unified schema, serialisation and deserialisation become more consistent. Every service can use the same serialiser/deserialiser for HotelEvent, rather than needing specialised ones for each message type. This significantly reduced boilerplate code.
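For example, a single shared deserialiser can be as small as this; the class name is ours, and it assumes the protobuf-generated HotelEvent type:

    import org.apache.kafka.common.errors.SerializationException;
    import org.apache.kafka.common.serialization.Deserializer;
    import com.google.protobuf.InvalidProtocolBufferException;

    public class HotelEventDeserializer implements Deserializer<HotelEvent> {
        @Override
        public HotelEvent deserialize(String topic, byte[] data) {
            if (data == null) {
                return null;
            }
            try {
                // Every topic carries HotelEvent, so one deserialiser serves all consumers
                return HotelEvent.parseFrom(data);
            } catch (InvalidProtocolBufferException e) {
                throw new SerializationException("Failed to parse HotelEvent from topic " + topic, e);
            }
        }
    }

The matching serialiser is just as small, delegating to HotelEvent.toByteArray().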
Schema Evolution Management
We used the buf CLI tool for protobuf linting, formatting, and detecting breaking changes. This was crucial for maintaining schema compatibility as our system evolved. The tool helped us:
- Ensure consistent naming conventions
- Detect accidental breaking changes before deployment (an example follows this list)
- Auto-format protobuf files for readability
- Generate code consistently across services
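As an example of what the breaking-change checks catch, consider a hypothetical edit that reuses a field number inside the oneof. Old consumers would decode the bytes as the wrong type, which is why this class of change must be flagged before deployment:

    message HotelEvent {
      oneof event_type {
        RoomReservation room_reservation = 1;
        // BREAKING: field 2 previously meant RestaurantReservation; reusing
        // the number for a different type corrupts decoding for old consumers
        SpaReservation spa_reservation = 2;
      }
    }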
Real-world Results and Lessons Learned
After implementing the unified schema approach across our Kafka ecosystem, we observed several positive outcomes:
Development Velocity Improvements
Adding new message types became significantly easier. Instead of creating new topics and updating multiple services, we simply added new types to our protobuf schema and deployed the updated schema package. Services that needed to process the new types could update their logic, while others continued to function without changes.
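In practice, adding a type was a one-line schema change. A sketch, assuming 5 is the next free field number (the number here is purely illustrative):

    message HotelEvent {
      oneof event_type {
        RoomReservation room_reservation = 1;
        RestaurantReservation restaurant_reservation = 2;
        GolfReservation golf_reservation = 3;
        Profile profile = 4;
        SpaReservation spa_reservation = 5; // new type added here
      }
    }

Consumers compiled against the older schema see the new entry as an unknown field, and their event-type case comes back unset, so they can log and skip it without any code changes.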
Onboarding Experience
New developers found the system more intuitive. With a consistent message structure across the platform, they could understand the data flow more quickly and focus on business logic rather than the intricacies of Kafka topic configuration.
Performance Considerations
We did observe minor overheads from using a more complex schema structure, primarily in serialisation/deserialisation time and message size. However, these were minimal compared to the architectural benefits gained.
When This Approach Might Not Be Right
The unified schema approach isn’t suitable for every use case:
- If your message types are fundamentally different with no conceptual relationship
- If you have strong security requirements that necessitate physical separation of message types
- If different message types have vastly different retention or processing requirements
- If your organisation has strict team boundaries where different teams own different message types
Practical Takeaways for Your Kafka Architecture
If you’re considering implementing a unified schema approach, here are some practical steps to get started:
Questions to Ask Before Implementation
- Do your message types share conceptual relationships?
- Do you need to maintain order across different message types?
- Are you currently managing a complex web of topics that’s becoming difficult to maintain?
- Would your services benefit from more flexible consumption patterns?
Transitioning from Separate Topics
- Start with a schema audit: Document all your current message types and their relationships
- Design your unified schema: Create a common parent type with specialized subtypes
- Build a proof-of-concept: Implement the new schema in a limited scope
- Develop a migration strategy: Consider running dual systems temporarily
- Roll out incrementally: Convert one message flow at a time
Configuration Recommendations
- Increase the default message size limit if your unified messages are larger (see the configuration sketch after this list)
- Adjust consumer configurations to account for potentially more complex processing
- Configure serialisers to handle the unified schema efficiently
- Consider compression if message size becomes an issue
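Here is a sketch of client settings touching the size and compression recommendations; the values are starting points to test against your own message sizes, not prescriptions:

    import java.util.Properties;

    Properties producerProps = new Properties();
    // The producer's per-request cap defaults to ~1 MB; raise it if unified
    // messages run larger (the broker's message.max.bytes must be raised to match)
    producerProps.put("max.request.size", "2097152"); // 2 MB, illustrative
    // Compression trades CPU for smaller payloads on the wire
    producerProps.put("compression.type", "zstd");

    Properties consumerProps = new Properties();
    // Allow per-partition fetches to carry the larger messages
    consumerProps.put("max.partition.fetch.bytes", "2097152");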
Managing Schema Evolution
- Establish clear guidelines for adding new message types
- Use tooling to detect breaking changes
- Create a versioning strategy for your unified schema
- Consider backward and forward compatibility requirements (one common protobuf idiom is sketched below)
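One protobuf idiom worth adopting as part of that strategy: when a type is retired, reserve its field number and name so they can never be reused. This is standard protobuf practice rather than anything specific to our schema; suppose, hypothetically, golf reservations moved to another system:

    message HotelEvent {
      // GolfReservation was retired; reserving the slot prevents accidental reuse
      reserved 3;
      reserved "golf_reservation";
      oneof event_type {
        RoomReservation room_reservation = 1;
        RestaurantReservation restaurant_reservation = 2;
        Profile profile = 4;
      }
    }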
Beyond One-Topic-Per-Type
The journey from a traditional one-topic-per-type Kafka architecture to a unified schema approach taught us that conventional wisdom doesn’t always apply to complex, real-world systems. By challenging the standard pattern and designing a solution tailored to our domain, we created a more flexible, maintainable, and developer-friendly system.
The key insight is that your Kafka architecture should reflect the natural structure of your domain. In hospitality, where data entities are richly interconnected and have complex relationships, a unified schema approach allowed us to model these relationships more accurately and process them more consistently.
As you evaluate your own Kafka architecture, consider whether your current topic structure is serving your domain needs or creating unnecessary complexity. Sometimes, the simplest solution is to embrace the complexity of your domain within your schema, rather than trying to flatten it into disconnected topics.
By thinking beyond the one-topic-per-type pattern, you can create event-driven architectures that are both powerful and manageable, even as they scale to handle complex, real-world domains.