Kafka Protocol Internals: How Partition Remapping Proxies Transform Messages Without Breaking Clients
Sion Smith, 19 December 2025
Most Apache Kafka engineers interact with producers and consumers daily, but few understand the binary protocol that flows between clients and brokers. This knowledge gap becomes critical when enterprises need to implement proxy-based solutions for cost optimisation, security, or operational governance.
OSO engineers have spent considerable time working with organisations deploying Kafka partition remapping proxies, transparent intermediaries that present virtual partitions to clients whilst using fewer physical partitions on the broker. These deployments have reduced managed Kafka costs by up to 90% for enterprise clients running on Confluent Cloud and AWS MSK.
Understanding Kafka’s low-level protocol is essential for implementing partition remapping proxies that remain completely transparent to existing applications. By examining how requests and responses are structured, how partitions and offsets are represented, and how the protocol handles versioning, engineers can build proxies that intercept, transform, and forward messages without breaking client expectations.
This article walks through the Kafka protocol from bits to API calls, demonstrating how partition remapping proxies leverage this knowledge to translate between virtual and physical coordinate systems. The techniques described here form the foundation of OSO’s open-source Kafka Partition Remapper project.
The Kafka Binary Protocol Foundation
Why Kafka Uses TCP and Binary Encoding
Kafka uses a binary protocol over TCP rather than HTTP. This design choice enables high-throughput, low-latency message transport, but it requires specialised handling by any intermediary that sits between clients and brokers.
All Kafka APIs operate as request-response pairs, where clients send sequenced messages and brokers respond in order. A single TCP connection can multiplex many requests using correlation IDs: monotonically incrementing numbers that link each response to its originating request. This multiplexing capability is why partition remapping proxies must carefully track which virtual partition mapping applies to each in-flight request.
The protocol’s binary nature means that proxies cannot simply inspect HTTP headers or parse JSON payloads. Instead, they must decode byte sequences according to precise specifications, transform the relevant fields, and re-encode the modified message before forwarding it.
Primitive Data Types That Build the Protocol
The Kafka protocol is constructed from a small set of primitive data types that appear throughout every API call:
Fixed-width integers form the backbone of the protocol. Boolean values occupy a single byte, whilst int8, int16, int32, and int64 types are stored in big-endian order. These appear extensively: partition IDs are int32, offsets are int64, and API keys are int16.
Unsigned varints were borrowed from Protocol Buffers to represent variable-length integers efficiently. The encoding uses a continuation bit in the most significant position of each byte. If this bit is set to 1, another byte follows; if set to 0, the current byte is the last. This approach saves space when representing small numbers whilst still supporting large values.
For example, the number 150 would be represented as two bytes in unsigned varint format:
Byte 1: 10010110 (continuation bit = 1, value bits = 0010110)
Byte 2: 00000001 (continuation bit = 0, value bits = 0000001)
The receiver drops the continuation bits and concatenates the 7-bit groups in reverse order of arrival, since varints store the least significant group first, to reconstruct the value.
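The worked example above can be sketched in a few lines of Python. This is an illustrative implementation of unsigned varint encoding and decoding, not code from the Remapper project:

```python
def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(byte)         # final byte: continuation bit clear
            return bytes(out)

def decode_uvarint(data: bytes) -> tuple[int, int]:
    """Decode an unsigned varint; return (value, bytes_consumed)."""
    value, shift = 0, 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << shift   # 7-bit groups arrive least significant first
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")

# The worked example from the text: 150 encodes as 0x96 0x01
assert encode_uvarint(150) == bytes([0b10010110, 0b00000001])
assert decode_uvarint(b"\x96\x01") == (150, 2)
```

Note how small values (under 128) cost a single byte while an int64-sized value is still representable, which is exactly the trade-off the protocol wants for length prefixes.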
Compact strings and compact arrays use unsigned varints for length encoding, reducing wire overhead compared to fixed-width length prefixes. A compact string consists of an unsigned varint specifying the string length plus one, followed by the UTF-8 encoded bytes. Compact arrays follow the same pattern—an unsigned varint for the array size plus one, followed by the serialised elements.
For partition remapping proxies, understanding these primitives is essential. A proxy must decode these types correctly to locate partition and offset fields within requests, transform them according to the remapping configuration, and re-encode the modified message with correct length prefixes.
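The compact string layout described above is easy to demonstrate. The following is a self-contained sketch (our own helper names, assuming the length-plus-one convention from the text):

```python
def encode_uvarint(value: int) -> bytes:
    """Minimal unsigned varint encoder (7-bit groups, least significant first)."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)  # continuation bit set
        value >>= 7
    out.append(value)
    return bytes(out)

def encode_compact_string(s: str) -> bytes:
    """Compact string: uvarint of (byte length + 1), then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_uvarint(len(data) + 1) + data

# "topic-a" is 7 bytes, so the length prefix is a single varint byte encoding 8
assert encode_compact_string("topic-a") == b"\x08topic-a"
```

A proxy that shortens or lengthens such a string must re-encode the prefix, which can itself change size and shift every subsequent byte in the message.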
Anatomy of a Kafka API Request
How Requests Are Structured
Every Kafka API request follows a consistent structure that proxies must parse and potentially modify:
The length prefix occupies the first four bytes of every message, telling the recipient how much buffer to allocate for the rest of the message. When a partition remapping proxy transforms partition numbers or offsets, the total message size may change, requiring recalculation of this prefix before forwarding.
Request headers contain metadata that identifies and routes the request:
API key (int16): Identifies which API is being called. Produce is key 0, Fetch is key 1, Metadata is key 3, OffsetCommit is key 8, and so on—there are approximately 80 different API keys in modern Kafka.
API version (int16): Determines the exact wire format for both request and response. Different versions may have different fields, different field orderings, or different encoding rules.
Correlation ID (int32): Links responses to requests. Since multiple requests can be in-flight simultaneously on a single connection, this ID is how clients match responses to their originating requests.
Client ID (nullable string): An optional identifier for the client application.
Tagged fields: An extensibility mechanism that allows new optional fields to be added without breaking existing clients.
The request body follows the header and varies entirely based on the API key and version. A ProduceRequest contains topics, partitions, and record batches. A FetchRequest specifies which partitions to read from and at what offsets. A MetadataRequest may list specific topics or request information about all topics.
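The header layout above can be parsed with ordinary byte arithmetic. This sketch assumes the older non-flexible header layout (no tagged fields) for readability; flexible versions add a compact client ID and a tagged-field section:

```python
import struct

def parse_request_header(frame: bytes) -> dict:
    """Parse the fixed part of a Kafka request frame (non-flexible header)."""
    length, = struct.unpack_from(">i", frame, 0)              # 4-byte length prefix
    api_key, api_version, correlation_id = struct.unpack_from(">hhi", frame, 4)
    client_id_len, = struct.unpack_from(">h", frame, 12)      # nullable string: -1 means null
    client_id = None
    if client_id_len >= 0:
        client_id = frame[14:14 + client_id_len].decode("utf-8")
    return {"length": length, "api_key": api_key, "api_version": api_version,
            "correlation_id": correlation_id, "client_id": client_id}

# A hand-built frame: Metadata (key 3), version 5, correlation ID 42, client ID "app"
frame = struct.pack(">ihhih", 13, 3, 5, 42, 3) + b"app"
hdr = parse_request_header(frame)
assert (hdr["api_key"], hdr["api_version"], hdr["correlation_id"],
        hdr["client_id"]) == (3, 5, 42, "app")
```

A proxy dispatches on `api_key` and `api_version` exactly like this before it knows how to decode the body that follows.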
API Versioning and Client-Broker Negotiation
Kafka’s protocol versioning system enables backwards compatibility whilst allowing the protocol to evolve. When a client connects to a broker, one of its first actions is typically an ApiVersions request (API key 18) to discover which API versions the broker supports.
The ApiVersions request is special: brokers respond to it even before authentication completes. This allows clients to discover supported versions regardless of the security configuration. The response contains an array of API keys, each with its minimum and maximum supported versions.
Clients and brokers negotiate by finding the highest version both sides support. If a client needs features only available in version 9 of the Produce API but the broker only supports up to version 7, the client must either use the older version (losing access to newer features) or refuse to connect.
For partition remapping proxies, this versioning system has important implications. The proxy must understand multiple versions of each API it intercepts, since different clients may negotiate different versions. The proxy’s protocol parsing and transformation logic must handle the structural differences between versions—fields may be added, removed, or reordered across versions.
The APIs That Matter for Partition Remapping
Partition remapping proxies focus on a subset of Kafka’s APIs—those that reference partition IDs or offsets:
Metadata API (key 3): Returns information about topics, partitions, and brokers. The proxy transforms responses to present virtual partition counts rather than physical counts, and rewrites broker addresses to route all client connections through the proxy.
Produce API (key 0): Sends records to topics. The proxy maps virtual partition IDs to physical partition IDs in requests, and translates physical offsets back to virtual offsets in responses.
Fetch API (key 1): Retrieves records from topics. The proxy maps virtual partitions to physical partitions in requests, filters responses to include only records belonging to the requested virtual partition, and translates offsets.
OffsetCommit API (key 8) and OffsetFetch API (key 9): Store and retrieve consumer group offsets. The proxy translates between virtual and physical offset representations.
ListOffsets API (key 2): Queries offset information (earliest, latest, or by timestamp). The proxy translates offset values in both requests and responses.
How Partitions and Offsets Are Represented in the Protocol
Partition Fields in Produce and Fetch Requests
In a ProduceRequest, each topic contains an array of partition data. Each partition entry includes:
Partition ID (int32): The partition to write to
Record batch: The compressed, batched records to append
When a client produces to virtual partition 127, the proxy intercepts this request, calculates the corresponding physical partition using the formula physical_partition = virtual_partition % physical_partitions, and forwards the modified request to the broker.
The ProduceResponse contains results for each partition, including:
Partition ID (int32): Echoed back from the request
Error code (int16): Success or failure indicator
Base offset (int64): The offset assigned to the first record in the batch
The proxy must translate this base offset from physical space to virtual space before returning the response to the client. The client expects offsets consistent with the virtual partition it originally addressed.
The Offset Translation Formula
OSO engineers have implemented offset translation using a technique called offset windowing. Each virtual partition gets its own dedicated offset range within the physical partition’s offset space.
The formula for translating virtual offsets to physical offsets:

physical_offset = (virtual_group × offset_range) + virtual_offset

where:

virtual_group = virtual_partition / physical_partitions (integer division)
offset_range is a configurable window size (default 2^40, approximately 1 trillion)
For example, with 100 virtual partitions mapping to 10 physical partitions:

| Virtual Partition | Virtual Offset | Physical Partition | Physical Offset |
|---|---|---|---|
| 0 | 5,000 | 0 | 5,000 |
| 10 | 5,000 | 0 | 1,099,511,632,776 |
| 50 | 500 | 0 | 5,497,558,139,380 |
| 99 | 0 | 9 | 9,895,604,649,984 |
Virtual partitions 0, 10, 20, 30, 40, 50, 60, 70, 80, and 90 all map to physical partition 0, but each occupies a distinct offset range within that physical partition.
The reverse translation extracts the virtual partition and offset from a physical offset:

virtual_group = physical_offset / offset_range (integer division)
virtual_offset = physical_offset % offset_range
virtual_partition = (virtual_group × physical_partitions) + physical_partition
This bidirectional translation must be deterministic and consistent across all proxy instances, which is why partition remapping proxies can be stateless—the mapping is purely mathematical.
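The windowing arithmetic above fits in two pure functions. This sketch uses the worked example's parameters (100 virtual over 10 physical partitions, the default 2^40 window); the function names are ours:

```python
OFFSET_RANGE = 2 ** 40          # default offset window size per virtual group
PHYSICAL_PARTITIONS = 10

def to_physical(virtual_partition: int, virtual_offset: int) -> tuple[int, int]:
    """Map (virtual partition, virtual offset) into physical coordinates."""
    physical_partition = virtual_partition % PHYSICAL_PARTITIONS
    virtual_group = virtual_partition // PHYSICAL_PARTITIONS
    physical_offset = virtual_group * OFFSET_RANGE + virtual_offset
    return physical_partition, physical_offset

def to_virtual(physical_partition: int, physical_offset: int) -> tuple[int, int]:
    """Recover (virtual partition, virtual offset) from physical coordinates."""
    virtual_group, virtual_offset = divmod(physical_offset, OFFSET_RANGE)
    virtual_partition = virtual_group * PHYSICAL_PARTITIONS + physical_partition
    return virtual_partition, virtual_offset

# The rows from the worked table above
assert to_physical(0, 5_000) == (0, 5_000)
assert to_physical(10, 5_000) == (0, 1_099_511_632_776)
assert to_physical(50, 500) == (0, 5_497_558_139_380)
assert to_physical(99, 0) == (9, 9_895_604_649_984)
# Round-trip back to virtual coordinates
assert to_virtual(0, 1_099_511_632_776) == (10, 5_000)
```

Because both directions are pure arithmetic over configuration constants, every proxy instance computes identical mappings with no shared state.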
Metadata Response Transformation
The Metadata API response is the most complex transformation because it defines the partition topology that clients see. When a broker responds with metadata for a topic with 10 physical partitions, the proxy expands this to 100 virtual partitions.
For each physical partition, the proxy generates compression_ratio virtual partitions (where compression_ratio = virtual_partitions / physical_partitions). Each virtual partition inherits the leader broker, ISR list, and replica list from its corresponding physical partition.
The proxy also rewrites all broker addresses in the metadata response to point to the proxy itself. This ensures that when clients connect to what they believe are different brokers for different partitions, all connections route through the proxy.
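The metadata expansion step can be sketched as follows. The dictionary shape is illustrative (real metadata responses carry more fields), but the inheritance rule matches the text: each virtual partition copies leader, ISR, and replicas from its physical partition:

```python
def expand_partitions(physical_meta: list[dict], virtual_count: int,
                      physical_count: int) -> list[dict]:
    """Expand physical partition metadata into virtual partition metadata.

    Each virtual partition inherits leader/ISR/replicas from the physical
    partition it maps to (virtual % physical)."""
    by_id = {p["partition"]: p for p in physical_meta}
    virtual = []
    for vp in range(virtual_count):
        phys = by_id[vp % physical_count]
        virtual.append({"partition": vp, "leader": phys["leader"],
                        "isr": phys["isr"], "replicas": phys["replicas"]})
    return virtual

# 10 physical partitions spread across 3 brokers, expanded 10:1
physical = [{"partition": p, "leader": p % 3, "isr": [p % 3], "replicas": [p % 3]}
            for p in range(10)]
virtual = expand_partitions(physical, 100, 10)
assert len(virtual) == 100
assert virtual[57]["leader"] == physical[7]["leader"]  # 57 % 10 == 7
```

In a real deployment the broker addresses in the same response would then be rewritten to the proxy's own endpoints, so all hundred "partition leaders" resolve back to the proxy.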
Storage Internals: What Happens When Messages Reach the Broker
Segment Files, Logs, and Indexes
Understanding how brokers store data illuminates why partition remapping works without broker modifications. Each partition on a broker creates a directory containing:
Log files (segment files): Binary files storing batches of records
Index files: Mapping offsets to byte positions within log files
Time index files: Mapping timestamps to offsets
When a producer sends a batch of records, the broker appends it to the active segment file for that partition. Segment files are named by their starting offset and roll over when they reach a configured size or age threshold.
Record Batch Structure
Record batches stored in segment files follow a defined binary structure: each batch begins with a base offset (int64) and batch length (int32), followed by the partition leader epoch, a magic byte, a CRC, an attributes field, timestamp and producer metadata, and finally the individual records.
The Attributes field contains flags indicating compression codec (GZIP, Snappy, LZ4, or zstd), whether the batch is transactional, and whether it’s a control batch (containing transaction markers rather than application records).
The CRC provides integrity checking—the broker validates this checksum when receiving batches from producers, ensuring data wasn’t corrupted in transit.
For partition remapping, the key insight is that the broker treats record values as opaque byte arrays. The broker doesn’t parse or validate record contents—it simply stores and retrieves bytes. This opacity is what makes transparent proxying possible. The broker doesn’t know or care that the offsets clients see differ from the physical offsets it assigns.
The High Water Mark and Log Stable Offset
Two pointers within each partition are relevant for understanding how consumers see data:
The high water mark (HWM) indicates the offset up to which all in-sync replicas have acknowledged the data. Consumers using the default isolation.level=read_uncommitted can read up to the high water mark.
The log stable offset (LSO) indicates the offset up to which all transactions have been committed or aborted. For transactional consumers using isolation.level=read_committed, the LSO determines what’s visible.
Partition remapping proxies must respect these boundaries when filtering Fetch responses. If a consumer requests data from virtual partition 10 but the high water mark hasn’t advanced far enough to include committed data in that virtual partition’s offset range, the proxy returns an empty response rather than uncommitted data.
The Produce and Fetch Flow Through a Remapping Proxy
Produce Request Flow
When a client produces messages through a partition remapping proxy, the following sequence occurs:
Client prepares the request: The client serialises the message, applies the partitioner (using key hashing to select virtual partition 127), batches multiple messages together, and optionally compresses the batch.
Proxy receives the request: The proxy’s connection handler decodes the Kafka frame, extracts the API key and version, and routes the request to the produce handler.
Proxy transforms partition references: For each partition in the request, the proxy calculates the physical partition: 127 % 10 = 7. It also records the mapping context so it can correctly transform the response.
Proxy forwards to broker: The modified request, now targeting physical partition 7, is forwarded to the partition leader.
Broker processes the request: The broker appends the records to the segment file, replicates to followers, and waits for acknowledgement based on the client’s acks configuration.
Broker returns response: The response includes the base offset assigned to the batch—a physical offset in partition 7’s offset space.
Proxy transforms the response: Using the recorded mapping context, the proxy translates the physical base offset to a virtual base offset. Virtual partition 127 belongs to group 12 (since 127 / 10 = 12), so a physical base offset of (12 × offset_range) + 1,000 within that group's window translates to virtual offset 1,000.
Proxy returns to client: The client receives a response that appears completely normal—partition 127 with the expected offset.
Fetch Request Flow
Fetch requests follow a similar pattern with an additional filtering step:
Client requests data: The client requests messages from virtual partition 127 starting at virtual offset 500.
Proxy transforms the request: The proxy maps virtual partition 127 to physical partition 7 and virtual offset 500 to physical offset (12 × offset_range) + 500.
Broker returns data: The broker returns all available records from physical partition 7 starting at the translated offset. Since physical partition 7 holds data from virtual partitions 7, 17, 27, 37, 47, 57, 67, 77, 87, and 97 (and also 107, 117, 127, etc.), the response may contain records from multiple virtual partitions.
Proxy filters the response: The proxy examines each record’s offset and filters out any records that don’t belong to virtual partition 127’s offset window. Only records where physical_offset / offset_range == 12 are retained.
Proxy translates offsets: For retained records, the proxy translates physical offsets back to virtual offsets.
Client receives filtered data: The client sees only records from virtual partition 127 with virtual offsets, exactly as expected.
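The filtering and translation steps above can be sketched as a single pass over the fetched records. The record representation here is simplified to (physical offset, value) pairs; real Fetch responses carry full record batches:

```python
OFFSET_RANGE = 2 ** 40   # default offset window size per virtual group

def filter_fetch(records: list[tuple[int, bytes]],
                 virtual_group: int) -> list[tuple[int, bytes]]:
    """Keep only records inside the requested virtual group's offset window,
    rewriting each physical offset to its virtual offset."""
    return [(off % OFFSET_RANGE, value)        # strip the window base
            for off, value in records
            if off // OFFSET_RANGE == virtual_group]

# Physical partition 7 interleaves windows from many virtual groups;
# group 12 corresponds to virtual partition 127 (12 * 10 + 7).
fetched = [(0 * OFFSET_RANGE + 9,    b"group-0"),
           (12 * OFFSET_RANGE + 500, b"group-12"),
           (12 * OFFSET_RANGE + 501, b"group-12"),
           (13 * OFFSET_RANGE + 4,   b"group-13")]
assert filter_fetch(fetched, 12) == [(500, b"group-12"), (501, b"group-12")]
```

The ratio of discarded to retained records in this pass is exactly the "filtering ratio" overhead that grows with higher compression ratios.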
Consumer Group Coordination
Consumer groups add complexity because partition assignments are negotiated between consumers and the group coordinator. When a new consumer joins a group:
The consumer sends a JoinGroup request listing the topics it wants to consume.
The group coordinator (a broker) assigns partitions to group members.
Consumers send SyncGroup requests to receive their assignments.
Consumers fetch from their assigned partitions.
The partition remapping proxy handles this by virtualising the partition count in Metadata responses. The group coordinator assigns virtual partitions (0-99) to consumers. The proxy then translates these virtual partition assignments to physical partition assignments when forwarding Fetch requests.
From the consumer’s perspective, it was assigned virtual partitions 50-59 and is fetching from them normally. The proxy silently maps these to physical partitions 0-9 with appropriate offset translations.
Practical Takeaways for Deploying Partition Remapping
Choosing Compression Ratios
The compression ratio—virtual partitions divided by physical partitions—determines both cost savings and operational overhead. OSO engineers typically recommend:
10:1 ratio: A common choice that provides 90% cost savings on partition-based pricing. Suitable for most workloads.
5:1 ratio: More conservative, reducing filtering overhead at the cost of smaller savings.
50:1 or higher: Aggressive compression for topics with many logical partitions but low throughput per partition.
Higher ratios mean more virtual partitions share each physical partition, increasing the filtering work during Fetch responses. For high-throughput topics, this overhead may become noticeable.
Per-Topic Configuration
Different topics often have different requirements. A high-volume event stream might need aggressive compression (100 virtual to 10 physical), whilst a low-volume configuration topic might use 1:1 mapping (no remapping).
The Kafka Partition Remapper supports per-topic configuration through exact topic names or regex patterns.
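The resolution order such a scheme implies (exact name first, then patterns, then a default) can be sketched as follows. The rule names and topic names here are hypothetical, not the Remapper's actual configuration schema:

```python
import re

# Hypothetical per-topic rules: exact names take precedence over regex patterns.
EXACT_RULES = {"app-config": 1}                     # 1:1 -> remapping disabled
PATTERN_RULES = [(re.compile(r"^events-.*$"), 50)]  # aggressive 50:1 compression
DEFAULT_RATIO = 10

def compression_ratio_for(topic: str) -> int:
    """Resolve a topic's compression ratio: exact match, then regex, then default."""
    if topic in EXACT_RULES:
        return EXACT_RULES[topic]
    for pattern, ratio in PATTERN_RULES:
        if pattern.match(topic):
            return ratio
    return DEFAULT_RATIO

assert compression_ratio_for("app-config") == 1
assert compression_ratio_for("events-clickstream") == 50
assert compression_ratio_for("orders") == 10
```

Resolving the ratio once per topic and caching it keeps this lookup off the per-message hot path.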
Metrics and Observability
Partition remapping proxies should expose metrics that show both virtual and physical perspectives:
Virtual partition throughput: Messages per second per virtual partition, matching what clients expect to see.
Physical partition throughput: Actual load on broker partitions, useful for capacity planning.
Remapping operations: Count of partition and offset translations, helpful for understanding proxy overhead.
Filtering ratio: Percentage of fetched records that are filtered out, indicating how much extra data is retrieved and discarded.
The proxy exposes Prometheus metrics at a configurable endpoint, enabling integration with existing monitoring infrastructure.
Latency Considerations
Protocol parsing and offset translation add latency to every request. OSO engineers have measured typical P99 latency overhead of 1-3 milliseconds—negligible for most use cases compared to network round-trip times and broker processing.
Factors affecting proxy latency:
Compression ratio: Higher ratios require more filtering during Fetch processing.
As a concrete example, a typical Kafka Partition Remapper deployment:
Presents 1000 virtual partitions to clients by default (10:1 compression)
Applies SASL_SSL security for both client-facing and broker-facing connections
Provides higher compression (50:1) for the high-volume events topic
Disables remapping for the low-volume config topic
Exposes Prometheus metrics on port 9090
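A deployment along those lines might be expressed in a configuration file roughly like the following. The key names are illustrative only, not the project's actual schema:

```yaml
# Hypothetical configuration sketch; actual key names may differ.
listener:
  security_protocol: SASL_SSL        # client-facing security
broker:
  security_protocol: SASL_SSL        # broker-facing security
remapping:
  default_virtual_partitions: 1000   # 10:1 against 100 physical partitions
  topics:
    - pattern: "events-.*"
      compression_ratio: 50          # aggressive 50:1 for the high-volume events topic
    - name: "app-config"
      compression_ratio: 1           # remapping disabled for the low-volume config topic
metrics:
  prometheus_port: 9090
```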
Conclusion
Understanding Kafka’s binary protocol—from primitive data types to API structures to storage internals—provides the foundation for implementing proxy-based solutions that transform messages whilst remaining transparent to clients.
Partition remapping proxies exploit Kafka’s design: the protocol’s explicit partition and offset fields are easily located and transformed, the broker treats record values as opaque bytes requiring no modification, and the deterministic mapping between virtual and physical coordinates requires no persistent state in the proxy.
OSO engineers have found that enterprises deploying partition remapping proxies achieve 70-90% cost savings on managed Kafka services like Confluent Cloud and AWS MSK, whilst maintaining complete transparency to existing applications. No client code changes required—the producer that wrote to virtual partition 127 continues to believe it wrote to partition 127, and the consumer that reads from virtual partition 127 sees exactly the messages it expects.
As managed Kafka pricing continues to emphasise per-partition costs, protocol-level solutions like partition remapping become increasingly important for enterprises seeking to balance scalability with cost efficiency. The protocol knowledge covered in this article provides the foundation for understanding, deploying, and troubleshooting these solutions.
For organisations running Kafka with more than 500 partitions, partition remapping represents one of the highest-ROI infrastructure investments available today. The Kafka Partition Remapper project provides a production-ready implementation of these concepts, enabling enterprises to reduce their Kafka costs without modifying applications or touching broker configurations.