Apache Kafka: Part 6 - KRaft, The Future of Kafka Architecture

Welcome to the final part of my Apache Kafka series! Today we’ll explore KRaft (Kafka Raft), the game-changing consensus protocol that represents the future of Kafka architecture.

Full disclosure: This content comes from the Confluent Apache Kafka Fundamentals Course at training.confluent.io. Make sure you’ve read the previous parts: Part 1, Part 2, Part 3, Part 4, and Part 5.


What is KRaft?

KRaft (Kafka Raft Consensus Protocol) is Kafka’s native consensus mechanism that replaces Apache ZooKeeper for cluster coordination. Introduced as early access in Kafka 2.8.0, KRaft represents a fundamental architectural shift toward a more streamlined, self-contained system.

The Evolution: From ZooKeeper to KRaft

Traditional Kafka Architecture (with ZooKeeper):
┌─────────────────────────────────────────────────┐
│              ZooKeeper Cluster                  │
│         (Manages Metadata & Coordination)       │
└─────────────────┬───────────────────────────────┘
                  │
        ┌─────────┼─────────┐
        │         │         │
        ▼         ▼         ▼
   ┌─────────┐ ┌─────────┐ ┌─────────┐
   │Broker 1 │ │Broker 2 │ │Broker 3 │
   │         │ │         │ │         │
   └─────────┘ └─────────┘ └─────────┘

Modern Kafka Architecture (with KRaft):
┌─────────────────────────────────────────────────┐
│           KRaft Controller Quorum               │
│    ┌─────────┐ ┌─────────┐ ┌─────────┐          │
│    │Ctrl 1   │◄│Ctrl 2   │►│Ctrl 3   │          │
│    │(Leader) │ │         │ │         │          │
│    └─────────┘ └─────────┘ └─────────┘          │
└─────────────────┬───────────────────────────────┘
                  │ (Metadata Distribution)
        ┌─────────┼─────────┐
        │         │         │
        ▼         ▼         ▼
   ┌─────────┐ ┌─────────┐ ┌─────────┐
   │Broker 1 │ │Broker 2 │ │Broker 3 │
   │         │ │         │ │         │
   └─────────┘ └─────────┘ └─────────┘

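The diagrams above show the difference: with ZooKeeper, cluster metadata lives in a separate ensemble that every broker depends on, while with KRaft a controller quorum inside Kafka manages it, so everything is self-contained.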


Why Replace ZooKeeper?

Challenges with ZooKeeper

  1. Operational Complexity: Managing two separate systems (Kafka + ZooKeeper)
  2. Scaling Limitations: ZooKeeper becomes a bottleneck at scale
  3. Metadata Propagation: Inefficient broadcast model
  4. Resource Overhead: Additional infrastructure requirements

Running ZooKeeper alongside Kafka means more moving parts, more things that can break, and more things to monitor.

KRaft Advantages

Simplified Architecture: KRaft eliminates external dependencies, making Kafka truly self-contained and easier to operate.

  • Simpler Deployment: Single system to manage and monitor
  • Improved Scalability: Better handling of large clusters
  • Faster Metadata Operations: More efficient consensus mechanism
  • Right-sized Clusters: No need for separate ZooKeeper ensemble
  • Faster Recovery: Quicker failover and startup times

How KRaft Works

The Raft Consensus Algorithm

KRaft implements the Raft consensus algorithm, a well-understood distributed consensus protocol that ensures:

  • Leader Election: Automatic selection of a single leader
  • Log Replication: Consistent state across all nodes
  • Safety: Strong consistency guarantees

KRaft Architecture Components

# KRaft cluster roles
Controller Nodes: Manage cluster metadata and consensus
Broker Nodes: Handle client requests and data storage
Combined Nodes: Act as both controller and broker (for smaller deployments)

Metadata Management

KRaft stores all cluster metadata in a special internal, single-partition topic (the metadata log):

# The metadata topic
__cluster_metadata

# What it contains:
- Cluster membership information
- Controller election state
- Topic configurations (partitions, replicas)
- Access Control Lists (ACLs)
- Quota configurations

KRaft Architecture Deep Dive

Controller Quorum

# Example KRaft controller configuration
process.roles=controller
node.id=1
controller.quorum.voters=1@localhost:9093,2@localhost:9094,3@localhost:9095
listeners=CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
log.dirs=/var/kafka-logs
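The `controller.quorum.voters` value above is a comma-separated list of `id@host:port` entries. As a rough illustration of that format, here is a small parser. The `Voter` type and `parse_quorum_voters` function are my own helpers for this post, not part of any Kafka client library.

```python
from typing import NamedTuple


class Voter(NamedTuple):
    node_id: int
    host: str
    port: int


def parse_quorum_voters(value: str) -> list[Voter]:
    """Parse 'id@host:port,id@host:port,...' into Voter tuples."""
    voters = []
    for entry in value.split(","):
        node_id, _, endpoint = entry.strip().partition("@")
        host, _, port = endpoint.rpartition(":")
        if not node_id or not host or not port:
            raise ValueError(f"expected id@host:port, got {entry!r}")
        voters.append(Voter(int(node_id), host, int(port)))
    ids = [v.node_id for v in voters]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate node id in voter list")
    return voters
```

A check like this can catch a malformed voter string before a controller is started with it.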

Metadata Synchronization

Instead of the ZooKeeper-era model, in which the controller pushed metadata updates out to the brokers, KRaft uses a pull-based approach:

  1. Active Controller: Leader of the metadata partition
  2. Follower Controllers: Replica followers of metadata
  3. Brokers: Replica observers that fetch metadata changes

KRaft Metadata Synchronization Flow:

Active Controller    Follower Controller    Broker
       │                     │               │
       │ 1. Write metadata   │               │
       │    change           │               │
       │◄────────────────────│               │
       │                     │ 2. Fetch      │
       │                     │    latest     │
       │                     │    metadata   │
       │────────────────────►│               │
       │ 3. Metadata updates │               │
       │                     │               │
       │◄────────────────────┼───────────────│
       │                     │               │ 4. Fetch
       │                     │               │    latest
       │                     │               │    metadata
       │─────────────────────┼──────────────►│
       │                     │ 5. Metadata   │
       │                     │    updates    │

Benefits of the Pull-Based Model

  • Faster Restarts: A restarting broker fetches only the metadata it missed instead of reloading everything
  • Better Synchronization: All nodes stay in sync automatically
  • Reduced Network Traffic: Efficient metadata distribution
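To make the pull model concrete, here is a toy simulation in which a broker remembers the next offset it needs and asks the log for everything from that point on. `MetadataLog` and `Broker` are hypothetical classes written for this post. Real brokers use Kafka’s fetch protocol and snapshots, which this sketch ignores.

```python
class MetadataLog:
    """Append-only list of records, standing in for the active controller's log."""

    def __init__(self) -> None:
        self.records: list[str] = []

    def append(self, record: str) -> None:
        self.records.append(record)

    def fetch(self, from_offset: int) -> list[str]:
        """Return every record at or after from_offset."""
        return self.records[from_offset:]


class Broker:
    def __init__(self) -> None:
        self.next_offset = 0  # first offset this broker has not applied yet
        self.applied: list[str] = []

    def poll(self, log: MetadataLog) -> int:
        """Pull new records from the log and return how many were applied."""
        batch = log.fetch(self.next_offset)
        self.applied.extend(batch)
        self.next_offset += len(batch)
        return len(batch)
```

Because the broker tracks its own offset, a broker that was offline for a while only pulls the records written in the meantime when it polls again.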

Setting Up KRaft

Controller Configuration

# kraft-controller.properties
process.roles=controller
node.id=1
controller.quorum.voters=1@controller1:9093,2@controller2:9093,3@controller3:9093
# Bind to this controller's own host (controller1 here) so the other voters can reach it
listeners=CONTROLLER://controller1:9093
controller.listener.names=CONTROLLER
log.dirs=/var/kafka-controller-logs
metadata.log.dir=/var/kafka-controller-logs

Broker Configuration

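# kraft-broker.properties
process.roles=broker
node.id=101
controller.quorum.voters=1@controller1:9093,2@controller2:9093,3@controller3:9093
listeners=PLAINTEXT://localhost:9092
# Brokers must also name the controller listener so they can reach the quorum
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
log.dirs=/var/kafka-broker-logs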

Combined Node Configuration

# kraft-combined.properties (for smaller deployments)
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093,2@localhost:9094,3@localhost:9095
listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
log.dirs=/var/kafka-logs

For smaller setups or development environments, combined nodes work great. For production, you’ll probably want dedicated controllers.
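Before starting a node, it can help to sanity-check a properties file against the relationships shown in the configurations above. The sketch below is my own helper, not a Kafka tool, and it covers only a few rules: roles and node ID are set, a controller’s `node.id` appears in `controller.quorum.voters`, and each name in `controller.listener.names` has a matching entry in `listeners`.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_kraft_config(props: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    roles = {r.strip() for r in props.get("process.roles", "").split(",") if r.strip()}
    if not roles:
        problems.append("process.roles is not set")
    if "node.id" not in props:
        problems.append("node.id is not set")
    voter_ids = {
        entry.split("@", 1)[0].strip()
        for entry in props.get("controller.quorum.voters", "").split(",")
        if "@" in entry
    }
    if not voter_ids:
        problems.append("controller.quorum.voters is empty or malformed")
    if "controller" in roles:
        if props.get("node.id") not in voter_ids:
            problems.append("controller node.id is missing from controller.quorum.voters")
        listener_names = {
            item.split("://", 1)[0].strip()
            for item in props.get("listeners", "").split(",")
        }
        for name in props.get("controller.listener.names", "").split(","):
            if name.strip() not in listener_names:
                problems.append(f"controller listener {name.strip()!r} not in listeners")
    return problems
```

Running `check_kraft_config(parse_properties(open("kraft-combined.properties").read()))` against the combined configuration above should return an empty list.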


Migration from ZooKeeper to KRaft

Migration Process Overview

Production Migration: Migrating from ZooKeeper to KRaft in production requires careful planning and testing. Always test in a staging environment first!

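# Step 1: Format controller storage with the cluster ID of the EXISTING ZooKeeper-based cluster
kafka-storage.sh format -t <cluster-id> -c kraft-controller.properties

# Step 2: Start KRaft controllers
# (kraft-controller.properties must also set zookeeper.metadata.migration.enable=true
#  and zookeeper.connect so the controllers can read the existing metadata)
kafka-server-start.sh kraft-controller.properties

# Step 3: Migrate metadata
# There is no separate copy command: rolling-restart the existing brokers, still in
# ZooKeeper mode, with zookeeper.metadata.migration.enable=true plus the
# controller.quorum.voters and controller.listener.names settings. The active
# controller then copies the metadata from ZooKeeper automatically.
# (kafka-metadata-shell.sh only inspects the metadata log; it does not migrate anything.)

# Step 4: Restart each broker as a KRaft broker
kafka-server-start.sh kraft-broker.properties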

Migration Considerations

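  1. Cluster ID: Reuse the existing cluster’s ID when formatting controller storage
  2. Metadata Copy: The controllers copy the ZooKeeper metadata automatically once every broker is in migration mode
  3. Rolling Migration: Gradual transition of brokers
  4. Validation: Verify metadata consistency after migration, before removing the ZooKeeper settings from the controllers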

Monitoring KRaft Clusters

KRaft-Specific Metrics

# Controller metrics
kafka.controller:type=KafkaController,name=ActiveControllerCount
kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs

# Metadata log metrics
kafka.server:type=raft-metrics,name=log-end-offset
kafka.server:type=raft-metrics,name=high-watermark

Health Checks

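# Inspect the metadata log (read-only); point --snapshot at a log segment or snapshot file
kafka-metadata-shell.sh --snapshot /var/kafka-logs/__cluster_metadata-0/00000000000000000000.log

# Verify quorum health (Kafka 3.3+)
kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status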

Performance Improvements

Startup Time Comparison

# Traditional Kafka with ZooKeeper
Startup Time: 30-60 seconds (depending on metadata size)

# KRaft Mode
Startup Time: 5-15 seconds (faster metadata loading)

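That’s a huge improvement, though treat these numbers as illustrative: actual startup time depends on metadata size and hardware. Faster startup means faster recovery from failures.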

Scalability Improvements

  • Larger Clusters: Designed to scale to millions of partitions per cluster, well beyond what ZooKeeper mode handles comfortably
  • Faster Metadata Operations: Reduced latency for topic operations
  • Better Resource Utilization: No ZooKeeper overhead

Production Readiness

KRaft Maturity Timeline

Kafka 2.8.0 (April 2021): Early Access
Kafka 3.0.0 (September 2021): Preview (not yet recommended for production)
Kafka 3.3 (October 2022): Production Ready for new clusters
Kafka 3.5.0 (June 2023): ZooKeeper mode deprecated

Current Limitations (as of Kafka 3.6)

Migration Path: While KRaft is production ready, some features from the ZooKeeper implementation are still missing or were added only recently.

  • JBOD (Just a Bunch of Disks): Not supported in KRaft mode as of 3.6 (early access is planned for 3.7)
  • Delegation Tokens: Supported in KRaft mode starting with 3.6
  • Some Admin Operations: Still being ported

The remaining limitations are being addressed in newer Kafka releases.


Best Practices for KRaft

Controller Deployment

# Recommended controller setup
- Use dedicated controller nodes for large clusters
- Deploy controllers across different availability zones
- Use an odd number of controllers (3 or 5)
- Ensure fast, reliable network between controllers
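The advice to use an odd number of controllers follows from majority arithmetic: the quorum stays available only while more than half of the controllers are up. Here is a quick sketch of that calculation. The function name is mine, and it assumes a plain majority quorum.

```python
def tolerated_failures(controllers: int) -> int:
    """Number of controllers that can fail while a majority remains."""
    if controllers < 1:
        raise ValueError("need at least one controller")
    return (controllers - 1) // 2


if __name__ == "__main__":
    for n in range(3, 7):
        print(f"{n} controllers tolerate {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 controllers, or from 5 to 6, adds a node without adding any fault tolerance, which is why 3 or 5 is the usual choice.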

Resource Planning

# Controller resource requirements
CPU: 2-4 cores per controller
Memory: 4-8 GB heap size
Storage: Fast SSD for metadata logs
Network: Low latency, high bandwidth

Security Configuration

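# KRaft security settings (controller side)
# Note: security.protocol and sasl.mechanism are client settings; a controller uses these instead
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:SASL_SSL
sasl.enabled.mechanisms=PLAIN
sasl.mechanism.controller.protocol=PLAIN
ssl.keystore.location=/path/to/controller.keystore.jks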

The Future of Kafka

Roadmap Highlights

  • Complete ZooKeeper Removal: Planned for Kafka 4.0, once feature parity is reached
  • Enhanced Scalability: Support for even larger clusters
  • Improved Operations: Better tooling and monitoring
  • Cloud Native Features: Better Kubernetes integration

Industry Impact

KRaft positions Kafka as:

  • Simpler to Operate: Reduced operational complexity
  • More Scalable: Better performance at scale
  • Cloud Ready: Easier containerization and orchestration
  • Future Proof: Modern architecture for next generation workloads

Series Conclusion

Throughout this six-part series, we’ve explored:

  1. Kafka Fundamentals: Event streaming concepts and motivation
  2. Core Building Blocks: Topics, partitions, producers, and consumers
  3. Cluster Architecture: Creating topics and understanding partition distribution
  4. Development Tools: APIs and frameworks for building applications
  5. Administration: Monitoring, security, and operational excellence
  6. KRaft Architecture: The future of Kafka without ZooKeeper

It’s been quite a journey! I hope these posts have helped you understand Kafka better.


Key Takeaways

  • KRaft Simplifies Operations: Eliminates ZooKeeper dependency
  • Better Performance: Faster startup and metadata operations
  • Production Ready: Suitable for enterprise deployments
  • Future Direction: Represents Kafka’s architectural evolution
  • Migration Path: Gradual transition from ZooKeeper is possible

The Future is KRaft: ZooKeeper mode has been deprecated since Kafka 3.5 and is slated for removal in Kafka 4.0, so adopting KRaft positions your Kafka infrastructure for long-term success.

Apache Kafka with KRaft represents the culmination of years of architectural evolution. It delivers a more robust, scalable, and operationally simple event streaming platform. Whether you’re starting fresh or planning a migration, KRaft is the foundation for your event driven future.

Thanks for following along with this series!
