Apache Kafka Part 5: KRaft - The Future of Kafka Architecture
Welcome to the final part of our Apache Kafka series! Today we'll explore KRaft (Kafka Raft) - the game-changing consensus protocol that represents the future of Kafka architecture.
This is Part 5 of our comprehensive Kafka series. Make sure you've read the previous parts: Part 1 (Introduction), Part 2 (Building Blocks), Part 3 (Development Tools), and Part 4 (Administration).
What is KRaft?
KRaft (Kafka Raft Consensus Protocol) is Kafka's native consensus mechanism that replaces Apache ZooKeeper for cluster coordination. Introduced in Kafka 2.8.0, KRaft represents a fundamental architectural shift toward a more streamlined, self-contained system.
The Evolution: From ZooKeeper to KRaft
Traditional Kafka Architecture (with ZooKeeper):
┌─────────────────────────────────────────────────┐
│                ZooKeeper Cluster                │
│        (Manages Metadata & Coordination)        │
└────────────────────────┬────────────────────────┘
                         │
            ┌────────────┼────────────┐
            │            │            │
            ▼            ▼            ▼
      ┌──────────┐ ┌──────────┐ ┌──────────┐
      │ Broker 1 │ │ Broker 2 │ │ Broker 3 │
      └──────────┘ └──────────┘ └──────────┘
Modern Kafka Architecture (with KRaft):
┌─────────────────────────────────────────────────┐
│             KRaft Controller Quorum             │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐   │
│  │  Ctrl 1  │◄───┤  Ctrl 2  ├───►│  Ctrl 3  │   │
│  │ (Leader) │    │          │    │          │   │
│  └──────────┘    └──────────┘    └──────────┘   │
└────────────────────────┬────────────────────────┘
                         │ (Metadata Distribution)
            ┌────────────┼────────────┐
            │            │            │
            ▼            ▼            ▼
      ┌──────────┐ ┌──────────┐ ┌──────────┐
      │ Broker 1 │ │ Broker 2 │ │ Broker 3 │
      └──────────┘ └──────────┘ └──────────┘
Why Replace ZooKeeper?
Challenges with ZooKeeper
- Operational Complexity: Managing two separate systems (Kafka + ZooKeeper)
- Scaling Limitations: ZooKeeper becomes a bottleneck at scale
- Metadata Propagation: Inefficient broadcast model
- Resource Overhead: Additional infrastructure requirements
KRaft Advantages
Simplified Architecture: KRaft eliminates external dependencies, making Kafka truly self-contained and easier to operate.
- Simpler Deployment: Single system to manage and monitor
- Improved Scalability: Better handling of large clusters
- Faster Metadata Operations: More efficient consensus mechanism
- Right-sized Clusters: No need for separate ZooKeeper ensemble
- Faster Recovery: Quicker failover and startup times
How KRaft Works
The Raft Consensus Algorithm
KRaft implements the Raft consensus algorithm, a well-understood distributed consensus protocol that ensures:
- Leader Election: Automatic selection of a single leader
- Log Replication: Consistent state across all nodes
- Safety: Strong consistency guarantees
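To make the leader-election idea concrete, here is a toy Python model of a Raft-style majority vote. This is illustrative only - the real KRaft election involves terms, epochs, and network RPCs between controllers - but it captures the core safety rule: a candidate wins only if a strict majority of the quorum grants its vote, and voters refuse candidates whose log is behind their own.

```python
# Toy model of Raft-style leader election among controller nodes.
# Illustrative only -- real KRaft elections also track terms/epochs
# and exchange RPCs; here we model a single round of majority voting.

from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    last_log_offset: int  # how up to date this node's metadata log is

def elect_leader(candidate: Node, voters: list[Node]) -> bool:
    """Candidate wins only with votes from a strict majority of the quorum.

    A voter grants its vote only if the candidate's log is at least as
    up to date as its own (a simplified version of Raft's safety check).
    """
    votes = sum(
        1 for voter in voters
        if candidate.last_log_offset >= voter.last_log_offset
    )
    return votes > len(voters) // 2

quorum = [Node(1, 100), Node(2, 100), Node(3, 95)]
print(elect_leader(quorum[0], quorum))  # True: node 1's log is current
print(elect_leader(quorum[2], quorum))  # False: node 3's log lags behind
```

The "log at least as up to date" check is what guarantees safety: a node that missed committed metadata records can never become the active controller.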
KRaft Architecture Components
# KRaft cluster roles
Controller Nodes: Manage cluster metadata and consensus
Broker Nodes: Handle client requests and data storage
Combined Nodes: Act as both controller and broker (for smaller deployments)
Metadata Management
KRaft stores all cluster metadata in a special internal topic:
# The metadata topic
__cluster_metadata
# What it contains:
- Cluster membership information
- Controller election state
- Topic configurations (partitions, replicas)
- Access Control Lists (ACLs)
- Quota configurations
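Because all of this metadata lives in an ordered log, any node can rebuild its view of the cluster simply by replaying the records from the beginning. The sketch below models that replay idea in Python; the record shapes are hypothetical simplifications, not the actual `__cluster_metadata` record types.

```python
# Sketch of rebuilding cluster state by replaying a metadata log.
# The record shapes here are hypothetical simplifications of the real
# __cluster_metadata record types (broker registrations, topic records).

def replay(records):
    """Fold an ordered list of metadata records into a state snapshot."""
    state = {"topics": {}, "brokers": set()}
    for rec in records:
        if rec["type"] == "REGISTER_BROKER":
            state["brokers"].add(rec["broker_id"])
        elif rec["type"] == "CREATE_TOPIC":
            state["topics"][rec["name"]] = {"partitions": rec["partitions"]}
        elif rec["type"] == "REMOVE_TOPIC":
            state["topics"].pop(rec["name"], None)
    return state

log = [
    {"type": "REGISTER_BROKER", "broker_id": 101},
    {"type": "CREATE_TOPIC", "name": "orders", "partitions": 3},
    {"type": "CREATE_TOPIC", "name": "tmp", "partitions": 1},
    {"type": "REMOVE_TOPIC", "name": "tmp"},
]
print(replay(log))
# {'topics': {'orders': {'partitions': 3}}, 'brokers': {101}}
```

Replaying the same log always yields the same state - which is exactly why a log-based design gives every controller and broker a consistent view of the cluster.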
KRaft Architecture Deep Dive
Controller Quorum
# Example KRaft controller configuration
process.roles=controller
node.id=1
controller.quorum.voters=1@localhost:9093,2@localhost:9094,3@localhost:9095
listeners=CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
log.dirs=/var/kafka-logs
Metadata Synchronization
Instead of ZooKeeper's broadcast model, KRaft uses a pull-based approach:
- Active Controller: Leader of the metadata partition
- Follower Controllers: Replica followers of metadata
- Brokers: Replica observers that fetch metadata changes
KRaft Metadata Synchronization Flow:
Active Controller       Follower Controller        Broker
        │                        │                    │
        │ 1. Write metadata      │                    │
        │    change              │                    │
        │◄───────────────────────┤                    │
        │   2. Fetch latest      │                    │
        │      metadata          │                    │
        │───────────────────────►│                    │
        │   3. Metadata updates  │                    │
        │                        │                    │
        │◄────────────────────────────────────────────┤
        │              4. Fetch latest metadata       │
        │────────────────────────────────────────────►│
        │              5. Metadata updates            │
Benefits of Pull-Based Model
- Faster Restarts: Brokers resume from their locally cached metadata and fetch only the records they missed
- Better Synchronization: All nodes converge on the same metadata log automatically
- Reduced Network Traffic: Each node pulls only the metadata it is missing
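The pull-based flow above can be simulated in a few lines of Python. In this sketch (class names are my own, not Kafka API types), each observer - a follower controller or a broker - tracks its own offset into the metadata log and fetches only the records beyond it, which is why catch-up is incremental rather than a full broadcast.

```python
# Simulation of the pull-based metadata flow: observers fetch from the
# active controller starting at their own last offset, so each node
# pulls only the records it is missing. Class names are illustrative.

class ActiveController:
    def __init__(self):
        self.log = []  # committed metadata records, in order

    def append(self, record):
        self.log.append(record)

    def fetch(self, from_offset):
        """Return all records at or after from_offset (a fetch request)."""
        return self.log[from_offset:]

class Observer:  # stands in for a follower controller or a broker
    def __init__(self):
        self.cache = []  # local replica of the metadata log

    def poll(self, controller):
        new = controller.fetch(len(self.cache))
        self.cache.extend(new)
        return len(new)  # how many records this poll caught up on

ctrl = ActiveController()
broker = Observer()
ctrl.append("topic 'orders' created")
ctrl.append("partition reassigned")
print(broker.poll(ctrl))  # 2 -- broker pulls both missing records
print(broker.poll(ctrl))  # 0 -- already caught up, nothing transferred
```

Note how the second poll transfers nothing: an up-to-date node costs almost no traffic, unlike a broadcast model that pushes to everyone on every change.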
Setting Up KRaft
Controller Configuration
# kraft-controller.properties
process.roles=controller
node.id=1
controller.quorum.voters=1@controller1:9093,2@controller2:9093,3@controller3:9093
listeners=CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
log.dirs=/var/kafka-controller-logs
metadata.log.dir=/var/kafka-controller-logs
Broker Configuration
# kraft-broker.properties
process.roles=broker
node.id=101
controller.quorum.voters=1@controller1:9093,2@controller2:9093,3@controller3:9093
listeners=PLAINTEXT://localhost:9092
log.dirs=/var/kafka-broker-logs
Combined Node Configuration
# kraft-combined.properties (for smaller deployments)
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093,2@localhost:9094,3@localhost:9095
listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
log.dirs=/var/kafka-logs
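All three configurations above share the same `controller.quorum.voters` string, which encodes the quorum membership as `id@host:port` entries. As a quick illustration of that format (this parser is my own sketch, not a Kafka utility):

```python
# Parses a controller.quorum.voters value of the form
# "id@host:port,id@host:port,..." into structured entries.
# A hypothetical helper to illustrate the format, not a Kafka API.

def parse_quorum_voters(value: str):
    voters = []
    for entry in value.split(","):
        node_id, _, endpoint = entry.partition("@")
        host, _, port = endpoint.rpartition(":")
        voters.append({"id": int(node_id), "host": host, "port": int(port)})
    return voters

voters = parse_quorum_voters(
    "1@localhost:9093,2@localhost:9094,3@localhost:9095"
)
print([v["port"] for v in voters])  # [9093, 9094, 9095]
```

Every controller and broker must see an identical voters list - the node IDs here must match each controller's `node.id`, and a mismatch is a common cause of quorum-formation failures.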
Migration from ZooKeeper to KRaft
Migration Process Overview
Production Migration: Migrating from ZooKeeper to KRaft in production requires careful planning and testing. Always test in a staging environment first.
# Step 1: Generate a cluster ID and format controller storage
kafka-storage.sh random-uuid
kafka-storage.sh format -t <cluster-id> -c kraft-controller.properties
# Step 2: Start KRaft controllers
kafka-server-start.sh kraft-controller.properties
# Step 3: Inspect the migrated metadata
# (kafka-metadata-shell is a read-only inspection tool; the full
# ZooKeeper-to-KRaft migration follows the rolling process in KIP-866)
kafka-metadata-shell.sh --snapshot /path/to/metadata/snapshot
# Step 4: Start KRaft brokers
kafka-server-start.sh kraft-broker.properties
Migration Considerations
- Cluster ID: Generate and maintain consistent cluster ID
- Metadata Export: Export existing ZooKeeper metadata
- Rolling Migration: Gradual transition of brokers
- Validation: Verify metadata consistency post-migration
Monitoring KRaft Clusters
KRaft-Specific Metrics
# Controller metrics
kafka.controller:type=KafkaController,name=ActiveControllerCount
kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs
# Metadata metrics
kafka.server:type=metadata-log,name=NumRecordsInLog
kafka.server:type=metadata-log,name=CommittedOffset
Health Checks
# Inspect controller metadata (point --snapshot at a metadata log segment)
kafka-metadata-shell.sh --snapshot /var/kafka-logs/__cluster_metadata-0/00000000000000000000.log
# Verify quorum health (available since Kafka 3.3)
kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status
Performance Improvements
Startup Time Comparison
# Traditional Kafka with ZooKeeper
Startup Time: 30-60 seconds (depending on metadata size)
# KRaft Mode
Startup Time: 5-15 seconds (faster metadata loading)
Scalability Improvements
- Larger Clusters: Designed to scale to millions of partitions, well beyond ZooKeeper-era practical limits
- Faster Metadata Operations: Reduced latency for topic operations
- Better Resource Utilization: No ZooKeeper overhead
Production Readiness
KRaft Maturity Timeline
Kafka 2.8.0 (April 2021): Early Access
Kafka 3.0.0 (September 2021): Preview, not yet production ready
Kafka 3.3.0 (October 2022): Marked Production Ready for new clusters (KIP-833)
Kafka 3.5.0 (June 2023): ZooKeeper mode deprecated
Current Limitations (as of Kafka 3.6)
Migration Path: While KRaft is production-ready, some advanced features are still being migrated from the ZooKeeper implementation.
- JBOD (Just a Bunch of Disks): Limited support
- Delegation Tokens: Not yet supported
- Some Admin Operations: Still being ported
Best Practices for KRaft
Controller Deployment
# Recommended controller setup
- Use dedicated controller nodes for large clusters
- Deploy controllers across different availability zones
- Use odd number of controllers (3 or 5)
- Ensure fast, reliable network between controllers
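The "odd number of controllers" recommendation falls straight out of the Raft majority rule: a quorum of n voters stays available only while a majority survives, so it tolerates floor((n - 1) / 2) failures. A quick check:

```python
# Why odd quorum sizes: a Raft quorum of n voters needs a majority to
# make progress, so it tolerates floor((n - 1) / 2) failures. Adding a
# fourth node to a 3-node quorum adds cost but no extra fault tolerance.

def tolerated_failures(n: int) -> int:
    return (n - 1) // 2

for n in range(1, 7):
    print(f"{n} controllers -> tolerates {tolerated_failures(n)} failure(s)")
```

Running this shows that 3 and 4 controllers both tolerate one failure, while 5 tolerates two - which is why 3 or 5 are the sensible choices.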
Resource Planning
# Controller resource requirements
CPU: 2-4 cores per controller
Memory: 4-8 GB heap size
Storage: Fast SSD for metadata logs
Network: Low-latency, high-bandwidth
Security Configuration
# KRaft security settings
controller.listener.names=CONTROLLER
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
ssl.keystore.location=/path/to/controller.keystore.jks
The Future of Kafka
Roadmap Highlights
- Complete ZooKeeper Removal: Full feature parity achieved
- Enhanced Scalability: Support for even larger clusters
- Improved Operations: Better tooling and monitoring
- Cloud-Native Features: Better Kubernetes integration
Industry Impact
KRaft positions Kafka as:
- Simpler to Operate: Reduced operational complexity
- More Scalable: Better performance at scale
- Cloud-Ready: Easier containerization and orchestration
- Future-Proof: Modern architecture for next-generation workloads
Series Conclusion
Throughout this 5-part series, we've explored:
- Kafka Fundamentals: Event streaming concepts and motivation
- Core Building Blocks: Topics, partitions, producers, and consumers
- Development Tools: APIs and frameworks for building applications
- Administration: Monitoring, security, and operational excellence
- KRaft Architecture: The future of Kafka without ZooKeeper
Key Takeaways
- KRaft Simplifies Operations: Eliminates ZooKeeper dependency
- Better Performance: Faster startup and metadata operations
- Production Ready: Suitable for enterprise deployments
- Future Direction: Represents Kafka's architectural evolution
- Migration Path: Gradual transition from ZooKeeper is possible
The Future is KRaft: ZooKeeper mode is deprecated and slated for removal in Kafka 4.0, so adopting KRaft positions your Kafka infrastructure for long-term success.
Apache Kafka with KRaft represents the culmination of years of architectural evolution, delivering a more robust, scalable, and operationally simple event streaming platform. Whether you're starting fresh or planning a migration, KRaft is the foundation for your event-driven future.
Part 5 of 5