Apache Kafka Part 5: KRaft - The Future of Kafka Architecture
Welcome to the final part of our Apache Kafka series! Today we'll explore KRaft (Kafka Raft) - the game-changing consensus protocol that represents the future of Kafka architecture.
This is Part 5 of our comprehensive Kafka series. Make sure you've read the previous parts: Part 1 (Introduction), Part 2 (Building Blocks), Part 3 (Development Tools), and Part 4 (Administration).
What is KRaft?
KRaft (Kafka Raft Consensus Protocol) is Kafka's native consensus mechanism that replaces Apache ZooKeeper for cluster coordination. Introduced in Kafka 2.8.0, KRaft represents a fundamental architectural shift toward a more streamlined, self-contained system.
The Evolution: From ZooKeeper to KRaft
Traditional Kafka Architecture (with ZooKeeper):
┌─────────────────────────────────────────────────┐
│                ZooKeeper Cluster                │
│        (Manages Metadata & Coordination)        │
└────────────────────────┬────────────────────────┘
                         │
            ┌────────────┼────────────┐
            │            │            │
            ▼            ▼            ▼
      ┌──────────┐ ┌──────────┐ ┌──────────┐
      │ Broker 1 │ │ Broker 2 │ │ Broker 3 │
      └──────────┘ └──────────┘ └──────────┘
Modern Kafka Architecture (with KRaft):
┌─────────────────────────────────────────────────┐
│             KRaft Controller Quorum             │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐   │
│  │  Ctrl 1  │◄───┤  Ctrl 2  ├───►│  Ctrl 3  │   │
│  │ (Leader) │    │          │    │          │   │
│  └──────────┘    └──────────┘    └──────────┘   │
└────────────────────────┬────────────────────────┘
                         │ (Metadata Distribution)
            ┌────────────┼────────────┐
            │            │            │
            ▼            ▼            ▼
      ┌──────────┐ ┌──────────┐ ┌──────────┐
      │ Broker 1 │ │ Broker 2 │ │ Broker 3 │
      └──────────┘ └──────────┘ └──────────┘
Why Replace ZooKeeper?
Challenges with ZooKeeper
- Operational Complexity: Managing two separate systems (Kafka + ZooKeeper)
- Scaling Limitations: ZooKeeper becomes a bottleneck at scale
- Metadata Propagation: Inefficient broadcast model
- Resource Overhead: Additional infrastructure requirements
KRaft Advantages
Simplified Architecture: KRaft eliminates external dependencies, making Kafka truly self-contained and easier to operate.
- Simpler Deployment: Single system to manage and monitor
- Improved Scalability: Better handling of large clusters
- Faster Metadata Operations: More efficient consensus mechanism
- Right-sized Clusters: No need for separate ZooKeeper ensemble
- Faster Recovery: Quicker failover and startup times
How KRaft Works
The Raft Consensus Algorithm
KRaft implements the Raft consensus algorithm, a well-understood distributed consensus protocol that ensures:
- Leader Election: Automatic selection of a single leader
- Log Replication: Consistent state across all nodes
- Safety: Strong consistency guarantees
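To make the leader-election idea concrete, here is a toy Python model of a Raft-style majority vote. This is illustrative only - the real KRaft election involves terms, epochs, and network RPCs between controllers - but it captures the core safety rule: a candidate wins only if a strict majority of the quorum grants its vote, and voters refuse candidates whose log is behind their own.

```python
# Toy model of Raft-style leader election among controller nodes.
# Illustrative only -- real KRaft elections also track terms/epochs
# and exchange RPCs; here we model a single round of majority voting.

from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    last_log_offset: int  # how up to date this node's metadata log is

def elect_leader(candidate: Node, voters: list[Node]) -> bool:
    """Candidate wins only with votes from a strict majority of the quorum.

    A voter grants its vote only if the candidate's log is at least as
    up to date as its own (a simplified version of Raft's safety check).
    """
    votes = sum(
        1 for voter in voters
        if candidate.last_log_offset >= voter.last_log_offset
    )
    return votes > len(voters) // 2

quorum = [Node(1, 100), Node(2, 100), Node(3, 95)]
print(elect_leader(quorum[0], quorum))  # True: node 1's log is current
print(elect_leader(quorum[2], quorum))  # False: node 3's log lags behind
```

The "log at least as up to date" check is what guarantees safety: a node that missed committed metadata records can never become the active controller.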
KRaft Architecture Components
# KRaft cluster roles
Controller Nodes: Manage cluster metadata and consensus
Broker Nodes: Handle client requests and data storage
Combined Nodes: Act as both controller and broker (for smaller deployments)
Metadata Management
KRaft stores all cluster metadata in a special internal topic:
# The metadata topic
__cluster_metadata
# What it contains:
- Cluster membership information
- Controller election state
- Topic configurations (partitions, replicas)
- Access Control Lists (ACLs)
- Quota configurations
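Because all of this metadata lives in an ordered log, any node can rebuild its view of the cluster simply by replaying the records from the beginning. The sketch below models that replay idea in Python; the record shapes are hypothetical simplifications, not the actual `__cluster_metadata` record types.

```python
# Sketch of rebuilding cluster state by replaying a metadata log.
# The record shapes here are hypothetical simplifications of the real
# __cluster_metadata record types (broker registrations, topic records).

def replay(records):
    """Fold an ordered list of metadata records into a state snapshot."""
    state = {"topics": {}, "brokers": set()}
    for rec in records:
        if rec["type"] == "REGISTER_BROKER":
            state["brokers"].add(rec["broker_id"])
        elif rec["type"] == "CREATE_TOPIC":
            state["topics"][rec["name"]] = {"partitions": rec["partitions"]}
        elif rec["type"] == "REMOVE_TOPIC":
            state["topics"].pop(rec["name"], None)
    return state

log = [
    {"type": "REGISTER_BROKER", "broker_id": 101},
    {"type": "CREATE_TOPIC", "name": "orders", "partitions": 3},
    {"type": "CREATE_TOPIC", "name": "tmp", "partitions": 1},
    {"type": "REMOVE_TOPIC", "name": "tmp"},
]
print(replay(log))
# {'topics': {'orders': {'partitions': 3}}, 'brokers': {101}}
```

Replaying the same log always yields the same state - which is exactly why a log-based design gives every controller and broker a consistent view of the cluster.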
KRaft Architecture Deep Dive
Controller Quorum
# Example KRaft controller configuration
process.roles=controller
node.id=1
controller.quorum.voters=1@localhost:9093,2@localhost:9094,3@localhost:9095
listeners=CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
log.dirs=/var/kafka-logs
Metadata Synchronization
Instead of ZooKeeper's broadcast model, KRaft uses a pull-based approach:
- Active Controller: Leader of the metadata partition
- Follower Controllers: Replica followers of metadata
- Brokers: Replica observers that fetch metadata changes
KRaft Metadata Synchronization Flow:
Active Controller       Follower Controller        Broker
        │                        │                    │
        │ 1. Write metadata      │                    │
        │    change              │                    │
        │◄───────────────────────┤                    │
        │   2. Fetch latest      │                    │
        │      metadata          │                    │
        │───────────────────────►│                    │
        │   3. Metadata updates  │                    │
        │                        │                    │
        │◄────────────────────────────────────────────┤
        │              4. Fetch latest metadata       │
        │────────────────────────────────────────────►│
        │              5. Metadata updates            │
Benefits of Pull-Based Model
- Faster Restarts: Brokers resume from their locally cached metadata and fetch only the records they missed
- Better Synchronization: All nodes converge on the same metadata log automatically
- Reduced Network Traffic: Each node pulls only the metadata it is missing
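The pull-based flow above can be simulated in a few lines of Python. In this sketch (class names are my own, not Kafka API types), each observer - a follower controller or a broker - tracks its own offset into the metadata log and fetches only the records beyond it, which is why catch-up is incremental rather than a full broadcast.

```python
# Simulation of the pull-based metadata flow: observers fetch from the
# active controller starting at their own last offset, so each node
# pulls only the records it is missing. Class names are illustrative.

class ActiveController:
    def __init__(self):
        self.log = []  # committed metadata records, in order

    def append(self, record):
        self.log.append(record)

    def fetch(self, from_offset):
        """Return all records at or after from_offset (a fetch request)."""
        return self.log[from_offset:]

class Observer:  # stands in for a follower controller or a broker
    def __init__(self):
        self.cache = []  # local replica of the metadata log

    def poll(self, controller):
        new = controller.fetch(len(self.cache))
        self.cache.extend(new)
        return len(new)  # how many records this poll caught up on

ctrl = ActiveController()
broker = Observer()
ctrl.append("topic 'orders' created")
ctrl.append("partition reassigned")
print(broker.poll(ctrl))  # 2 -- broker pulls both missing records
print(broker.poll(ctrl))  # 0 -- already caught up, nothing transferred
```

Note how the second poll transfers nothing: an up-to-date node costs almost no traffic, unlike a broadcast model that pushes to everyone on every change.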
Setting Up KRaft
Controller Configuration
# kraft-controller.properties
process.roles=controller
node.id=1
controller.quorum.voters=1@controller1:9093,2@controller2:9093,3@controller3:9093
listeners=CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
log.dirs=/var/kafka-controller-logs
metadata.log.dir=/var/kafka-controller-logs
Broker Configuration
# kraft-broker.properties
process.roles=broker
node.id=101
controller.quorum.voters=1@controller1:9093,2@controller2:9093,3@controller3:9093
listeners=PLAINTEXT://localhost:9092
log.dirs=/var/kafka-broker-logs
Combined Node Configuration
# kraft-combined.properties (for smaller deployments)
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093,2@localhost:9094,3@localhost:9095
listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
log.dirs=/var/kafka-logs
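All three configurations above share the same `controller.quorum.voters` string, which encodes the quorum membership as `id@host:port` entries. As a quick illustration of that format (this parser is my own sketch, not a Kafka utility):

```python
# Parses a controller.quorum.voters value of the form
# "id@host:port,id@host:port,..." into structured entries.
# A hypothetical helper to illustrate the format, not a Kafka API.

def parse_quorum_voters(value: str):
    voters = []
    for entry in value.split(","):
        node_id, _, endpoint = entry.partition("@")
        host, _, port = endpoint.rpartition(":")
        voters.append({"id": int(node_id), "host": host, "port": int(port)})
    return voters

voters = parse_quorum_voters(
    "1@localhost:9093,2@localhost:9094,3@localhost:9095"
)
print([v["port"] for v in voters])  # [9093, 9094, 9095]
```

Every controller and broker must see an identical voters list - the node IDs here must match each controller's `node.id`, and a mismatch is a common cause of quorum-formation failures.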
Migration from ZooKeeper to KRaft
Migration Process Overview
Production Migration: Migrating from ZooKeeper to KRaft in production requires careful planning and testing. Always test in a staging environment first.
# Step 1: Generate a cluster ID and format controller storage
kafka-storage.sh random-uuid
kafka-storage.sh format -t <cluster-id> -c kraft-controller.properties
# Step 2: Start KRaft controllers
kafka-server-start.sh kraft-controller.properties
# Step 3: Inspect the migrated metadata
# (kafka-metadata-shell is a read-only inspection tool; the full
# ZooKeeper-to-KRaft migration follows the rolling process in KIP-866)
kafka-metadata-shell.sh --snapshot /path/to/metadata/snapshot
# Step 4: Start KRaft brokers
kafka-server-start.sh kraft-broker.properties
Migration Considerations
- Cluster ID: Generate and maintain consistent cluster ID
- Metadata Export: Export existing ZooKeeper metadata
- Rolling Migration: Gradual transition of brokers
- Validation: Verify metadata consistency post-migration
Monitoring KRaft Clusters
KRaft-Specific Metrics
# Controller metrics
kafka.controller:type=KafkaController,name=ActiveControllerCount
kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs
# Metadata metrics
kafka.server:type=metadata-log,name=NumRecordsInLog
kafka.server:type=metadata-log,name=CommittedOffset
Health Checks
# Inspect controller metadata (point --snapshot at a metadata log segment)
kafka-metadata-shell.sh --snapshot /var/kafka-logs/__cluster_metadata-0/00000000000000000000.log
# Verify quorum health (available since Kafka 3.3)
kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status
Performance Improvements
Startup Time Comparison
# Traditional Kafka with ZooKeeper
Startup Time: 30-60 seconds (depending on metadata size)
# KRaft Mode
Startup Time: 5-15 seconds (faster metadata loading)
Scalability Improvements
- Larger Clusters: Designed to scale to millions of partitions, well beyond ZooKeeper-era practical limits
- Faster Metadata Operations: Reduced latency for topic operations
- Better Resource Utilization: No ZooKeeper overhead
Production Readiness
KRaft Maturity Timeline
Kafka 2.8.0 (April 2021): Early Access
Kafka 3.0.0 (September 2021): Preview, not yet production ready
Kafka 3.3.0 (October 2022): Marked Production Ready for new clusters (KIP-833)
Kafka 3.5.0 (June 2023): ZooKeeper mode deprecated
Current Limitations (as of Kafka 3.6)
Migration Path: While KRaft is production-ready, some advanced features are still being migrated from the ZooKeeper implementation.
- JBOD (Just a Bunch of Disks): Limited support
- Delegation Tokens: Not yet supported
- Some Admin Operations: Still being ported
Best Practices for KRaft
Controller Deployment
# Recommended controller setup
- Use dedicated controller nodes for large clusters
- Deploy controllers across different availability zones
- Use odd number of controllers (3 or 5)
- Ensure fast, reliable network between controllers
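The "odd number of controllers" recommendation falls straight out of the Raft majority rule: a quorum of n voters stays available only while a majority survives, so it tolerates floor((n - 1) / 2) failures. A quick check:

```python
# Why odd quorum sizes: a Raft quorum of n voters needs a majority to
# make progress, so it tolerates floor((n - 1) / 2) failures. Adding a
# fourth node to a 3-node quorum adds cost but no extra fault tolerance.

def tolerated_failures(n: int) -> int:
    return (n - 1) // 2

for n in range(1, 7):
    print(f"{n} controllers -> tolerates {tolerated_failures(n)} failure(s)")
```

Running this shows that 3 and 4 controllers both tolerate one failure, while 5 tolerates two - which is why 3 or 5 are the sensible choices.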
Resource Planning
# Controller resource requirements
CPU: 2-4 cores per controller
Memory: 4-8 GB heap size
Storage: Fast SSD for metadata logs
Network: Low-latency, high-bandwidth
Security Configuration
# KRaft security settings
controller.listener.names=CONTROLLER
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
ssl.keystore.location=/path/to/controller.keystore.jks
The Future of Kafka
Roadmap Highlights
- Complete ZooKeeper Removal: Full feature parity achieved
- Enhanced Scalability: Support for even larger clusters
- Improved Operations: Better tooling and monitoring
- Cloud-Native Features: Better Kubernetes integration
Industry Impact
KRaft positions Kafka as:
- Simpler to Operate: Reduced operational complexity
- More Scalable: Better performance at scale
- Cloud-Ready: Easier containerization and orchestration
- Future-Proof: Modern architecture for next-generation workloads
Series Conclusion
Throughout this 5-part series, we've explored:
- Kafka Fundamentals: Event streaming concepts and motivation
- Core Building Blocks: Topics, partitions, producers, and consumers
- Development Tools: APIs and frameworks for building applications
- Administration: Monitoring, security, and operational excellence
- KRaft Architecture: The future of Kafka without ZooKeeper
Key Takeaways
- KRaft Simplifies Operations: Eliminates ZooKeeper dependency
- Better Performance: Faster startup and metadata operations
- Production Ready: Suitable for enterprise deployments
- Future Direction: Represents Kafka's architectural evolution
- Migration Path: Gradual transition from ZooKeeper is possible
The Future is KRaft: ZooKeeper mode is deprecated and slated for removal in Kafka 4.0, so adopting KRaft positions your Kafka infrastructure for long-term success.
Apache Kafka with KRaft represents the culmination of years of architectural evolution, delivering a more robust, scalable, and operationally simple event streaming platform. Whether you're starting fresh or planning a migration, KRaft is the foundation for your event-driven future.
Part 5 of 5