“Our service crashed when traffic suddenly spiked…” “An order came in but it wasn’t reflected in the inventory system…” “There are too many logs to monitor in real-time…” Sound familiar? Apache Kafka is the tool that solves these exact problems. Let’s explore it step by step.

1. What is Kafka? – Think of it as an Express Postal Service
Simply put, Kafka is like an ultra-fast postal service.
Regular postal service works like this
Sender → Post Office → Receiver
One person sends a letter, one person receives it. Once the receiver reads it, it’s done.
The Kafka postal service is different
Multiple Senders → Kafka → Multiple Receivers
                 (Storage)  (Each gets a copy)
What makes it special:
- Can handle millions of letters per day (incredible speed)
- Keeps letters for a set period so you can read them later
- Multiple people can read the same letter independently
- Creates backup copies in multiple locations to prevent loss
This is exactly what Kafka does!
Understanding with a real example
The moment you click “Order” on Amazon:
① Order System: "New order received!" → Sends message to Kafka
② Kafka simultaneously delivers this message to multiple systems:
- Inventory System: "Need to deduct 1 item"
- Payment System: "Process card payment"
- Shipping System: "Prepare for shipment"
- Notification System: "Send confirmation email"
- Analytics System: "Add to today's sales"
Without Kafka:
- Order system must connect to all 5 systems individually
- If payment system is down? Order system stops too
- Adding a recommendation system later? Must modify order system code
With Kafka:
- Order system just sends to Kafka
- Each system operates independently
- Adding new systems? Just read from Kafka
2. Why Was Kafka Created? – LinkedIn’s Challenge
In 2011, LinkedIn had a major problem.
Tangled Systems
[System Architecture]
User Registration ─┬→ Email System
                   ├→ Recommendation System
                   ├→ Search System
                   └→ Analytics System
Email System ──────┬→ Notification System
                   └→ Log System
Recommendation ────┬→ Analytics System
                   └→ Search System
Problems:
- 20 systems = hundreds of connections
- One system fails → must check all connected systems
- Adding new systems → connect to all existing systems
- Processing 1.4 billion messages daily → systems struggling
Simplification with Kafka
[After Kafka]
User Registration ─┐
Email System ──────┤
Recommendation ────┤→ Kafka ─┬→ Notification System
Search System ─────┤         ├→ Log System
Analytics System ──┘         ├→ Search System
                             └→ Recommendation System
Results:
- All systems only communicate with Kafka
- One system’s failure doesn’t affect others
- Adding new systems is simple
- Handles millions of messages per second easily
3. Core Concepts – Real-Life Analogies
Producer = Letter Sender
Applications that create data and send it to Kafka.
Real examples:
- Web server: “User viewed product page”
- Mobile app: “User opened app”
- IoT sensor: “Current temperature is 77°F”
- Order system: “New order received”
Consumer = Letter Receiver
Applications that read and process data from Kafka.
Real examples:
- Recommendation system: Reads click data → Updates recommendation algorithm
- Notification system: Reads order data → Sends customer SMS
- Monitoring system: Reads error logs → Alerts on-call engineer
Topic = Mailbox Label
Categories that organize messages by type. Like bookshelf labels in a library.
[Kafka Topics]
📬 user.signup (User registration events)
📬 order.created (Order creation events)
📬 payment.success (Payment completion events)
📬 sensor.temp (Temperature sensor data)
Each system subscribes only to the topics it needs.
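For example, with the kafka-python library (the same client used in section 8 below), subscribing to just two of these topics looks roughly like this sketch:
from kafka import KafkaConsumer

# This system only cares about orders and payments,
# so it subscribes to exactly those two topics
consumer = KafkaConsumer(
    'order.created', 'payment.success',
    bootstrap_servers=['localhost:9092']
)
for message in consumer:
    print(message.topic, message.value)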
Partition = Multiple Service Windows
Topics split into multiple pieces. Think of bank teller windows.
[Bank Analogy]
1 Window:
100 customers → 1 window → Takes 100 minutes
5 Windows:
100 customers → 5 windows → Takes 20 minutes
Kafka works the same way:
[Order Topic - 1 Partition]
Orders 1, 2, 3, 4, 5... → Partition 0 → Slow processing
[Order Topic - 3 Partitions]
Orders 1, 4, 7... → Partition 0 ↘
Orders 2, 5, 8... → Partition 1 → Parallel → 3x faster!
Orders 3, 6, 9... → Partition 2 ↗
How are messages distributed?
- With key: Same key always goes to same partition
All user123 activities → Always partition 0
All user456 activities → Always partition 1
Use this when order matters.
- Without key: Evenly distributed
Message 1 → Partition 0
Message 2 → Partition 1
Message 3 → Partition 2
Message 4 → Partition 0
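Here is a minimal kafka-python sketch of both cases (the topic name user.activity is just an example):
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=['localhost:9092'])

# With a key: every user123 event hashes to the same partition,
# so that user's events stay in order
producer.send('user.activity', key=b'user123', value=b'clicked product page')

# Without a key: messages are spread evenly across partitions
producer.send('user.activity', value=b'anonymous page view')

producer.flush()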
Offset = Page Number
A number indicating each message’s position.
[Partition 0 Messages]
Offset:    0      1      2      3       4       5
Message: [Ord1] [Ord4] [Ord7] [Ord10] [Ord13] [Ord16]
                               ↑
                Read up to here (offset 3)
Why is this important?
Like bookmarking your place:
- “Where did I stop reading?” → Check offset
- System restarts → Continue from last offset
- Made a mistake? → Rewind offset and reread
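With kafka-python, rewinding looks roughly like this (the partition and offset values are illustrative):
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers=['localhost:9092'])

# Take partition 0 of the orders topic manually
tp = TopicPartition('orders', 0)
consumer.assign([tp])

# Made a mistake? Rewind to offset 3 and reread from there
consumer.seek(tp, 3)
for message in consumer:
    print(message.offset, message.value)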
Broker = Post Office Branch
The actual Kafka server that stores and manages messages.
[Single Broker - Risky]
1 broker → Complete failure if it crashes ❌
[Cluster - Safe]
Broker 1 (stores messages A, B)
Broker 2 (copy of A, B)
Broker 3 (copy of A, B)
→ Service continues even if 1-2 fail ✓
Consumer Group = Collaborative Team
Multiple consumers forming a team to divide and process work.
[Order Processing Team - 4 Members]
Order Topic (4 partitions)
├─ Partition 0 → Employee A processes
├─ Partition 1 → Employee B processes
├─ Partition 2 → Employee C processes
└─ Partition 3 → Employee D processes
Result: 4x faster processing!
Important rules:
- One partition can only be processed by one consumer in a group
- 5 employees but 4 partitions? → 1 employee waits
- 2 employees but 4 partitions? → Each handles 2
Different groups are independent:
[Each team uses order data independently]
Order Topic
├─ Shipping Team: Prepares shipping addresses
├─ Analytics Team: Calculates sales
└─ Notification Team: Sends customer messages
(Each team processes independently)
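In kafka-python, the "team" is just the group_id setting (group names here are illustrative). Run the sketch below twice and the two processes split the partitions between them; run it with a different group_id and that copy independently receives every message:
from kafka import KafkaConsumer

# Both copies of this script join 'shipping-team',
# so Kafka divides the partitions between them
consumer = KafkaConsumer(
    'orders',
    bootstrap_servers=['localhost:9092'],
    group_id='shipping-team'  # use 'analytics-team' for an independent copy
)
for message in consumer:
    print('shipping-team got:', message.value)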
Replication = Backup
Data copied across multiple brokers for safe storage.
[Replication Factor = 3]
Partition 0 Original (Leader) → Broker 1
Partition 0 Replica (Follower) → Broker 2
Partition 0 Replica (Follower) → Broker 3
If Broker 1 fails?
① Broker 1 down 🔴
② Broker 2 immediately promoted to Leader ✓
③ Service continues without interruption
④ When Broker 1 recovers, it rejoins as a Follower and catches up
4. Why Kafka is Fast – Key Secrets
Secret 1: Sequential Disk Usage
Random writes (slow):
Jumping around on disk
Seek → Write → Seek → Write
Takes a long time
Sequential writes (fast):
Writing continuously on disk
Write → Write → Write → Write
Even HDDs perform like SSDs!
Kafka only appends messages to the end of each partition's log file. That's why it's fast.
Secret 2: Batch Processing
Individual sends (inefficient):
1 message → Send (network round trip)
1 message → Send (network round trip)
1 message → Send (network round trip)
Total: 3 network round trips
Batch sends (efficient):
Collect 100 messages → Send once
1 network round trip processes 100!
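In kafka-python, batching is controlled by two producer settings; the values below are illustrative, not recommendations:
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    batch_size=16384,  # collect up to 16 KB per partition before sending
    linger_ms=10       # wait up to 10 ms for more messages to join the batch
)
A small linger_ms trades a few milliseconds of latency for far fewer network round trips.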
Secret 3: Zero-Copy
Traditional approach:
Disk → Kernel memory → App memory → Socket buffer → Network
(4 copies)
Kafka’s Zero-Copy:
Disk → OS page cache → Network
(Minimal copying)
Using the operating system's sendfile mechanism, data moves straight from the page cache to the network card, so far less CPU and memory are spent on the transfer.
Secret 4: Partition Parallelism
1 partition = 1 person working → Slow
10 partitions = 10 people working simultaneously → Up to 10x faster
5. Kafka 4.0 Innovations – Easier and More Powerful
No More ZooKeeper! (KRaft Mode)
Past (complicated era):
① Install ZooKeeper cluster (3 servers)
② Install Kafka cluster (3 servers)
③ Configure ZooKeeper-Kafka connection
④ Manage and monitor both
Now (simplified era):
① Just install Kafka!
② Much simpler configuration
③ Half the systems to manage
KRaft advantages:
[Comparison]
ZooKeeper era:
- Max partitions: 100,000
- Cluster start time: Minutes
- Management complexity: High
KRaft era:
- Max partitions: Millions
- Cluster start time: Seconds
- Management complexity: Low
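For reference, the key KRaft settings in a single-node config/server.properties look roughly like this — illustrative values using a static controller quorum, the same style as the Docker example in section 7:
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
controller.listener.names=CONTROLLER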
Much Smoother Rebalancing (KIP-848)
Past problem:
① Add new Consumer
② All Consumers stop 🛑
③ Redistribute partitions
④ Restart all Consumers
⑤ 10-second processing halt...
Now it’s like this:
① Add new Consumer
② Only needed partitions gradually move
③ Other Consumers continue processing ✓
④ Only ~100ms delay
Real experience:
- Past: “Huh? Processing suddenly stopped?”
- Now: “When did the Consumer get added?” (Can’t even notice)
Queue Support! (Share Group)
Traditional Kafka:
Partition 0 → Only Consumer A can read
Partition 1 → Only Consumer B can read
Share Group (new feature):
Partition 0 → Consumers A, B, C all collaborate
(One message per consumer, like RabbitMQ!)
When is this useful?
[Ride-hailing System]
100 ride requests in partition 0
→ Matching servers A, B, C collaborate
→ A handles 30, B handles 35, C handles 35
→ Fast matching through parallelism!
6. Hands-On Practice – Step-by-Step Guide
Prerequisites: Java 17
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install openjdk-17-jdk
# CentOS/RHEL
sudo yum install java-17-openjdk-devel
# macOS (Homebrew)
brew install openjdk@17
# Verify
java -version
# Should show "openjdk version 17.0.x"
Download Kafka
# Download latest version
wget https://downloads.apache.org/kafka/4.1.0/kafka_2.13-4.1.0.tgz
# Extract
tar -xzf kafka_2.13-4.1.0.tgz
cd kafka_2.13-4.1.0
Start Kafka (Really Simple!)
# Step 1: Generate a cluster ID (first time only)
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
# Step 2: Prepare storage (in Kafka 4.x the KRaft config lives at config/server.properties;
# --standalone sets up a single-node controller quorum)
bin/kafka-storage.sh format --standalone -t $KAFKA_CLUSTER_ID -c config/server.properties
# Step 3: Start Kafka!
bin/kafka-server-start.sh config/server.properties
When you see this, it’s successful:
[KafkaRaftServer nodeId=1] Kafka Server started
Create a Topic – Creating a Mailbox
Open a new terminal:
bin/kafka-topics.sh --create \
--topic hello-kafka \
--bootstrap-server localhost:9092 \
--partitions 3 \
--replication-factor 1
What does this mean?
- hello-kafka: Mailbox name
- partitions 3: 3 windows (3x faster processing)
- replication-factor 1: only 1 copy of the data, no backup (fine since we're practicing alone)
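You can check that the topic was created as intended with --describe:
bin/kafka-topics.sh --describe \
--topic hello-kafka \
--bootstrap-server localhost:9092
The output lists each partition with its leader broker and replicas.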
Send Messages – Writing Your First Letter
bin/kafka-console-producer.sh \
--topic hello-kafka \
--bootstrap-server localhost:9092
Now you can type:
> Hello Kafka!
> My first message
> Will it arrive in real-time?
Type line by line and press Enter! Each line becomes a message.
Receive Messages – Reading Letters
Open another terminal:
bin/kafka-console-consumer.sh \
--topic hello-kafka \
--from-beginning \
--bootstrap-server localhost:9092
Your sent messages appear on screen!
Hello Kafka!
My first message
Will it arrive in real-time?
Real-Time Test
Place Producer and Consumer terminals side by side:
Type in Producer:
> Real-time test!
Almost instantly appears in Consumer:
Real-time test!
Amazing, right? This is Kafka’s real-time processing capability!
7. Even Easier with Docker
If installation seems tedious, try Docker.
Create docker-compose.yml
version: '3.8'
services:
  kafka:
    image: apache/kafka:4.0.0
    container_name: my-kafka
    ports:
      - "9092:9092"
    environment:
      # KRaft mode configuration
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@localhost:9093
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_LOG_DIRS: /tmp/kraft-combined-logs
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk
Run
# Start Kafka
docker-compose up -d
# Check if running
docker-compose logs -f
# Create topic (the Kafka scripts live in /opt/kafka/bin inside the image)
docker exec -it my-kafka /opt/kafka/bin/kafka-topics.sh \
--create --topic test \
--bootstrap-server localhost:9092 \
--partitions 3
# Stop
docker-compose down
That’s it! Simple, right?
8. Using with Code
Sending Messages with Python
from kafka import KafkaProducer
import json
from datetime import datetime

# Create Producer
producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

# Send order data
order = {
    'order_id': 'ORD-001',
    'product': 'Laptop',
    'quantity': 1,
    'price': 1500,
    'timestamp': datetime.now().isoformat()
}

# Send to Kafka!
future = producer.send('orders', value=order)

# Confirm delivery
result = future.get(timeout=10)
print(f'Sent! Partition: {result.partition}, Offset: {result.offset}')

producer.close()
Output:
Sent! Partition: 2, Offset: 15
Receiving Messages with Python
from kafka import KafkaConsumer
import json

# Create Consumer
consumer = KafkaConsumer(
    'orders',
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='earliest',  # Read from beginning
    group_id='order-processor',    # Group name
    value_deserializer=lambda x: json.loads(x.decode('utf-8'))
)

print('Waiting for orders...')

# Receive and process messages
for message in consumer:
    order = message.value
    print(f'\nNew order arrived!')
    print(f'Order ID: {order["order_id"]}')
    print(f'Product: {order["product"]}')
    print(f'Quantity: {order["quantity"]}')
    print(f'Price: ${order["price"]:,}')
Output:
Waiting for orders...
New order arrived!
Order ID: ORD-001
Product: Laptop
Quantity: 1
Price: $1,500
Java Simple Example
Producer:
import org.apache.kafka.clients.producer.*;
import java.util.Properties;

public class OrderProducer {
    public static void main(String[] args) {
        // Configuration
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");

        // Create Producer
        Producer<String, String> producer = new KafkaProducer<>(props);

        // Send message
        String orderId = "ORD-001";
        String orderData = "Laptop,1,1500";
        ProducerRecord<String, String> record =
            new ProducerRecord<>("orders", orderId, orderData);

        producer.send(record, (metadata, exception) -> {
            if (exception == null) {
                System.out.println("Sent successfully! Partition: " +
                    metadata.partition() + ", Offset: " + metadata.offset());
            } else {
                System.out.println("Send failed: " + exception.getMessage());
            }
        });

        producer.close();
    }
}
9. Real-World Use Cases
Case 1: Real-Time Order Processing for Food Delivery
[Customer orders pizza]
① Order App → Kafka (order.created)
"Pizza order from Downtown"
② Multiple systems read simultaneously from Kafka:
Matching System:
"Found 3 nearby restaurants!"
Payment System:
"Card payment completed!"
Notification System:
"Sent 'Order received' to customer"
"Sent 'New order!' to restaurant"
Real-time Map:
"Display on delivery tracking map"
Why use Kafka?
- Handles thousands of orders per second
- One slow system doesn’t affect others
- Stable even during peak hours (6-8 PM)
Case 2: Netflix Recommendation System
[Every moment while watching a show]
User actions:
- Watch 5-second preview → Kafka
- Start playing → Kafka
- Watch for 10 minutes → Kafka
- Pause → Kafka
- Click continue watching → Kafka
Real-time processing:
- Recommendation model updates immediately
- "Try these shows" refreshed
- Viewing pattern analyzed
- Next episode preloaded
Case 3: Amazon Inventory Management
[Real-time inventory sync]
Warehouse A: "10 laptops left" → Kafka
Warehouse B: "5 laptops left" → Kafka
Warehouse C: "0 laptops (out of stock)" → Kafka
→ Real-time update on website
→ Real-time update on mobile app
→ Reflected in search results
→ Calculate available quantity
Benefits:
- Prevent out-of-stock orders
- Prevent excess inventory
- Improved customer satisfaction
Case 4: Digital Bank Transfer Processing
[Transferring $1,000]
① Transfer request → Kafka
② Multiple systems process simultaneously:
Balance check: "Sufficient balance?"
Limit check: "Within daily limit?"
Fraud detection: "Suspicious transaction?"
Execute transfer: "Transfer complete!"
Send notification: "$1,000 transferred"
Accounting: "Record transaction"
10. Frequently Asked Questions
Q1. How many partitions should I set?
Simple calculation:
Say your target throughput is 10,000 messages/second (a worked example; see the helper sketch after this list):
1. Measure one Consumer's processing speed
→ Example: 2,000 messages/second
2. Calculate needed Consumers
→ 10,000 ÷ 2,000 = 5 Consumers
3. Partitions = Consumers
→ Set 5 partitions!
4. Add buffer (1.5x)
→ Final: 7-8 partitions
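The same calculation as a tiny Python helper (the 1.5x buffer is just the rule of thumb above):
import math

def recommended_partitions(target_mps, per_consumer_mps, buffer=1.5):
    consumers = math.ceil(target_mps / per_consumer_mps)  # step 2
    return math.ceil(consumers * buffer)                  # steps 3-4

print(recommended_partitions(10_000, 2_000))  # -> 8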
Recommended by scale:
- Test/Development: 3
- Small service: 5-10
- Medium service: 20-50
- Large service: 100+
Q2. How long should I retain messages?
Guide by use case:
Real-time notifications:
Retention 1 day
(Day-old notifications are meaningless)
Order data:
Retention 30 days
(For refunds, exchanges)
Log data:
Retention 7 days
(Discard after analysis)
Critical events:
Retention unlimited
(For reprocessing or auditing)
Configuration:
# Retain for 7 days
bin/kafka-configs.sh --alter \
--entity-type topics \
--entity-name my-topic \
--add-config retention.ms=604800000 \
--bootstrap-server localhost:9092
Q3. What is Consumer Lag?
Simple explanation:
[Situation]
Producer: Sent 100 messages
Consumer: Read 70 messages
→ Lag = 30 (30 messages still need processing)
Analogy:
Restaurant kitchen:
- 10 orders came in (Producer)
- Chef made 7 (Consumer)
- 3 orders waiting (Lag)
If Lag keeps increasing?
→ Orders are backing up!
→ Need more chefs (add Consumers)
Check:
bin/kafka-consumer-groups.sh \
--bootstrap-server localhost:9092 \
--group my-group \
--describe
Result:
TOPIC    PARTITION    LAG
orders   0            0     (Good!)
orders   1            150   (Backlog!)
orders   2            5     (OK)
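You can also compute lag yourself with kafka-python — newest offset minus last committed offset, per partition (group and topic names are the examples above):
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers=['localhost:9092'],
                         group_id='my-group')

tp = TopicPartition('orders', 1)
consumer.assign([tp])

end = consumer.end_offsets([tp])[tp]     # newest offset in the partition
committed = consumer.committed(tp) or 0  # last offset the group committed
print('lag:', end - committed)           # e.g. 150 = backlog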
Q4. How should I set Replication Factor?
Recommended by scenario:
Development/Testing:
Replication Factor = 1
(No backup needed for solo testing)
Small Production:
Replication Factor = 2
(Can survive one failure)
Critical Production:
Replication Factor = 3
(Survives two simultaneous failures, recommended!)
Mission Critical:
Replication Factor = 5
(Financial sector, data cannot be lost)
Configuration:
bin/kafka-topics.sh --create \
--topic important-data \
--partitions 10 \
--replication-factor 3 \
--bootstrap-server localhost:9092
Q5. Kafka vs Redis – Which should I use?
Choose by use case:
Redis (Pub/Sub):
✓ Ultra-low latency (microseconds)
✓ Simple to set up for notifications
✗ No message retention (volatile)
✗ Possible message loss
Example: Chat read status, real-time alerts
Kafka:
✓ Persistent message storage
✓ High-volume processing
✓ Multiple Consumers
✓ No data loss
✗ Slightly heavier
Example: Order processing, log collection, event sourcing
Using together:
Order system:
Order data → Kafka (persistent storage)
Real-time notifications → Redis (fast push)
11. Troubleshooting
Problem 1: “Connection refused” error
Symptoms:
Error connecting to node localhost:9092
Cause and Solution:
# 1. Check if Kafka is running
ps aux | grep kafka
# If not running, start it
bin/kafka-server-start.sh config/server.properties
# 2. Check if port is in use
lsof -i :9092
# If another process is using it, stop it
kill -9 <PID>
Problem 2: Consumer Lag keeps increasing
Cause:
Producer: 10,000 messages/second
Consumer: 3,000 messages/second
→ 7,000 messages/second backlog!
Solution 1: Add Consumers
# Original: 2 Consumers
# Change: Increase to 5 Consumers
# But check if partitions are sufficient!
bin/kafka-topics.sh --describe --topic my-topic \
--bootstrap-server localhost:9092
# With only 3 partitions, 2 of the 5 Consumers would sit idle
# Increase partitions first (they can be increased, never decreased)
bin/kafka-topics.sh --alter \
--topic my-topic \
--partitions 10 \
--bootstrap-server localhost:9092
Solution 2: Improve Consumer speed
# Before: Save to DB for each message (slow)
for message in consumer:
    save_to_db(message)  # DB access each time

# After: Batch save (fast)
batch = []
for message in consumer:
    batch.append(message)
    if len(batch) >= 100:
        save_batch_to_db(batch)  # 100 at once
        batch = []
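If you batch like this, consider also turning off auto-commit and committing only after the batch is safely in the DB, so a crash can't skip unsaved messages. A sketch, assuming the same hypothetical save_batch_to_db helper:
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'orders',
    bootstrap_servers=['localhost:9092'],
    group_id='order-processor',
    enable_auto_commit=False  # we commit manually below
)

batch = []
for message in consumer:
    batch.append(message)
    if len(batch) >= 100:
        save_batch_to_db(batch)  # hypothetical helper from above
        consumer.commit()        # record progress only after the save succeeds
        batch = []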
Problem 3: Out of memory error
Symptoms:
java.lang.OutOfMemoryError: Java heap space
Solution:
# Increase Kafka heap memory
export KAFKA_HEAP_OPTS="-Xmx2G -Xms2G"
# Restart
bin/kafka-server-start.sh config/server.properties
Problem 4: Disk space full
Check:
df -h
Solution:
# Reduce retention (30 days → 7 days)
bin/kafka-configs.sh --alter \
--entity-type topics \
--entity-name my-topic \
--add-config retention.ms=604800000 \
--bootstrap-server localhost:9092
# Or set a size limit (10 GB per partition)
bin/kafka-configs.sh --alter \
--entity-type topics \
--entity-name my-topic \
--add-config retention.bytes=10737418240 \
--bootstrap-server localhost:9092
12. Next Steps
Free Learning Resources
Official Documentation:
- https://kafka.apache.org/documentation
Free Courses:
- Kafka 101 – Confluent
- Apache Kafka for Beginners – YouTube
Free Practice:
- Confluent Cloud Free Trial ($400 credits)
Management Tools
Kafka UI (Free):
docker run -p 8080:8080 \
-e KAFKA_CLUSTERS_0_NAME=local \
-e KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS=host.docker.internal:9092 \
provectuslabs/kafka-ui:latest
(Inside the container, localhost would mean the container itself, so point the UI at the host's Kafka; host.docker.internal works on Docker Desktop, and on Linux you can add --add-host=host.docker.internal:host-gateway.)
Access via browser at http://localhost:8080:
- View topic list
- Check messages
- Monitor Consumer Group status
- Visual management!
Final Thoughts…
Kafka looks difficult at first, but once you understand the core concepts, it’s an incredibly powerful tool.
Key takeaways:
- Kafka = Express postal service
  - Delivers many letters quickly
  - Multiple people can read the same letter
- Partition = Service window
  - More = faster
  - Should match Consumer count
- Offset = Bookmark
  - Remembers reading position
  - Can resume after failure
- Replication = Backup
  - Stores data safely
  - No problem if server fails
Kafka 4.0 removed ZooKeeper, making it much easier. Now is a great time to start!
Begin with small projects. Start with log collection or simple event processing, and as you get comfortable, you can apply it to larger systems. Good luck! 🙂