“Our service crashed when traffic suddenly spiked…” “An order came in but it wasn’t reflected in the inventory system…” “There are too many logs to monitor in real-time…” Sound familiar? Apache Kafka is the tool that solves these exact problems. Let’s explore it step by step.

 

 

 

1. What is Kafka? – Think of it as an Express Postal Service

Simply put, Kafka is like an ultra-fast postal service.

Regular postal service works like this

Sender → Post Office → Receiver

One person sends a letter, one person receives it. Once the receiver reads it, it’s done.

The Kafka postal service is different

Multiple Senders → Kafka → Multiple Receivers
                  (Storage)  (Each gets a copy)

What makes it special:

  1. Can handle millions of letters per day (incredible speed)
  2. Keeps letters for a set period so you can read them later
  3. Multiple people can read the same letter independently
  4. Creates backup copies in multiple locations to prevent loss

This is exactly what Kafka does!

Understanding with a real example

The moment you click “Order” on Amazon:

① Order System: "New order received!" → Sends message to Kafka

② Kafka simultaneously delivers this message to multiple systems:
   - Inventory System: "Need to deduct 1 item"
   - Payment System: "Process card payment"
   - Shipping System: "Prepare for shipment"
   - Notification System: "Send confirmation email"
   - Analytics System: "Add to today's sales"

Without Kafka:

  • Order system must connect to all 5 systems individually
  • If payment system is down? Order system stops too
  • Adding a recommendation system later? Must modify order system code

With Kafka:

  • Order system just sends to Kafka
  • Each system operates independently
  • Adding new systems? Just read from Kafka

 

 

2. Why Was Kafka Created? – LinkedIn’s Challenge

In 2011, LinkedIn had a major problem.

Tangled Systems

[System Architecture]

User Registration ─┬→ Email System
                   ├→ Recommendation System
                   ├→ Search System
                   └→ Analytics System

Email System ──┬→ Notification System
               └→ Log System

Recommendation ─┬→ Analytics System
                └→ Search System

Problems:

  • 20 systems talking to each other = up to 190 point-to-point connections (n×(n−1)/2)
  • One system fails → must check all connected systems
  • Adding new systems → connect to all existing systems
  • Processing 1.4 billion messages daily → systems struggling

Simplification with Kafka

[After Kafka]

User Registration ─┐
Email System ──────┤
Recommendation ────┼→ Kafka ──┬→ Notification System
Search System ─────┤          ├→ Log System
Analytics System ──┘          ├→ Search System
                              └→ Recommendation System

Results:

  • All systems only communicate with Kafka
  • One system’s failure doesn’t affect others
  • Adding new systems is simple
  • Handles millions of messages per second easily

 

 

3. Core Concepts – Real-Life Analogies

Producer = Letter Sender

Applications that create data and send it to Kafka.

Real examples:

  • Web server: “User viewed product page”
  • Mobile app: “User opened app”
  • IoT sensor: “Current temperature is 77°F”
  • Order system: “New order received”

Consumer = Letter Receiver

Applications that read and process data from Kafka.

Real examples:

  • Recommendation system: Reads click data → Updates recommendation algorithm
  • Notification system: Reads order data → Sends customer SMS
  • Monitoring system: Reads error logs → Alerts on-call engineer

Topic = Mailbox Label

Categories that organize messages by type. Like bookshelf labels in a library.

[Kafka Topics]

📬 user.signup      (User registration events)
📬 order.created    (Order creation events)
📬 payment.success  (Payment completion events)
📬 sensor.temp      (Temperature sensor data)

Each system subscribes only to the topics it needs.
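For example, here is a minimal kafka-python sketch of a notification service that subscribes only to the two topics it cares about (the topic names come from the diagram above; the group name is just an illustration):

from kafka import KafkaConsumer

# Subscribe to just the topics this service needs;
# everything else flowing through the cluster is invisible to it.
consumer = KafkaConsumer(
    'order.created',
    'payment.success',
    bootstrap_servers=['localhost:9092'],
    group_id='notification-service',  # hypothetical group name
)

for message in consumer:
    print(message.topic, message.value)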

Partition = Multiple Service Windows

Topics split into multiple pieces. Think of bank teller windows.

[Bank Analogy]

1 Window:
100 customers → 1 window → Takes 100 minutes

5 Windows:
100 customers → 5 windows → Takes 20 minutes

Kafka works the same way:

[Order Topic - 1 Partition]
Orders 1, 2, 3, 4, 5... → Partition 0 → Slow processing

[Order Topic - 3 Partitions]
Orders 1, 4, 7... → Partition 0 ↘
Orders 2, 5, 8... → Partition 1 → Parallel → 3x faster!
Orders 3, 6, 9... → Partition 2 ↗

How are messages distributed?

  1. With key: Same key always goes to same partition
    All user123 activities → Always partition 0
    All user456 activities → Always partition 1
    

    Use this when order matters.

  2. Without key: Spread roughly evenly across partitions (round-robin, or batch-by-batch "sticky" assignment in newer clients)
    Message 1 → Partition 0
    Message 2 → Partition 1
    Message 3 → Partition 2
    Message 4 → Partition 0
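
In code, the difference is just whether you pass a key. A minimal kafka-python sketch (the topic name 'user-activity' is illustrative):

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=['localhost:9092'])

# Keyed send: the default partitioner hashes the key, so every
# message for user123 lands on the same partition and stays in order.
producer.send('user-activity', key=b'user123', value=b'clicked product page')

# Unkeyed send: messages are spread across partitions instead.
producer.send('user-activity', value=b'anonymous page view')

producer.flush()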
    

Offset = Page Number

A number indicating each message’s position.

[Partition 0 Messages]

Offset:  0       1       2       3       4       5
Message: [Ord1]  [Ord4]  [Ord7]  [Ord10] [Ord13] [Ord16]
                               ↑
                         Read up to here (offset 3)

Why is this important?

Like bookmarking your place:

  • “Where did I stop reading?” → Check offset
  • System restarts → Continue from last offset
  • Made a mistake? → Rewind offset and reread
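
Rewinding really is that simple. A minimal kafka-python sketch (the 'orders' topic and offset 3 mirror the diagram above):

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers=['localhost:9092'])

# Take manual control of partition 0 of the orders topic
tp = TopicPartition('orders', 0)
consumer.assign([tp])

# "Made a mistake?" Jump back to offset 3 and reread from there
consumer.seek(tp, 3)

for message in consumer:
    print(message.offset, message.value)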

Broker = Post Office Branch

The actual Kafka server that stores and manages messages.

[Single Broker - Risky]
1 broker → Complete failure if it crashes ❌

[Cluster - Safe]
Broker 1 (stores messages A, B)
Broker 2 (copy of A, B)
Broker 3 (copy of A, B)
→ Service continues even if 1-2 fail ✓

Consumer Group = Collaborative Team

Multiple consumers forming a team to divide and process work.

[Order Processing Team - 4 Members]

Order Topic (4 partitions)
├─ Partition 0 → Employee A processes
├─ Partition 1 → Employee B processes
├─ Partition 2 → Employee C processes
└─ Partition 3 → Employee D processes

Result: 4x faster processing!

Important rules:

  • One partition can only be processed by one consumer in a group
  • 5 employees but 4 partitions? → 1 employee waits
  • 2 employees but 4 partitions? → Each handles 2

Different groups are independent:

[Each team uses order data independently]

Order Topic
├─ Shipping Team: Prepares shipping addresses
├─ Analytics Team: Calculates sales
└─ Notification Team: Sends customer messages

(Each team processes independently)
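
Both behaviors fall out of a single consumer setting: the group_id. A minimal sketch (the group names are illustrative):

from kafka import KafkaConsumer

# Run this script in two terminals with the SAME group_id and Kafka
# splits the partitions between them (teammates dividing the work).
# Run a third copy with a DIFFERENT group_id and it receives every
# message again, independently of the first team.
consumer = KafkaConsumer(
    'orders',
    bootstrap_servers=['localhost:9092'],
    group_id='shipping-team',
)

for message in consumer:
    print(f'partition={message.partition} offset={message.offset}')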

Replication = Backup

Data copied across multiple brokers for safe storage.

[Replication Factor = 3]

Partition 0 Original (Leader)  → Broker 1
Partition 0 Replica (Follower) → Broker 2
Partition 0 Replica (Follower) → Broker 3

If Broker 1 fails?

① Broker 1 down 🔴
② Broker 2 immediately promoted to Leader ✓
③ Service continues without interruption
④ When Broker 1 comes back, it rejoins as a follower and catches up

 

 

4. Why Kafka is Fast – Key Secrets

Secret 1: Sequential Disk Usage

Random writes (slow):

Jumping around on disk
Seek → Write → Seek → Write
Takes a long time

Sequential writes (fast):

Writing continuously on disk
Write → Write → Write → Write
Sequential throughput is high enough that even an HDD can approach SSD-like speeds.

Kafka only ever appends messages to the end of its log files, never seeking back and forth. That's why it's fast.

Secret 2: Batch Processing

Individual sends (inefficient):

1 message → Send (network round trip)
1 message → Send (network round trip)
1 message → Send (network round trip)
Total: 3 network round trips

Batch sends (efficient):

Collect 100 messages → Send once
1 network round trip processes 100!
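
You control this trade-off with a couple of producer settings. A hedged kafka-python sketch (the values are illustrative, not recommendations):

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    batch_size=32768,         # group up to 32 KB per partition batch
    linger_ms=10,             # wait up to 10 ms for more messages to join the batch
    compression_type='gzip',  # compress the whole batch in one go
)

for i in range(1000):
    producer.send('events', f'message {i}'.encode('utf-8'))

producer.flush()  # push out any partially filled batch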

Secret 3: Zero-Copy

Traditional approach:

Disk → Kernel memory → App memory → Socket buffer → Network
(4 copies)

Kafka’s Zero-Copy:

Disk → Network
(Minimal copying)

Kafka asks the OS to move data from the page cache straight to the network socket (the sendfile() system call on Linux), so far less CPU and memory is spent per transferred byte. One caveat: this shortcut is bypassed when TLS encryption is enabled, because the data must be encrypted in user space first.
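
Python exposes the same OS primitive, so the idea is easy to demo. A toy sketch (not Kafka code): socket.sendfile() uses the sendfile() syscall on Linux, the same mechanism behind Kafka's FileChannel.transferTo:

import socket

def serve_file(path, port=9999):
    # Bytes flow from the file's page cache straight to the socket;
    # they never pass through a Python-level buffer.
    with socket.create_server(('0.0.0.0', port)) as server:
        conn, _ = server.accept()
        with conn, open(path, 'rb') as f:
            conn.sendfile(f)  # zero-copy transfer of the whole file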

Secret 4: Partition Parallelism

1 partition = 1 person working → Slow
10 partitions = 10 people working simultaneously → 10x faster

 

 

5. Kafka 4.0 Innovations – Easier and More Powerful

No More ZooKeeper! (KRaft Mode)

Past (complicated era):

① Install ZooKeeper cluster (3 servers)
② Install Kafka cluster (3 servers)
③ Configure ZooKeeper-Kafka connection
④ Manage and monitor both

Now (simplified era):

① Just install Kafka!
② Much simpler configuration
③ Half the systems to manage

KRaft advantages:

[Comparison]

ZooKeeper era:
- Max partitions: 100,000
- Cluster start time: Minutes
- Management complexity: High

KRaft era:
- Max partitions: Millions
- Cluster start time: Seconds
- Management complexity: Low

Much Smoother Rebalancing (KIP-848)

Past problem:

① Add new Consumer
② All Consumers stop 🛑
③ Redistribute partitions
④ Restart all Consumers
⑤ 10-second processing halt...

Now it’s like this:

① Add new Consumer
② Only needed partitions gradually move
③ Other Consumers continue processing ✓
④ Only ~100ms delay

Real experience:

  • Past: “Huh? Processing suddenly stopped?”
  • Now: “When did the Consumer get added?” (Can’t even notice)

Queue Support! (Share Group)

Traditional Kafka:

Partition 0 → Only Consumer A can read
Partition 1 → Only Consumer B can read

Share Group (new in 4.0, shipped as early access):

Partition 0 → Consumers A, B, C all collaborate
(Each message is handed to just one consumer at a time, like RabbitMQ!)

When is this useful?

[Ride-hailing System]

100 ride requests in partition 0
→ Matching servers A, B, C collaborate
→ A handles 30, B handles 35, C handles 35
→ Fast matching through parallelism!

 

 

6. Hands-On Practice – Step-by-Step Guide

Prerequisites: Java 17

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install openjdk-17-jdk

# CentOS/RHEL
sudo yum install java-17-openjdk-devel

# macOS (Homebrew)
brew install openjdk@17

# Verify
java -version
# Should show "openjdk version 17.0.x"

Download Kafka

# Download Kafka (4.1.0 here; check kafka.apache.org/downloads for the latest)
wget https://downloads.apache.org/kafka/4.1.0/kafka_2.13-4.1.0.tgz

# Extract
tar -xzf kafka_2.13-4.1.0.tgz
cd kafka_2.13-4.1.0

Start Kafka (Really Simple!)

# Step 1: Generate ID (first time only)
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"

# Step 2: Prepare storage (in Kafka 4.x the KRaft config lives at config/server.properties)
bin/kafka-storage.sh format --standalone -t $KAFKA_CLUSTER_ID -c config/server.properties

# Step 3: Start Kafka!
bin/kafka-server-start.sh config/server.properties

When you see a line like this, it’s successful:

[KafkaRaftServer nodeId=1] Kafka Server started

Create a Topic – Creating a Mailbox

Open a new terminal:

bin/kafka-topics.sh --create \
  --topic hello-kafka \
  --bootstrap-server localhost:9092 \
  --partitions 3 \
  --replication-factor 1

What does this mean?

  • hello-kafka: Mailbox name
  • partitions 3: 3 windows (3x faster processing)
  • replication-factor 1: only one copy, no backup (fine when practicing on a single broker)
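
If you prefer doing this from code, kafka-python’s admin client can create the same topic. A minimal sketch:

from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers=['localhost:9092'])

# Same settings as the CLI command above: 3 partitions, 1 replica
admin.create_topics([
    NewTopic(name='hello-kafka', num_partitions=3, replication_factor=1)
])

admin.close()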

Send Messages – Writing Your First Letter

bin/kafka-console-producer.sh \
  --topic hello-kafka \
  --bootstrap-server localhost:9092

Now you can type:

> Hello Kafka!
> My first message
> Will it arrive in real-time?

Type line by line and press Enter! Each line becomes a message.

Receive Messages – Reading Letters

Open another terminal:

bin/kafka-console-consumer.sh \
  --topic hello-kafka \
  --from-beginning \
  --bootstrap-server localhost:9092

Your sent messages appear on screen!

Hello Kafka!
My first message
Will it arrive in real-time?

Real-Time Test

Place Producer and Consumer terminals side by side:

Type in Producer:

> Real-time test!

Almost instantly appears in Consumer:

Real-time test!

Amazing, right? This is Kafka’s real-time processing capability!

 

 

7. Even Easier with Docker

If installation seems tedious, try Docker.

Create docker-compose.yml

version: '3.8'
services:
  kafka:
    image: apache/kafka:4.0.0
    container_name: my-kafka
    ports:
      - "9092:9092"
    environment:
      # KRaft mode configuration
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@localhost:9093
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_LOG_DIRS: /tmp/kraft-combined-logs
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk

Run

# Start Kafka
docker-compose up -d

# Check if running
docker-compose logs -f

# Create topic (the scripts live in /opt/kafka/bin inside the image)
docker exec -it my-kafka /opt/kafka/bin/kafka-topics.sh \
  --create --topic test \
  --bootstrap-server localhost:9092 \
  --partitions 3

# Stop
docker-compose down

That’s it! Simple, right?

 

 

8. Using with Code

Sending Messages with Python

from kafka import KafkaProducer
import json
from datetime import datetime

# Create Producer
producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

# Send order data
order = {
    'order_id': 'ORD-001',
    'product': 'Laptop',
    'quantity': 1,
    'price': 1500,
    'timestamp': datetime.now().isoformat()
}

# Send to Kafka!
future = producer.send('orders', value=order)

# Confirm delivery
result = future.get(timeout=10)
print(f'Sent! Partition: {result.partition}, Offset: {result.offset}')

producer.close()

Output:

Sent! Partition: 2, Offset: 15

Receiving Messages with Python

from kafka import KafkaConsumer
import json

# Create Consumer
consumer = KafkaConsumer(
    'orders',
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='earliest',  # Start from the beginning when the group has no committed offset
    group_id='order-processor',    # Group name
    value_deserializer=lambda x: json.loads(x.decode('utf-8'))
)

print('Waiting for orders...')

# Receive and process messages
for message in consumer:
    order = message.value
    print(f'\nNew order arrived!')
    print(f'Order ID: {order["order_id"]}')
    print(f'Product: {order["product"]}')
    print(f'Quantity: {order["quantity"]}')
    print(f'Price: ${order["price"]:,}')

Output:

Waiting for orders...

New order arrived!
Order ID: ORD-001
Product: Laptop
Quantity: 1
Price: $1,500

Java Simple Example

Producer:

import org.apache.kafka.clients.producer.*;
import java.util.Properties;

public class OrderProducer {
    public static void main(String[] args) {
        // Configuration
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", 
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", 
            "org.apache.kafka.common.serialization.StringSerializer");
        
        // Create Producer
        Producer<String, String> producer = new KafkaProducer<>(props);
        
        // Send message
        String orderId = "ORD-001";
        String orderData = "Laptop,1,1500";
        
        ProducerRecord<String, String> record = 
            new ProducerRecord<>("orders", orderId, orderData);
        
        producer.send(record, (metadata, exception) -> {
            if (exception == null) {
                System.out.println("Sent successfully! Partition: " + 
                    metadata.partition() + ", Offset: " + metadata.offset());
            } else {
                System.out.println("Send failed: " + exception.getMessage());
            }
        });
        
        producer.close();  // blocks until buffered sends complete, then releases resources
    }
}

 

 

9. Real-World Use Cases

Case 1: Real-Time Order Processing for Food Delivery

[Customer orders pizza]

① Order App → Kafka (order.created)
   "Pizza order from Downtown"

② Multiple systems read simultaneously from Kafka:

   Matching System:
   "Found 3 nearby restaurants!"
   
   Payment System:
   "Card payment completed!"
   
   Notification System:
   "Sent 'Order received' to customer"
   "Sent 'New order!' to restaurant"
   
   Real-time Map:
   "Display on delivery tracking map"

Why use Kafka?

  • Handles thousands of orders per second
  • One slow system doesn’t affect others
  • Stable even during peak hours (6-8 PM)

Case 2: Netflix Recommendation System

[Every moment while watching a show]

User actions:
- Watch 5-second preview → Kafka
- Start playing → Kafka
- Watch for 10 minutes → Kafka
- Pause → Kafka
- Click continue watching → Kafka

Real-time processing:
- Recommendation model updates immediately
- "Try these shows" refreshed
- Viewing pattern analyzed
- Next episode preloaded

Case 3: Amazon Inventory Management

[Real-time inventory sync]

Warehouse A: "10 laptops left" → Kafka
Warehouse B: "5 laptops left" → Kafka
Warehouse C: "0 laptops (out of stock)" → Kafka

→ Real-time update on website
→ Real-time update on mobile app
→ Reflected in search results
→ Calculate available quantity

Benefits:

  • Prevent out-of-stock orders
  • Prevent excess inventory
  • Improved customer satisfaction

Case 4: Digital Bank Transfer Processing

[Transferring $1,000]

① Transfer request → Kafka
② Multiple systems process simultaneously:

   Balance check: "Sufficient balance?"
   Limit check: "Within daily limit?"
   Fraud detection: "Suspicious transaction?"
   Execute transfer: "Transfer complete!"
   Send notification: "$1,000 transferred"
   Accounting: "Record transaction"

 

 

10. Frequently Asked Questions

Q1. How many partitions should I set?

Simple calculation:

If target throughput is 10,000 messages/second?

1. Measure one Consumer's processing speed
   → Example: 2,000 messages/second

2. Calculate needed Consumers
   → 10,000 ÷ 2,000 = 5 Consumers

3. Partitions = Consumers
   → Set 5 partitions!

4. Add buffer (1.5x)
   → Final: 7-8 partitions
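
The same rule of thumb as a Python helper (a heuristic, not an official formula):

import math

def recommended_partitions(target_mps, per_consumer_mps, buffer=1.5):
    # consumers needed to keep up, padded with headroom for spikes
    consumers = math.ceil(target_mps / per_consumer_mps)
    return math.ceil(consumers * buffer)

print(recommended_partitions(10_000, 2_000))  # -> 8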

Recommended by scale:

  • Test/Development: 3
  • Small service: 5-10
  • Medium service: 20-50
  • Large service: 100+

Q2. How long should I retain messages?

Guide by use case:

Real-time notifications:
Retention 1 day
(Day-old notifications are meaningless)

Order data:
Retention 30 days
(For refunds, exchanges)

Log data:
Retention 7 days
(Discard after analysis)

Critical events:
Retention unlimited
(For reprocessing or auditing)

Configuration:

# Retain for 7 days
bin/kafka-configs.sh --alter \
  --entity-type topics \
  --entity-name my-topic \
  --add-config retention.ms=604800000 \
  --bootstrap-server localhost:9092

Q3. What is Consumer Lag?

Simple explanation:

[Situation]

Producer: Sent 100 messages
Consumer: Read 70 messages

→ Lag = 30 (30 messages still need processing)

Analogy:

Restaurant kitchen:
- 10 orders came in (Producer)
- Chef made 7 (Consumer)
- 3 orders waiting (Lag)

If Lag keeps increasing?
→ Orders are backing up!
→ Need more chefs (add Consumers)

Check:

bin/kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --group my-group \
  --describe

Result:

TOPIC     PARTITION  LAG
orders    0          0      (Good!)
orders    1          150    (Backlog!)
orders    2          5      (OK)
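
You can also compute lag for a single partition programmatically. A hedged kafka-python sketch:

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers=['localhost:9092'], group_id='my-group')
tp = TopicPartition('orders', 1)
consumer.assign([tp])
consumer.poll(timeout_ms=1000)  # connect and establish a fetch position

latest = consumer.end_offsets([tp])[tp]  # newest offset written by producers
current = consumer.position(tp)          # where this group has read up to
print(f'lag = {latest - current}')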

Q4. How should I set Replication Factor?

Recommended by scenario:

Development/Testing:
Replication Factor = 1
(No backup needed for solo testing)

Small Production:
Replication Factor = 2
(Can survive one failure)

Critical Production:
Replication Factor = 3
(Survives two simultaneous failures, recommended!)

Mission Critical:
Replication Factor = 5
(Financial sector, data cannot be lost)

Configuration:

bin/kafka-topics.sh --create \
  --topic important-data \
  --partitions 10 \
  --replication-factor 3 \
  --bootstrap-server localhost:9092

Q5. Kafka vs Redis – Which should I use?

Choose by use case:

Redis (Pub/Sub):
✓ No message retention (volatile)
✓ Ultra-low latency (microseconds)
✓ Simple notifications
✗ Possible message loss
Example: Chat read status, real-time alerts

Kafka:
✓ Persistent message storage
✓ High-volume processing
✓ Multiple Consumers
✓ No data loss
✗ Slightly heavier
Example: Order processing, log collection, event sourcing

Using together:

Order system:
Order data → Kafka (persistent storage)
Real-time notifications → Redis (fast push)
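
A hedged sketch of that combination using kafka-python and redis-py (topic, channel, and connection details are illustrative):

from kafka import KafkaProducer
import redis
import json

producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
)
r = redis.Redis(host='localhost', port=6379)

order = {'order_id': 'ORD-001', 'status': 'created'}

producer.send('orders', order)  # durable record: replayable, survives restarts
r.publish('order-alerts', f"Order {order['order_id']} received")  # fire-and-forget push

producer.flush()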

 

 

11. Troubleshooting

Problem 1: “Connection refused” error

Symptoms:

Error connecting to node localhost:9092

Cause and Solution:

# 1. Check if Kafka is running
ps aux | grep kafka

# If not running, start it
bin/kafka-server-start.sh config/server.properties

# 2. Check if port is in use
lsof -i :9092

# If another process is using it, stop it
kill -9 <PID>

Problem 2: Consumer Lag keeps increasing

Cause:

Producer: 10,000 messages/second
Consumer: 3,000 messages/second
→ 7,000 messages/second backlog!

Solution 1: Add Consumers

# Original: 2 Consumers
# Change: Increase to 5 Consumers

# But check if partitions are sufficient!
bin/kafka-topics.sh --describe --topic my-topic \
  --bootstrap-server localhost:9092

# With only 3 partitions, the extra Consumers just sit idle
# Increase partitions first (they can be increased, never decreased)
bin/kafka-topics.sh --alter \
  --topic my-topic \
  --partitions 10 \
  --bootstrap-server localhost:9092

Solution 2: Improve Consumer speed

# Before: Save to DB for each message (slow)
for message in consumer:
    save_to_db(message)  # DB access each time

# After: Batch save (fast)
batch = []
for message in consumer:
    batch.append(message)
    if len(batch) >= 100:
        save_batch_to_db(batch)  # 100 at once
        batch = []

Problem 3: Out of memory error

Symptoms:

java.lang.OutOfMemoryError: Java heap space

Solution:

# Increase Kafka heap memory
export KAFKA_HEAP_OPTS="-Xmx2G -Xms2G"

# Restart
bin/kafka-server-start.sh config/server.properties

Problem 4: Disk space full

Check:

df -h

Solution:

# Reduce retention (30 days → 7 days)
bin/kafka-configs.sh --alter \
  --entity-type topics \
  --entity-name my-topic \
  --add-config retention.ms=604800000 \
  --bootstrap-server localhost:9092

# Or set a size limit (10 GB per partition)
bin/kafka-configs.sh --alter \
  --entity-type topics \
  --entity-name my-topic \
  --add-config retention.bytes=10737418240 \
  --bootstrap-server localhost:9092

 

 

12. Next Steps

Free Learning Resources

Official Documentation: https://kafka.apache.org/documentation

Management Tools

Kafka UI (Free):

docker run -p 8080:8080 \
  -e KAFKA_CLUSTERS_0_NAME=local \
  -e KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS=localhost:9092 \
  provectuslabs/kafka-ui:latest

Access it in your browser at http://localhost:8080. (Note: if Kafka runs directly on your host machine, the UI container cannot reach it as localhost:9092; use host.docker.internal:9092 on Docker Desktop, or --network host on Linux.)

  • View topic list
  • Check messages
  • Monitor Consumer Group status
  • Visual management!

 

 

Final Thoughts…

Kafka looks difficult at first, but once you understand the core concepts, it’s an incredibly powerful tool.

Key takeaways:

  1. Kafka = Express postal service
    • Delivers many letters quickly
    • Multiple people can read the same letter
  2. Partition = Service window
    • More = faster
    • Should match Consumer count
  3. Offset = Bookmark
    • Remembers reading position
    • Can resume after failure
  4. Replication = Backup
    • Stores data safely
    • No problem if server fails

Kafka 4.0 removed ZooKeeper, making it much easier. Now is a great time to start!

Begin with small projects. Start with log collection or simple event processing, and as you get comfortable, you can apply it to larger systems. Good luck! 🙂