Designing a Scalable Chat System: WhatsApp/Discord Architecture Deep Dive

System Design

Designing a chat system that handles millions of concurrent users, delivers messages in real-time, and maintains high availability requires careful engineering decisions at every layer. This comprehensive guide explores the architecture, trade-offs, and design patterns needed to build a scalable chat system like WhatsApp or Discord.

Requirements Analysis

Functional Requirements

Core Messaging:

  • One-on-one messaging between users
  • Group messaging (multiple participants)
  • Media sharing (images, videos, files)
  • Message status (sent, delivered, read)
  • Typing indicators
  • Message search and history

User Management:

  • User registration and authentication
  • Contact management
  • Presence status (online, offline, away)
  • User profiles and settings

Advanced Features:

  • Message reactions and replies
  • File attachments
  • Voice and video calls (optional)
  • End-to-end encryption
  • Message deletion and editing

Non-Functional Requirements

Scalability: 500 million daily active users, 25 billion messages per day
Availability: 99.9% uptime
Latency: Message delivery < 100ms p99
Durability: Messages stored permanently, no data loss
Consistency: Eventually consistent (acceptable message-ordering delays)

Capacity Estimation

Traffic Estimates

Daily Active Users (DAU): 500 million
Peak Concurrent Users: 10% of DAU = 50 million
Average Messages per User: 50 messages/day
Total Messages per Day: 500M × 50 = 25 billion messages/day
Peak Messages per Second: 25B / (24 × 3600) × 3 (peak factor) = ~870K messages/sec

Storage Estimates

Average Message Size: 100 bytes (text) + 1KB metadata = 1.1KB
Daily Message Storage: 25B × 1.1KB = 27.5 TB/day
Annual Storage: 27.5 TB × 365 = ~10 PB/year
Media Messages: 20% of messages are media, averaging 200KB
Daily Media Storage: 25B × 0.2 × 200KB = 1 PB/day

Bandwidth Estimates

Incoming Messages: 870K msg/sec × 1.1KB = ~957 MB/sec
Outgoing Messages: 870K msg/sec × 1.1KB × 2 (average recipients) = ~1.9 GB/sec
Total Bandwidth: ~2.9 GB/sec = ~23.2 Gbps
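
The back-of-envelope numbers above can be checked in a few lines of arithmetic; the constants mirror the estimates in this section.

```python
# Back-of-envelope capacity check, using the estimates above.

DAU = 500_000_000            # daily active users
MSGS_PER_USER = 50           # messages per user per day
PEAK_FACTOR = 3              # peak-to-average traffic ratio
MSG_SIZE_KB = 1.1            # ~100 B text + ~1 KB metadata

msgs_per_day = DAU * MSGS_PER_USER                      # 25 billion
avg_msgs_per_sec = msgs_per_day / (24 * 3600)           # ~289K
peak_msgs_per_sec = avg_msgs_per_sec * PEAK_FACTOR      # ~870K

daily_storage_tb = msgs_per_day * MSG_SIZE_KB / 1e9     # KB -> TB
annual_storage_pb = daily_storage_tb * 365 / 1000       # TB -> PB

print(f"{peak_msgs_per_sec:,.0f} msg/sec peak")
print(f"{daily_storage_tb:.1f} TB/day, {annual_storage_pb:.1f} PB/year")
```

Running this reproduces the ~870K msg/sec and ~10 PB/year figures, which is a useful sanity check before sizing clusters.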

System APIs

Core APIs

sendMessage(userId, chatId, message, mediaUrl)
- Send message to individual or group chat
- Returns: messageId, timestamp

getMessages(userId, chatId, limit, offset)
- Retrieve message history
- Returns: List of messages

markAsRead(userId, chatId, messageIds[])
- Mark messages as read
- Returns: success status

getChats(userId)
- Get list of user's chats
- Returns: List of chats with last message

updatePresence(userId, status)
- Update user presence (online, offline, away)
- Returns: success status

uploadMedia(userId, file)
- Upload media file
- Returns: mediaUrl
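
The API surface above can be sketched as an in-memory service. This is a toy stand-in, not the real storage layer: the dicts substitute for the database, and names follow the API list.

```python
import time
import uuid
from collections import defaultdict

class ChatService:
    """In-memory sketch of the core chat APIs above (illustrative only)."""

    def __init__(self):
        self._messages = defaultdict(list)   # chat_id -> [message dicts]
        self._presence = {}                  # user_id -> status

    def send_message(self, user_id, chat_id, message, media_url=None):
        """Append a message to the chat; returns (messageId, timestamp)."""
        msg = {
            "message_id": str(uuid.uuid4()),
            "sender_id": user_id,
            "content": message,
            "media_url": media_url,
            "timestamp": time.time(),
            "status": "sent",
        }
        self._messages[chat_id].append(msg)
        return msg["message_id"], msg["timestamp"]

    def get_messages(self, user_id, chat_id, limit=50, offset=0):
        """Retrieve a page of message history."""
        return self._messages[chat_id][offset:offset + limit]

    def mark_as_read(self, user_id, chat_id, message_ids):
        """Mark the given messages as read; returns success status."""
        for msg in self._messages[chat_id]:
            if msg["message_id"] in message_ids:
                msg["status"] = "read"
        return True

    def update_presence(self, user_id, status):
        """Record the user's presence; returns success status."""
        self._presence[user_id] = status
        return True
```

A real implementation would return pagination cursors rather than offsets, but the shapes of the calls match the API list above.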

High-Level Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Mobile/Web Clients                      │
└───────────────┬───────────────────────────────┬─────────────┘
                │                               │
        ┌───────▼────────┐             ┌────────▼────────┐
        │  Load Balancer │             │  API Gateway    │
        │  (WebSocket)   │             │  (REST API)     │
        └───────┬────────┘             └────────┬────────┘
                │                               │
    ┌───────────┼───────────┐                   │
    │           │           │                   │
┌───▼───┐  ┌───▼───┐  ┌───▼───┐         ┌─────▼─────┐
│ WS    │  │ WS    │  │ WS    │         │ Message   │
│Server │  │Server │  │Server │         │ Service   │
└───┬───┘  └───┬───┘  └───┬───┘         └─────┬─────┘
    │           │           │                   │
    └───────────┼───────────┘                   │
                │                               │
        ┌───────▼───────────────────────────────▼───────┐
        │         Message Queue (Kafka/RabbitMQ)         │
        └───────┬───────────────────────────────┬───────┘
                │                               │
        ┌───────▼────────┐             ┌────────▼────────┐
        │   Metadata     │             │   Message       │
        │   Service      │             │   Storage       │
        │   (Cassandra)  │             │   (Cassandra)   │
        └────────────────┘             └─────────────────┘
                │                               │
        ┌───────▼───────────────────────────────▼───────┐
        │         Media Storage (Object Storage)         │
        └───────────────────────────────────────────────┘

Detailed Component Design

Communication Protocol: WebSocket vs HTTP Long Polling

Approach 1: WebSocket

How It Works: Persistent bidirectional connection between client and server.

Client                    WebSocket Server
------                    ---------------
  |--HTTP Upgrade--------->|
  |<--101 Switching--------|
  |   Protocols            |
  |                        |
  |<====Persistent Conn===>|
  |                        |
  |--Message-------------->|
  |<--Acknowledgment-------|
  |                        |
  |<--Push Message---------|
  |--Acknowledgment------->|
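
The HTTP Upgrade step in the diagram is defined by RFC 6455: the server proves it understood the handshake by hashing the client's Sec-WebSocket-Key with a fixed GUID and echoing the result back. A minimal computation of the Sec-WebSocket-Accept header:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the WebSocket opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(client_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for the 101 response."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# Example key taken from RFC 6455 itself:
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
# -> s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After this exchange, both sides switch to framed binary messages over the same TCP connection — this is why there is no per-message connection overhead.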

Pros:

  • Low Latency: No connection overhead after initial handshake
  • Bidirectional: Server can push messages immediately
  • Efficient: Lower overhead than HTTP polling
  • Real-time: True real-time communication

Cons:

  • Connection Management: Must manage persistent connections
  • Stateful Servers: Servers maintain connection state
  • Scaling Complexity: Harder to scale (sticky sessions needed)
  • Firewall Issues: Some networks block WebSocket

When to Use: Real-time requirements, low latency critical, high message frequency.

Approach 2: HTTP Long Polling

How It Works: Client sends request, server holds it open until message arrives or timeout.

Client                    HTTP Server
------                    -----------
  |--GET /messages-------->|
  |                        | (Hold connection)
  |                        | (Wait for message)
  |<--Message (after 30s)--|
  |                        |
  |--GET /messages-------->| (Immediately poll again)
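
Server-side, long polling is essentially a blocking read with a timeout. A sketch using one queue per user (the inbox dict and timeout values are illustrative; a real server would shard and persist these):

```python
import queue
import threading

# One inbox per connected user; real systems would shard and persist these.
inboxes: dict[str, queue.Queue] = {}

def poll_messages(user_id: str, timeout: float = 30.0):
    """Hold the request open until a message arrives or the timeout expires."""
    inbox = inboxes.setdefault(user_id, queue.Queue())
    try:
        return [inbox.get(timeout=timeout)]   # respond with the message
    except queue.Empty:
        return []                             # empty response; client re-polls

def deliver(user_id: str, message: str):
    """Called by the message service when a new message arrives."""
    inboxes.setdefault(user_id, queue.Queue()).put(message)

# A message delivered from another thread unblocks the poll immediately.
threading.Timer(0.1, deliver, args=("alice", "hello")).start()
print(poll_messages("alice", timeout=2.0))
```

Note the latency property the diagram shows: a message arriving mid-poll is delivered immediately, but one arriving between polls waits for the next request.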

Pros:

  • Stateless: Servers remain stateless
  • Firewall Friendly: Works through most firewalls
  • Simple Scaling: Easy to scale horizontally
  • Fallback: Can fallback to short polling

Cons:

  • Higher Latency: Up to polling interval delay
  • Resource Usage: Many open connections
  • Not True Real-time: Messages delayed until next poll

When to Use: Firewall restrictions, simpler scaling, acceptable latency.

Approach 3: Server-Sent Events (SSE)

How It Works: Server pushes events to client over HTTP connection.

Pros:

  • One-Way Push: Efficient for server-to-client
  • HTTP-Based: Works through firewalls
  • Automatic Reconnection: Built-in reconnection

Cons:

  • One-Way Only: Client must send via separate HTTP requests
  • Limitations: Text-only event payloads; browsers cap concurrent SSE connections per domain over HTTP/1.1 (and legacy IE never supported it)

When to Use: One-way push scenarios, notification systems.

Decision: Use WebSocket for real-time chat, with HTTP long polling as fallback.

Message Delivery Strategies

Strategy 1: Direct Delivery (Online Users)

How It Works: If recipient is online, deliver directly via WebSocket.

Sender          Message Service    Recipient (Online)
------          --------------    ------------------
  |--Send--------->|                    |
  |                | Store message       |
  |                |--Push via WS------->|
  |                |<--ACK---------------|
  |<--Success------|                    |

Pros:

  • Low Latency: Immediate delivery
  • Efficient: No queuing overhead
  • Real-time: True real-time delivery

Cons:

  • Requires Online: Only works for online users
  • Connection Dependency: Requires active WebSocket

Strategy 2: Message Queue for Offline Users

How It Works: Store messages in queue, deliver when user comes online.

Sender          Message Service    Queue          Recipient (Offline)
------          --------------    -----          -------------------
  |--Send--------->|                    |                |
  |                | Store message       |                |
  |                |--Enqueue----------->|                |
  |<--Success------|                    |                |
  |                |                    |                |
  |                |                    | (User comes online)
  |                |<--Dequeue-----------|                |
  |                |--Deliver--------------------------->|
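
The store-and-forward flow above can be sketched with a per-user pending queue standing in for the Kafka topic; on reconnect, the backlog is drained in order:

```python
from collections import defaultdict, deque

class OfflineStore:
    """Sketch of store-and-forward; the deque stands in for a Kafka topic."""

    def __init__(self):
        self._pending = defaultdict(deque)   # user_id -> queued messages
        self._online = set()

    def send(self, recipient: str, message: str) -> str:
        if recipient in self._online:
            return f"pushed to {recipient}: {message}"   # direct WS delivery
        self._pending[recipient].append(message)          # enqueue for later
        return "queued"

    def connect(self, user: str) -> list:
        """User comes online: drain the backlog in FIFO order."""
        self._online.add(user)
        backlog = list(self._pending[user])
        self._pending[user].clear()
        return backlog
```

The FIFO drain preserves per-recipient ordering, which is the property Kafka's per-partition ordering provides at scale.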

Queue Options:

Apache Kafka:

  • Pros: High throughput, distributed, durable
  • Cons: Complex setup, overkill for simple use cases
  • When: High message volume, need ordering guarantees

RabbitMQ:

  • Pros: Flexible routing, good management UI
  • Cons: Lower throughput than Kafka
  • When: Complex routing needs, moderate volume

Redis Pub/Sub:

  • Pros: Simple, fast, low latency
  • Cons: Not durable (messages lost if subscriber offline)
  • When: Real-time only, don’t need persistence

Amazon SQS:

  • Pros: Managed service, auto-scaling
  • Cons: Vendor lock-in, cost at scale
  • When: AWS ecosystem, want managed solution

Decision: Use Kafka for message queue (durability, ordering), Redis for online user delivery (low latency).

Strategy 3: Hybrid Approach

Online Users: Deliver via WebSocket immediately
Offline Users: Store in Kafka, deliver on reconnection
Group Messages: Store in Kafka, fan-out to all members

Presence Management

Approach 1: Heartbeat-Based

How It Works: Clients send periodic heartbeats, server tracks last heartbeat time.

Client                    Presence Service
------                    ----------------
  |--Heartbeat (every 30s)->|
  |<--ACK-------------------|
  |                         |
  | (If no heartbeat for 60s, mark offline)

Pros:

  • Simple: Easy to implement
  • Accurate: Good accuracy with frequent heartbeats

Cons:

  • Network Overhead: Constant heartbeat traffic
  • Battery Drain: Mobile devices drain battery
  • Scalability: High overhead at scale (50M concurrent users sending a heartbeat every 30s ≈ 1.7M requests/sec)

Optimization: Adaptive heartbeat (increase interval when idle).
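
One way to implement the adaptive interval is exponential backoff while idle, resetting on activity. The bounds and factor here are illustrative:

```python
class AdaptiveHeartbeat:
    """Back off the heartbeat interval while idle; reset on user activity."""

    def __init__(self, base=30.0, ceiling=300.0, factor=2.0):
        self.base = base          # seconds between heartbeats when active
        self.ceiling = ceiling    # never back off beyond this
        self.factor = factor
        self.interval = base

    def on_activity(self):
        """User interacted with the app: return to the fast heartbeat."""
        self.interval = self.base

    def on_idle_tick(self):
        """No activity since the last heartbeat: back off."""
        self.interval = min(self.interval * self.factor, self.ceiling)
        return self.interval
```

With these numbers, an idle client converges from one heartbeat every 30s to one every 5 minutes — a 10x reduction in presence traffic for inactive users.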

Approach 2: Event-Based

How It Works: Update presence only on state changes (app open/close, screen lock).

Client                    Presence Service
------                    ----------------
  |--App Open-------------->|
  | (Mark online)           |
  |                         |
  |--Screen Lock----------->|
  | (Mark away)             |
  |                         |
  |--App Close------------->|
  | (Mark offline)          |

Pros:

  • Efficient: Minimal network usage
  • Battery Friendly: No constant heartbeats
  • Accurate: Based on actual user actions

Cons:

  • Delayed Updates: May not detect crashes immediately
  • Platform Dependent: Different events on different platforms

When to Use: Mobile-first applications, battery optimization critical.

Approach 3: Hybrid (Heartbeat + Events)

How It Works: Combine event-based updates with occasional heartbeats for accuracy.

Pros:

  • Balanced: Good accuracy with low overhead
  • Resilient: Handles edge cases (crashes, network issues)

Cons:

  • More Complex: Must handle both mechanisms

Decision: Use hybrid approach - event-based primary, heartbeat as backup.
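
The hybrid scheme can be sketched as event-driven state with a heartbeat deadline as the crash-detection safety net; time is passed in explicitly to keep the sketch deterministic:

```python
class PresenceTracker:
    """Event-based presence with a heartbeat deadline as crash detection."""

    TIMEOUT = 60.0  # seconds without any signal before forcing offline

    def __init__(self):
        self._status = {}      # user_id -> last reported status
        self._last_seen = {}   # user_id -> timestamp of last signal

    def on_event(self, user_id, status, now):
        """Explicit state change: app open/close, screen lock."""
        self._status[user_id] = status
        self._last_seen[user_id] = now

    def on_heartbeat(self, user_id, now):
        """Backup heartbeat: refreshes the deadline without changing status."""
        self._last_seen[user_id] = now

    def get_status(self, user_id, now):
        last = self._last_seen.get(user_id)
        if last is None or now - last > self.TIMEOUT:
            return "offline"   # missed heartbeats: assume crash or disconnect
        return self._status.get(user_id, "online")
```

Events give fast, battery-friendly updates; the deadline catches the case the event model misses, where the app dies without sending a close event.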

Database Design

Schema Design

Users Table:

user_id (PK)
username
email
phone_number
created_at
last_seen_at
profile_picture_url

Chats Table:

chat_id (PK)
chat_type (1-on-1, group)
created_at
updated_at

Chat_Participants Table:

chat_id (FK)
user_id (FK)
joined_at
role (admin, member)

Messages Table:

message_id (PK)
chat_id (FK)
sender_id (FK)
content
message_type (text, image, video, file)
media_url
created_at

Message_Status Table:

message_id (FK)
user_id (FK)
status (sent, delivered, read)
updated_at

User_Presence Table:

user_id (PK)
status (online, offline, away)
last_seen_at

Database Choice: SQL vs NoSQL

SQL (PostgreSQL):

  • Pros: ACID transactions, complex queries, relationships
  • Cons: Harder to scale horizontally
  • When: Need transactions, complex queries

NoSQL (Cassandra):

  • Pros: Horizontal scaling, high write throughput, partition tolerance
  • Cons: Eventual consistency, limited queries
  • When: High write volume, need horizontal scaling

Decision: Use Cassandra for messages (high write volume), PostgreSQL for user data (complex queries).

Sharding Strategy

Shard by User ID:

  • Hash user_id to determine shard
  • All user’s chats on same shard
  • Pros: Efficient user queries
  • Cons: Cross-shard group chats expensive

Shard by Chat ID:

  • Hash chat_id to determine shard
  • All messages for chat on same shard
  • Pros: Efficient chat queries
  • Cons: User’s chats spread across shards

Hybrid Approach:

  • Messages sharded by chat_id
  • User metadata sharded by user_id
  • Chat list cached per user
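
The hybrid scheme routes each lookup by the key that keeps it on a single shard. A minimal hash-based router (shard counts are illustrative):

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Stable hash routing: the same key always lands on the same shard."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Messages are routed by chat_id, user metadata by user_id.
NUM_MESSAGE_SHARDS = 64
NUM_USER_SHARDS = 16

def message_shard(chat_id: str) -> int:
    return shard_for(chat_id, NUM_MESSAGE_SHARDS)

def user_shard(user_id: str) -> int:
    return shard_for(user_id, NUM_USER_SHARDS)
```

One caveat: plain modulo routing reshuffles most keys whenever the shard count changes, so production systems typically use consistent hashing (or Cassandra's token ranges) instead.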

Caching Strategy

Multi-Level Caching

Level 1: Client Cache:

  • Recent messages cached locally
  • Offline access
  • Reduces server load

Level 2: Redis Cache:

  • Active chats cached
  • User presence cached
  • Recent messages cached

Cache Invalidation:

  • Write-Through: Write to cache and DB simultaneously
  • Write-Back: Write to cache, flush to DB asynchronously
  • TTL-Based: Expire after time period

Cache Keys:

user:{userId}:presence
chat:{chatId}:messages:recent
user:{userId}:chats
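
The key scheme and write-through policy together look roughly like this; the dict stands in for Redis, and the TTL values are illustrative:

```python
import time

class WriteThroughCache:
    """Dict-backed stand-in for Redis with TTL-based expiry."""

    def __init__(self, db):
        self.db = db               # authoritative store (here: a plain dict)
        self._cache = {}           # key -> (value, expires_at)

    def set(self, key, value, ttl=300):
        """Write-through: update the cache and the DB in the same operation."""
        self._cache[key] = (value, time.time() + ttl)
        self.db[key] = value

    def get(self, key):
        entry = self._cache.get(key)
        if entry and entry[1] > time.time():
            return entry[0]                      # cache hit
        value = self.db.get(key)                 # miss or expired: go to DB
        if value is not None:
            self._cache[key] = (value, time.time() + 300)
        return value

db = {}
cache = WriteThroughCache(db)
cache.set("user:42:presence", "online", ttl=60)
```

Write-through keeps cache and DB consistent at the cost of write latency; the TTL bounds staleness for anything populated on a read miss.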

Media Handling

Storage Architecture

Object Storage: Store media files in S3/GCS
CDN: Serve media via CDN for fast delivery
Thumbnails: Generate and cache thumbnails

Processing Pipeline

Upload → Validation → Storage → Thumbnail Generation → CDN Distribution
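
The stages compose naturally as functions over an upload record. Stage names mirror the steps above; the bodies are placeholders (a real pipeline would call object storage, an image library, and the CDN, and the size limit and domain here are invented for illustration):

```python
def validate(upload):
    """Reject uploads that are too large or of an unexpected type."""
    assert upload["size_bytes"] <= 50_000_000, "file too large"
    assert upload["mime"].split("/")[0] in {"image", "video"}, "unsupported type"
    return upload

def store(upload):
    """Stand-in for the object-storage write; records the stored key."""
    upload["storage_key"] = f"media/{upload['name']}"
    return upload

def thumbnail(upload):
    """Stand-in for thumbnail generation."""
    upload["thumbnail_key"] = upload["storage_key"] + ".thumb.jpg"
    return upload

def distribute(upload):
    """Stand-in for CDN distribution; records the public URL."""
    upload["url"] = "https://cdn.example.com/" + upload["storage_key"]
    return upload

def process(upload):
    """Run the upload through each pipeline stage in order."""
    for stage in (validate, store, thumbnail, distribute):
        upload = stage(upload)
    return upload

media = process({"name": "cat.jpg", "mime": "image/jpeg", "size_bytes": 120_000})
```

In production the stages after validation would run asynchronously off a queue, so a slow thumbnail job never blocks the upload response.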

Optimization:

  • Compression: Compress images/videos
  • Multiple Formats: Generate different sizes
  • Lazy Loading: Load full media on demand

Scalability Patterns

Horizontal Scaling

Stateless Servers: Design servers to be stateless
Load Balancing: Distribute connections across servers
Sharding: Partition data across multiple databases

Message Fan-out for Groups

Challenge: Group with 1000 members, one message = 1000 deliveries

Approach 1: Synchronous Fan-out:

  • Send to all members immediately
  • Pros: Low latency
  • Cons: Slow if any member slow

Approach 2: Asynchronous Fan-out:

  • Queue message, fan-out asynchronously
  • Pros: Fast response
  • Cons: Slight delay for some members

Approach 3: Hybrid:

  • Send to online members synchronously
  • Queue for offline members
  • Pros: Balance latency and throughput

Decision: Use hybrid approach.
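
The hybrid fan-out splits the member list by presence: online members get a direct push, the rest are enqueued. In this sketch the set and dict stand in for the presence service and the Kafka topic:

```python
from collections import defaultdict

online = {"alice", "bob"}            # stand-in for the presence service
offline_queue = defaultdict(list)    # stand-in for the offline message queue

def push_ws(member: str, message: str) -> str:
    """Stand-in for a direct WebSocket push."""
    return f"{member} <- {message}"

def fan_out(members, message: str):
    """Hybrid fan-out: push to online members, queue for the rest."""
    delivered = []
    for member in members:
        if member in online:
            delivered.append(push_ws(member, message))
        else:
            offline_queue[member].append(message)
    return delivered

delivered = fan_out(["alice", "bob", "carol"], "hi group")
```

For the 1000-member group in the challenge above, only the online subset is touched on the hot path; the offline remainder is absorbed by the queue at its own pace.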

Read Replicas

Write to Primary: All writes go to the primary database
Read from Replicas: Distribute reads across replicas
Replication Lag: Acceptable for chat (eventual consistency)

Real-World Implementations

WhatsApp Architecture

Protocol: Custom protocol based on a modified XMPP (not WebSocket)
Message Delivery: Store-and-forward
Encryption: End-to-end encryption (Signal Protocol)
Scaling: Erlang-based servers handling billions of messages
Storage: Messages stored primarily on the user's device

Discord Architecture

Protocol: WebSocket for real-time events, REST for the API
Message Delivery: Real-time via WebSocket
Scaling: Microservices architecture
Storage: PostgreSQL for metadata, Cassandra (later ScyllaDB) for message history, object storage for media
Presence: Event-based with heartbeat fallback

Telegram Architecture

Protocol: Custom MTProto protocol
Message Delivery: Cloud-based, multi-datacenter
Encryption: Optional end-to-end encryption
Scaling: Distributed across multiple data centers
Storage: Messages stored in the cloud

Trade-offs Summary

WebSocket vs HTTP Long Polling:

  • WebSocket: Lower latency, harder to scale
  • HTTP: Easier to scale, higher latency

Synchronous vs Asynchronous Delivery:

  • Synchronous: Lower latency, lower throughput
  • Asynchronous: Higher throughput, slight latency

SQL vs NoSQL:

  • SQL: Strong consistency, harder to scale
  • NoSQL: Eventual consistency, easier to scale

Heartbeat vs Event-Based Presence:

  • Heartbeat: More accurate, higher overhead
  • Event-Based: Lower overhead, less accurate

Conclusion

Designing a scalable chat system requires balancing multiple concerns: real-time delivery, scalability, consistency, and user experience. The key is choosing the right trade-offs for your specific requirements.

Key decisions:

  1. WebSocket for real-time communication
  2. Kafka for message queuing
  3. Cassandra for message storage (high write volume)
  4. Redis for caching and online delivery
  5. Hybrid presence (events + heartbeat)
  6. Asynchronous fan-out for group messages

By understanding these trade-offs and making informed decisions, we can build chat systems that scale to millions of users while maintaining low latency and high availability.