
System Design Interview: Real-Time Chat System

February 14, 2026 · By CTO · 15 min read

Complete system design interview question for building a real-time chat application like Slack or WhatsApp, covering WebSockets, message delivery guarantees, and presence systems.

Role: Senior Engineer / Staff Engineer
Level: Senior
Type: System Design


Design a real-time chat application similar to Slack or WhatsApp. This question tests understanding of real-time communication, message delivery guarantees, presence systems, and data modeling at scale.

Interview Format (45 minutes)

Time Allocation:

  • Requirements gathering: 5-8 minutes
  • High-level design: 10-15 minutes
  • Deep dive: 15-20 minutes
  • Scale and edge cases: 5-10 minutes

Step 1: Requirements Gathering (5-8 min)

A strong candidate will clarify the scope before designing anything.

Functional Requirements

Good questions to ask:

  • Is this 1:1 chat, group chat, or both? (both, groups up to 500 members)
  • Do we need message history/persistence? (yes, searchable)
  • What message types? (text, images, files)
  • Do we need read receipts? (yes)
  • Online/offline presence? (yes)
  • Push notifications? (yes, for offline users)
  • Message editing/deletion? (yes)

Agreed requirements:

  1. 1:1 and group messaging (up to 500 members)
  2. Real-time message delivery
  3. Persistent message history with search
  4. Read receipts and typing indicators
  5. Online/offline presence
  6. Push notifications for offline users
  7. Image and file sharing

Non-Functional Requirements

Good questions to ask:

  • Expected user base? (50M DAU)
  • Messages per day? (1B messages/day)
  • Message size limit? (64KB text, 100MB files)
  • Latency requirements? (<200ms delivery)
  • Geographic distribution? (global)
  • Message retention? (forever for paid, 90 days for free)

Agreed requirements:

  • Low latency (<200ms for message delivery)
  • High availability (99.99% uptime)
  • Message ordering guaranteed within a conversation
  • At-least-once delivery (with deduplication)
  • End-to-end encryption (stretch goal)

Calculations

Messages:

50M DAU, average 20 messages/day = 1B messages/day
Average message size: 200 bytes
1B x 200 bytes = 200GB/day = 73TB/year

Connections:

50M concurrent WebSocket connections (peak)
Each connection: ~10KB memory
50M x 10KB = 500GB RAM for connections alone

QPS:

1B messages/day = ~12,000 messages/sec average
Peak (3x): ~36,000 messages/sec
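These estimates can be reproduced with a few lines of arithmetic (a quick sketch; the constants are simply the numbers agreed during requirements, and the ~12K/~36K QPS figures above are rounded versions of these results):

```python
# Back-of-envelope capacity math for the agreed requirements.
DAU = 50_000_000
MSGS_PER_USER_PER_DAY = 20
AVG_MSG_BYTES = 200
CONN_MEM_BYTES = 10_000          # ~10KB of memory per WebSocket connection

msgs_per_day = DAU * MSGS_PER_USER_PER_DAY                 # 1B messages/day
storage_per_day_gb = msgs_per_day * AVG_MSG_BYTES / 1e9    # 200 GB/day
avg_qps = msgs_per_day / 86_400                            # ~11,600 msg/sec
peak_qps = avg_qps * 3                                     # ~34,700 msg/sec
conn_mem_gb = DAU * CONN_MEM_BYTES / 1e9                   # 500 GB RAM
```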

Red flags if candidate:

  • Designs only for HTTP polling
  • Doesn't consider message ordering
  • Ignores offline scenarios
  • Doesn't ask about group size limits

Step 2: High-Level Design (10-15 min)

API Design

WebSocket Connection:

wss://chat.example.com/ws?token=<auth_token>

// Client -> Server
{
  "type": "send_message",
  "conversationId": "conv_123",
  "content": "Hello!",
  "clientMessageId": "client_uuid_456"  // for deduplication
}

// Server -> Client
{
  "type": "new_message",
  "messageId": "msg_789",
  "conversationId": "conv_123",
  "senderId": "user_001",
  "content": "Hello!",
  "timestamp": "2026-02-14T10:30:00Z"
}

REST APIs (for non-real-time operations):

GET  /api/conversations                    # List conversations
GET  /api/conversations/:id/messages       # Message history (paginated)
POST /api/conversations                    # Create conversation/group
POST /api/conversations/:id/messages       # Send message (fallback)
PUT  /api/messages/:id                     # Edit message
DELETE /api/messages/:id                   # Delete message
POST /api/upload                           # Upload file/image

Good candidate discusses:

  • WebSocket vs SSE vs long polling trade-offs
  • REST fallback for reliability
  • Client-generated message IDs for deduplication
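The third bullet can be sketched as a client-side outbox: retries reuse the same `clientMessageId`, so the server can safely deduplicate. This is a minimal sketch with hypothetical names (`Outbox`, `send_fn`); a real client would wrap a WebSocket library:

```python
import json
import uuid


class Outbox:
    """Client-side outbox: a message stays pending until the server acks it.
    Retries carry the same clientMessageId, so at-least-once delivery plus
    server-side deduplication yields an exactly-once effect."""

    def __init__(self, send_fn):
        self.send_fn = send_fn   # e.g. the WebSocket's send method
        self.pending = {}        # clientMessageId -> frame awaiting ack

    def send(self, conversation_id, content):
        client_id = str(uuid.uuid4())
        frame = {
            "type": "send_message",
            "conversationId": conversation_id,
            "content": content,
            "clientMessageId": client_id,
        }
        self.pending[client_id] = frame
        self.send_fn(json.dumps(frame))
        return client_id

    def on_ack(self, client_id):
        self.pending.pop(client_id, None)

    def resend_pending(self):
        # Called after a reconnect: same IDs, so duplicates are harmless.
        for frame in self.pending.values():
            self.send_fn(json.dumps(frame))
```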

Core Components

┌───────────────┐
│    Clients    │
└───────┬───────┘
        │ WSS
┌───────▼───────────────────────────────────┐
│          WebSocket Gateway                │
│  (Connection management, routing)         │
└───────┬───────────────┬───────────────────┘
        │               │
┌───────▼───────┐ ┌─────▼──────────────┐
│  Chat Service │ │  Presence Service  │
│  (Messages)   │ │  (Online status)   │
└───────┬───────┘ └─────┬──────────────┘
        │               │
┌───────▼───────┐ ┌─────▼──────────────┐
│  Message DB   │ │  Redis Cluster     │
│  (Cassandra)  │ │(Presence + Pub/Sub)│
└───────────────┘ └────────────────────┘

Data Model

Messages (Cassandra / DynamoDB):

-- Partition by conversation, sorted by time
CREATE TABLE messages (
    conversation_id UUID,
    message_id TIMEUUID,
    sender_id UUID,
    content TEXT,
    content_type TEXT,       -- 'text', 'image', 'file'
    media_url TEXT,
    created_at TIMESTAMP,
    edited_at TIMESTAMP,
    deleted BOOLEAN,
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

Conversations (PostgreSQL):

CREATE TABLE conversations (
    id UUID PRIMARY KEY,
    type VARCHAR(10),         -- 'direct', 'group'
    name VARCHAR(255),
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

CREATE TABLE conversation_members (
    conversation_id UUID REFERENCES conversations(id),
    user_id UUID,
    role VARCHAR(20) DEFAULT 'member',
    joined_at TIMESTAMP,
    last_read_message_id UUID,
    PRIMARY KEY (conversation_id, user_id)
);

CREATE INDEX idx_user_conversations
    ON conversation_members(user_id);

Step 3: Deep Dive (15-20 min)

Message Delivery Flow

Sender -> WebSocket Gateway -> Chat Service -> Message DB
                                    |
                                    v
                              Message Queue
                                    |
                    ┌───────────────┼───────────────┐
                    v               v               v
              WS Gateway      WS Gateway      Push Service
              (User A)        (User B)        (Offline Users)

Implementation:

class ChatService:
    def handle_message(self, sender_id, conversation_id, content, client_msg_id):
        # 1. Deduplication check (client retries reuse the same clientMessageId)
        if self.message_store.exists_by_client_id(client_msg_id):
            return {"status": "duplicate"}  # ack anyway so the client stops retrying

        # 2. Validate sender is member of conversation
        if not self.is_member(sender_id, conversation_id):
            raise PermissionError("Not a member")

        # 3. Store message
        message = self.message_store.create(
            conversation_id=conversation_id,
            sender_id=sender_id,
            content=content,
            client_message_id=client_msg_id
        )

        # 4. Get conversation members
        members = self.get_members(conversation_id)

        # 5. Fan out to online members via pub/sub
        for member_id in members:
            if member_id != sender_id:
                self.pubsub.publish(
                    channel=f"user:{member_id}",
                    message=message.to_dict()
                )

        # 6. Send push notifications to offline members
        offline_members = [m for m in members
                          if not self.presence.is_online(m)]
        self.push_service.notify(offline_members, message)

        # 7. Acknowledge to sender
        return {"status": "delivered", "messageId": message.id}

Presence System

Challenge: Tracking 50M online users in real time

import time

class PresenceService:
    def __init__(self, redis, server_id, pubsub, scheduler):
        self.redis = redis
        self.server_id = server_id    # which gateway holds the connection
        self.pubsub = pubsub          # for presence broadcasts
        self.scheduler = scheduler    # for delayed offline checks
        self.HEARTBEAT_INTERVAL = 30  # seconds
        self.TIMEOUT = 90  # seconds

    def user_connected(self, user_id):
        self.redis.hset(f"presence:{user_id}", mapping={
            "status": "online",
            "last_seen": time.time(),
            "server_id": self.server_id
        })
        self.redis.expire(f"presence:{user_id}", self.TIMEOUT)

        # Notify contacts
        self._broadcast_status(user_id, "online")

    def heartbeat(self, user_id):
        self.redis.hset(f"presence:{user_id}",
                       "last_seen", time.time())
        self.redis.expire(f"presence:{user_id}", self.TIMEOUT)

    def user_disconnected(self, user_id):
        # Don't immediately mark offline (might reconnect)
        self.redis.hset(f"presence:{user_id}",
                       "status", "away")

        # Schedule offline check after grace period
        self.scheduler.schedule(
            delay=30,
            task=self._check_still_offline,
            args=(user_id,)
        )

    def is_online(self, user_id):
        data = self.redis.hgetall(f"presence:{user_id}")
        if not data:
            return False
        return (time.time() - float(data["last_seen"])) < self.TIMEOUT

    def _broadcast_status(self, user_id, status):
        # Only broadcast to users who have this user in their contacts
        contacts = self.get_contacts(user_id)
        for contact_id in contacts:
            self.pubsub.publish(
                channel=f"user:{contact_id}",
                message={"type": "presence", "userId": user_id, "status": status}
            )

Strong candidate discusses:

  • Heartbeat mechanism vs connection-based detection
  • Grace period before marking offline
  • Fan-out problem for popular users (hundreds of contacts)
  • Lazy presence (only check when user opens a conversation)
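The last bullet, lazy presence, can be sketched as a single pipelined lookup performed only when a conversation is opened, rather than pushing every status change to every contact. This sketch assumes a redis-py-style client, where `pipeline()` batches commands and `execute()` returns their results in order:

```python
def get_presence_bulk(redis, user_ids):
    """Fetch presence only for the members of the conversation the user
    just opened, in one pipelined round trip."""
    pipe = redis.pipeline()
    for uid in user_ids:
        pipe.hget(f"presence:{uid}", "status")
    statuses = pipe.execute()   # one network round trip for N lookups
    return {uid: (status or "offline")
            for uid, status in zip(user_ids, statuses)}
```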

Read Receipts and Typing Indicators

# Read receipts: persistent (stored in DB)
def mark_read(user_id, conversation_id, message_id):
    db.update("conversation_members",
        set={"last_read_message_id": message_id},
        where={"conversation_id": conversation_id,
               "user_id": user_id})

    # Notify other members
    pubsub.publish(f"conv:{conversation_id}", {
        "type": "read_receipt",
        "userId": user_id,
        "lastReadMessageId": message_id
    })

# Typing indicators: ephemeral (never stored)
def typing_started(user_id, conversation_id):
    pubsub.publish(f"conv:{conversation_id}", {
        "type": "typing",
        "userId": user_id,
        "status": "started"
    })
    # Auto-expire after 5 seconds (in case the stop event is lost)

Message Ordering

Challenge: Ensuring messages appear in correct order across devices

Approach: Server-assigned timestamps + sequence numbers

class MessageOrderer:
    def assign_order(self, conversation_id, message):
        # Atomic increment per conversation
        seq = self.redis.incr(f"seq:{conversation_id}")
        message.sequence_number = seq
        message.server_timestamp = time.time_ns()
        return message

    def resolve_conflicts(self, messages):
        # Sort by sequence number (primary)
        # Then by server timestamp (secondary)
        return sorted(messages,
                     key=lambda m: (m.sequence_number, m.server_timestamp))

Strong candidate discusses:

  • Client-side vs server-side timestamps
  • Causal ordering vs total ordering
  • Handling out-of-order delivery on client
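Handling out-of-order delivery on the client (the last bullet) can be sketched as a hold-back buffer keyed by the server-assigned per-conversation sequence number. A sketch, assuming sequences start at 1:

```python
class MessageBuffer:
    """Client-side hold-back buffer: an out-of-order message waits until
    the gap in sequence numbers fills, then everything contiguous is
    delivered in order. Duplicates of already-delivered messages drop."""

    def __init__(self):
        self.next_seq = 1
        self.held = {}   # sequence number -> message

    def receive(self, seq, message):
        """Returns the messages now deliverable in order (possibly empty)."""
        if seq < self.next_seq:
            return []    # duplicate of an already-delivered message
        self.held[seq] = message
        deliverable = []
        while self.next_seq in self.held:
            deliverable.append(self.held.pop(self.next_seq))
            self.next_seq += 1
        return deliverable
```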

Step 4: Scale and Edge Cases (5-10 min)

Scaling WebSocket Connections

Problem: Single server can handle ~500K connections max

Solution: WebSocket Gateway Cluster

┌────────────────────────────────────────────────┐
│              Load Balancer (L4)                │
│          (Sticky sessions by user_id)          │
└──────┬──────────┬──────────┬──────────┬────────┘
       │          │          │          │
  ┌────▼────┐ ┌──▼─────┐ ┌─▼──────┐ ┌▼───────┐
  │  WS GW  │ │ WS GW  │ │ WS GW  │ │ WS GW  │
  │  500K   │ │ 500K   │ │ 500K   │ │ 500K   │
  └────┬────┘ └──┬─────┘ └─┬──────┘ └┬───────┘
       │         │         │         │
       └────────┬┴─────────┴─────────┘
                │
       ┌────────▼──────────┐
       │   Redis Pub/Sub   │
       │   (Message Bus)   │
       └───────────────────┘

Connection registry (which user is on which server):

class ConnectionRegistry:
    def register(self, user_id, server_id):
        self.redis.sadd(f"connections:{user_id}", server_id)

    def unregister(self, user_id, server_id):
        self.redis.srem(f"connections:{user_id}", server_id)

    def get_servers(self, user_id):
        return self.redis.smembers(f"connections:{user_id}")

    def route_message(self, user_id, message):
        servers = self.get_servers(user_id)
        for server_id in servers:
            self.pubsub.publish(f"server:{server_id}", {
                "target_user": user_id,
                "message": message
            })

Group Message Fan-Out

Problem: A message to a 500-person group means 499 deliveries

def fan_out_group_message(conversation_id, message):
    members = get_members(conversation_id)

    if len(members) <= 50:
        # Small group: fan-out on write (push to each member)
        for member_id in members:
            deliver_to_user(member_id, message)
    else:
        # Large group: fan-out on read (members pull when online)
        store_in_conversation_feed(conversation_id, message)
        # Only push notification to online + mentioned users
        online = [m for m in members if is_online(m)]
        mentioned = extract_mentions(message.content)
        notify_users = set(online + mentioned)
        for user_id in notify_users:
            deliver_to_user(user_id, message)

Offline Message Sync

def sync_messages(user_id, last_sync_timestamp):
    """Called when a user comes back online"""
    conversations = get_user_conversations(user_id)

    unread = {}
    for conv_id in conversations:
        last_read = get_last_read_message(user_id, conv_id)
        new_messages = get_messages_after(conv_id, last_read,
                                         limit=50)
        if new_messages:
            unread[conv_id] = {
                "messages": new_messages,
                "unread_count": count_unread(conv_id, last_read)
            }

    return unread

Edge Cases

Strong candidates identify:

  • Network partitions (messages sent but not acknowledged)
  • Device sync (user on phone and laptop simultaneously)
  • Large media files (separate upload flow with CDN)
  • Spam and abuse (rate limiting, content moderation)
  • Message deletion propagation across all devices
  • Clock skew between servers

Evaluation Rubric

Strong Performance (Hire)

  • Chooses WebSockets with proper justification
  • Designs for message ordering and delivery guarantees
  • Handles presence efficiently at scale
  • Considers fan-out strategies for groups
  • Discusses offline sync and push notifications
  • Clear separation of real-time vs persistent data
  • Mentions security (encryption, auth)

Adequate Performance (Maybe)

  • Functional design with WebSockets
  • Basic message storage and retrieval
  • Some scaling considerations
  • Misses edge cases like offline sync or ordering
  • Can be guided toward better solutions

Weak Performance (No Hire)

  • Only considers HTTP polling
  • No thought given to delivery guarantees
  • Doesn't address group messaging challenges
  • Can't reason about connection management at scale
  • Poor data model choices

Follow-up Questions

For senior candidates:

  • How would you implement end-to-end encryption?
  • Design the notification system in detail
  • How would you handle message search across billions of messages?
  • How would you implement message reactions and threads?

For staff+ candidates:

  • Design the infrastructure for global deployment with <100ms latency
  • How would you handle compliance (message retention, legal holds)?
  • Design the system for 500M DAU
  • How would you implement real-time translation?

This question tests real-time systems design, pub/sub patterns, presence management, and data consistency under concurrent writes. A strong candidate will balance latency requirements with delivery guarantees while maintaining clear system boundaries.