Scaling WebSocket Connections to Thousands of Concurrent Users

Real-time communication is no longer a nice-to-have feature. Users expect instant updates, live notifications, and seamless collaboration. But scaling WebSocket connections is fundamentally different from scaling HTTP endpoints.

The Challenge

When building the real-time chat system for an e-sports platform, I faced a concrete problem: support 2,000+ concurrent WebSocket connections with message delivery under 100ms. This is where theory meets reality.

Architecture Overview

The final architecture looked like this:

┌─────────────────────────────────────────────────────────┐
│                    Load Balancer                        │
│                   (Sticky Sessions)                     │
└─────────────────────────────────────────────────────────┘

        ┌──────────────────┼──────────────────┐
        │                  │                  │
        ▼                  ▼                  ▼
   ┌─────────┐       ┌─────────┐       ┌─────────┐
   │ Node.js │       │ Node.js │       │ Node.js │
   │ Server  │       │ Server  │       │ Server  │
   └────┬────┘       └────┬────┘       └────┬────┘
        │                  │                  │
        └──────────────────┼──────────────────┘

                    ┌──────┴──────┐
                    │    Redis    │
                    │   Pub/Sub   │
                    └─────────────┘

Key Design Decisions

1. Connection Management

Each WebSocket connection consumes memory. With thousands of connections, this adds up quickly. I implemented a connection manager that tracks metadata without storing message history in memory:

interface ConnectionMetadata {
  userId: string;
  channel: string;
  connectedAt: Date;
  lastActivity: Date;
}

class ConnectionManager {
  connections: Map<string, WebSocket> = new Map();
  metadata: Map<string, ConnectionMetadata> = new Map();

  register(ws: WebSocket, userId: string, channel: string) {
    this.connections.set(userId, ws);
    this.metadata.set(userId, {
      userId,
      channel,
      connectedAt: new Date(),
      lastActivity: new Date()
    });
  }

  remove(userId: string) {
    this.connections.delete(userId);
    this.metadata.delete(userId);
  }

  getChannelMembers(channel: string): string[] {
    // Linear scan over metadata; fine for a first version
    const members: string[] = [];
    for (const meta of this.metadata.values()) {
      if (meta.channel === channel) members.push(meta.userId);
    }
    return members;
  }

  broadcast(channel: string, message: object) {
    const recipients = this.getChannelMembers(channel);
    const payload = JSON.stringify(message);

    for (const userId of recipients) {
      const ws = this.connections.get(userId);
      if (ws?.readyState === WebSocket.OPEN) {
        ws.send(payload);
      }
    }
  }
}
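At thousands of connections, scanning every metadata entry on each broadcast starts to cost. One possible refinement, purely a sketch and not what I shipped verbatim, is a per-channel index so member lookup is proportional to channel size rather than total connections (`ChannelIndex` is a hypothetical helper, maintained alongside the metadata map on register/remove):

```typescript
// Hypothetical per-channel index: maps channel -> set of user IDs.
class ChannelIndex {
  private byChannel: Map<string, Set<string>> = new Map();

  add(channel: string, userId: string) {
    if (!this.byChannel.has(channel)) {
      this.byChannel.set(channel, new Set());
    }
    this.byChannel.get(channel)!.add(userId);
  }

  remove(channel: string, userId: string) {
    const members = this.byChannel.get(channel);
    if (!members) return;
    members.delete(userId);
    // Drop empty sets so idle channels don't leak memory
    if (members.size === 0) this.byChannel.delete(channel);
  }

  members(channel: string): string[] {
    return [...(this.byChannel.get(channel) ?? [])];
  }
}
```

With this in place, `getChannelMembers` becomes a single map lookup instead of a scan.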

2. Redis Pub/Sub for Multi-Server Communication

When you have multiple Node.js servers, a message sent to one server needs to reach clients connected to other servers. Redis Pub/Sub solves this elegantly:

import Redis from 'ioredis';

const publisher = new Redis();
const subscriber = new Redis();

// When a message is received on one server
function handleIncomingMessage(channel: string, message: ChatMessage) {
  // Publish to Redis so all servers receive it
  publisher.publish(`chat:${channel}`, JSON.stringify(message));
}

// All servers subscribe to relevant channels
subscriber.psubscribe('chat:*');
subscriber.on('pmessage', (pattern, channel, message) => {
  const channelId = channel.replace('chat:', '');
  const parsed = JSON.parse(message);
  connectionManager.broadcast(channelId, parsed);
});

3. Heartbeat and Dead Connection Cleanup

WebSocket connections can silently die (network issues, client crashes). Implementing heartbeats prevents resource leaks:

const HEARTBEAT_INTERVAL = 30000; // 30 seconds
const CONNECTION_TIMEOUT = 60000; // 60 seconds

setInterval(() => {
  const now = Date.now();

  for (const [userId, metadata] of connectionManager.metadata) {
    const ws = connectionManager.connections.get(userId);

    if (!ws || ws.readyState !== WebSocket.OPEN) {
      connectionManager.remove(userId);
      continue;
    }

    if (now - metadata.lastActivity.getTime() > CONNECTION_TIMEOUT) {
      ws.terminate();
      connectionManager.remove(userId);
      continue;
    }

    // Send ping
    ws.ping();
  }
}, HEARTBEAT_INTERVAL);
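One wiring detail the loop above depends on: `lastActivity` must be refreshed when the client answers a ping, otherwise every connection gets terminated after the timeout regardless of health. A minimal sketch of that wiring, assuming the `ws` library's `'pong'` event (the structural type here is just enough to stand in for a real socket):

```typescript
// Minimal structural type matching the part of ws's event API we need.
type PongSource = { on(event: 'pong', listener: () => void): void };

type Meta = { lastActivity: Date };

// Wire a pong listener that bumps lastActivity for this user,
// so the heartbeat sweep sees the connection as alive.
function touchOnPong(
  ws: PongSource,
  metadata: Map<string, Meta>,
  userId: string
) {
  ws.on('pong', () => {
    const meta = metadata.get(userId);
    if (meta) meta.lastActivity = new Date();
  });
}
```

Call `touchOnPong` once at registration time, right after `connectionManager.register`.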

Performance Optimizations

Message Batching

Instead of sending every message immediately, batch messages that arrive within a small time window:

class MessageBatcher {
  private queue: Map<string, Message[]> = new Map();
  private timers: Map<string, NodeJS.Timeout> = new Map();

  add(channel: string, message: Message) {
    if (!this.queue.has(channel)) {
      this.queue.set(channel, []);
    }

    this.queue.get(channel)!.push(message);

    if (!this.timers.has(channel)) {
      this.timers.set(channel, setTimeout(() => {
        this.flush(channel);
      }, 50)); // 50ms batching window
    }
  }

  private flush(channel: string) {
    const messages = this.queue.get(channel) || [];
    this.queue.delete(channel);
    this.timers.delete(channel);

    if (messages.length > 0) {
      connectionManager.broadcast(channel, {
        type: 'batch',
        messages
      });
    }
  }
}

Binary Protocol

For high-frequency updates (like game state), consider using binary protocols instead of JSON:

// Using MessagePack instead of JSON
import { encode, decode } from '@msgpack/msgpack';

function sendBinary(ws: WebSocket, data: object) {
  const binary = encode(data);
  ws.send(binary);
}
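MessagePack handles arbitrary objects; for a fixed-shape update you can go further and lay out the bytes yourself with `DataView`. The field layout below is invented for illustration: a 10-byte position update with a uint16 player ID and two float32 coordinates, versus roughly 50+ bytes as JSON:

```typescript
// Fixed layout: [playerId: uint16][x: float32][y: float32] = 10 bytes
function encodePosition(playerId: number, x: number, y: number): ArrayBuffer {
  const buf = new ArrayBuffer(10);
  const view = new DataView(buf);
  view.setUint16(0, playerId);
  view.setFloat32(2, x);
  view.setFloat32(6, y);
  return buf;
}

function decodePosition(buf: ArrayBuffer) {
  const view = new DataView(buf);
  return {
    playerId: view.getUint16(0),
    x: view.getFloat32(2),
    y: view.getFloat32(6)
  };
}
```

The trade-off is that both sides must agree on the layout exactly, so version the format if it might change.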

Monitoring and Metrics

You can’t optimize what you can’t measure. Essential metrics to track:

  • Connection count per server
  • Message throughput (messages/second)
  • Latency percentiles (p50, p95, p99)
  • Memory usage per connection
  • Redis Pub/Sub lag

These map directly onto Prometheus-style metrics (here using the prom-client library):

import { Gauge, Counter, Histogram } from 'prom-client';
const connectionGauge = new Gauge({
  name: 'websocket_connections_total',
  help: 'Total active WebSocket connections'
});

const messageCounter = new Counter({
  name: 'websocket_messages_total',
  help: 'Total messages processed',
  labelNames: ['type', 'channel']
});

const latencyHistogram = new Histogram({
  name: 'websocket_message_latency_ms',
  help: 'Message delivery latency in milliseconds',
  buckets: [5, 10, 25, 50, 100, 250, 500]
});

Lessons Learned

  1. Sticky sessions are essential: Without them, clients reconnecting go to different servers and lose their session state.

  2. Graceful degradation: When under heavy load, drop non-critical messages (typing indicators) before critical ones (actual messages).

  3. Client-side reconnection: Implement exponential backoff with jitter to prevent thundering herd on server recovery.

  4. Load testing early: Use tools like Artillery or k6 to simulate thousands of connections before production.
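On point 3, the delay calculation is small but easy to get wrong. A sketch of full-jitter backoff (the constants are illustrative, not the values I used):

```typescript
const BASE_DELAY_MS = 500;    // illustrative base delay
const MAX_DELAY_MS = 30000;   // illustrative cap

// Full jitter: pick a random delay in [0, min(cap, base * 2^attempt)].
// Randomizing over the whole window spreads reconnects across time,
// so a recovering server isn't hit by every client at once.
function reconnectDelay(attempt: number): number {
  const cap = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
  return Math.random() * cap;
}
```

On a successful reconnect, reset `attempt` to zero so the next outage starts from the short delays again.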

Conclusion

Scaling WebSockets isn’t magic—it’s careful architecture, understanding your bottlenecks, and relentless optimization. Start with a simple implementation, measure everything, and iterate based on real data.

The principles here apply whether you’re building a chat system, live sports updates, or collaborative editing. Master them, and real-time features become just another tool in your arsenal.