Real-time communication is no longer a nice-to-have feature. Users expect instant updates, live notifications, and seamless collaboration. But scaling WebSocket connections is fundamentally different from scaling HTTP endpoints.
The Challenge
When building the real-time chat system for an e-sports platform, I faced a concrete problem: support 2,000+ concurrent WebSocket connections with message delivery under 100ms. This is where theory meets reality.
Architecture Overview
The final architecture looked like this:
┌─────────────────────────────────────────────────────────┐
│                      Load Balancer                      │
│                    (Sticky Sessions)                    │
└────────────────────────────┬────────────────────────────┘
                             │
           ┌─────────────────┼─────────────────┐
           │                 │                 │
           ▼                 ▼                 ▼
      ┌─────────┐       ┌─────────┐       ┌─────────┐
      │ Node.js │       │ Node.js │       │ Node.js │
      │ Server  │       │ Server  │       │ Server  │
      └────┬────┘       └────┬────┘       └────┬────┘
           │                 │                 │
           └─────────────────┼─────────────────┘
                             │
                      ┌──────┴──────┐
                      │    Redis    │
                      │   Pub/Sub   │
                      └─────────────┘
Key Design Decisions
1. Connection Management
Each WebSocket connection consumes memory. With thousands of connections, this adds up quickly. I implemented a connection manager that tracks metadata without storing message history in memory:
interface ConnectionMetadata {
  userId: string;
  userName: string;
  channel: string;
  connectedAt: Date;
  lastActivity: Date;
}

class ConnectionManager {
  // Left non-private so the heartbeat sweep below can iterate them directly
  connections: Map<string, WebSocket> = new Map();
  metadata: Map<string, ConnectionMetadata> = new Map();
  private channels: Map<string, Set<string>> = new Map();

  register(ws: WebSocket, userId: string, userName: string, channel: string) {
    this.connections.set(userId, ws);
    this.metadata.set(userId, {
      userId,
      userName,
      channel,
      connectedAt: new Date(),
      lastActivity: new Date()
    });
    if (!this.channels.has(channel)) {
      this.channels.set(channel, new Set());
    }
    this.channels.get(channel)!.add(userId);
  }

  remove(userId: string) {
    const meta = this.metadata.get(userId);
    if (meta) {
      this.channels.get(meta.channel)?.delete(userId);
    }
    this.connections.delete(userId);
    this.metadata.delete(userId);
  }

  getChannelMembers(channel: string): Set<string> {
    return this.channels.get(channel) ?? new Set();
  }

  broadcast(channel: string, message: object) {
    const recipients = this.getChannelMembers(channel);
    const payload = JSON.stringify(message);
    for (const userId of recipients) {
      const ws = this.connections.get(userId);
      if (ws?.readyState === WebSocket.OPEN) {
        ws.send(payload);
      }
    }
  }
}
2. Redis Pub/Sub for Multi-Server Communication
When you have multiple Node.js servers, a message sent to one server needs to reach clients connected to other servers. Redis Pub/Sub solves this elegantly:
import Redis from 'ioredis';

const publisher = new Redis();
const subscriber = new Redis();

// Illustrative message shape; the real application's fields may differ
interface ChatMessage {
  userId: string;
  channel: string;
  text: string;
  sentAt: number;
}

// When a message is received on one server
function handleIncomingMessage(channel: string, message: ChatMessage) {
  // Publish to Redis so all servers (including this one) receive it
  publisher.publish(`chat:${channel}`, JSON.stringify(message));
}

// All servers subscribe to relevant channels
subscriber.psubscribe('chat:*');
subscriber.on('pmessage', (pattern, channel, message) => {
  const channelId = channel.replace('chat:', '');
  const parsed = JSON.parse(message) as ChatMessage;
  connectionManager.broadcast(channelId, parsed);
});
3. Heartbeat and Dead Connection Cleanup
WebSocket connections can silently die (network issues, client crashes). Implementing heartbeats prevents resource leaks:
const HEARTBEAT_INTERVAL = 30000; // 30 seconds
const CONNECTION_TIMEOUT = 60000; // 60 seconds

setInterval(() => {
  const now = Date.now();
  for (const [userId, metadata] of connectionManager.metadata) {
    const ws = connectionManager.connections.get(userId);
    if (!ws || ws.readyState !== WebSocket.OPEN) {
      connectionManager.remove(userId);
      continue;
    }
    if (now - metadata.lastActivity.getTime() > CONNECTION_TIMEOUT) {
      ws.terminate();
      connectionManager.remove(userId);
      continue;
    }
    // Send ping; a 'pong' listener on each socket must refresh lastActivity,
    // otherwise healthy-but-quiet connections would also hit the timeout
    ws.ping();
  }
}, HEARTBEAT_INTERVAL);
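The sweep above only works if `lastActivity` actually gets refreshed; otherwise every connection, healthy or not, would hit the 60-second timeout. The missing piece is a listener registered when the socket connects. A minimal sketch, typed structurally so it works with any EventEmitter-style socket (the helper name is mine, not from the original code):

```typescript
// Refresh the activity stamp on every pong and every inbound message, so the
// heartbeat sweep only terminates sockets that have gone truly silent.
type SocketLike = { on(event: string, listener: (...args: unknown[]) => void): unknown };

function attachActivityTracking(ws: SocketLike, meta: { lastActivity: Date }) {
  const touch = () => { meta.lastActivity = new Date(); };
  ws.on('pong', touch);    // reply to the server's ws.ping()
  ws.on('message', touch); // any chat traffic also counts as activity
}
```

Calling this right after `connectionManager.register(...)` closes the loop between ping and timeout.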
Performance Optimizations
Message Batching
Instead of sending every message immediately, batch messages that arrive within a small time window:
class MessageBatcher {
  // `Message` is the application's chat-message type
  private queue: Map<string, Message[]> = new Map();
  private timers: Map<string, NodeJS.Timeout> = new Map();

  add(channel: string, message: Message) {
    if (!this.queue.has(channel)) {
      this.queue.set(channel, []);
    }
    this.queue.get(channel)!.push(message);
    if (!this.timers.has(channel)) {
      this.timers.set(channel, setTimeout(() => {
        this.flush(channel);
      }, 50)); // 50ms batching window
    }
  }

  private flush(channel: string) {
    const messages = this.queue.get(channel) || [];
    this.queue.delete(channel);
    this.timers.delete(channel);
    if (messages.length > 0) {
      connectionManager.broadcast(channel, {
        type: 'batch',
        messages
      });
    }
  }
}
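On the client side, a frame can now be either a single message or a batch envelope, so the receive path should normalize both shapes. A sketch of the unpacking side (the envelope shape follows the broadcast above; the helper name is my own):

```typescript
// Normalize an incoming frame: a { type: 'batch', messages } envelope is
// flattened into its messages, anything else is treated as a single message.
function unpackFrame(raw: string): unknown[] {
  const frame = JSON.parse(raw) as { type?: string; messages?: unknown[] };
  if (frame.type === 'batch' && Array.isArray(frame.messages)) {
    return frame.messages;
  }
  return [frame];
}
```

The client then iterates the returned array regardless of how the server chose to send the data.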
Binary Protocol
For high-frequency updates (like game state), consider using binary protocols instead of JSON:
// Using MessagePack instead of JSON
import { encode, decode } from '@msgpack/msgpack';

function sendBinary(ws: WebSocket, data: object) {
  ws.send(encode(data));
}

function receiveBinary(buffer: Uint8Array): unknown {
  return decode(buffer);
}
Monitoring and Metrics
You can’t optimize what you can’t measure. Essential metrics to track:
- Connection count per server
- Message throughput (messages/second)
- Latency percentiles (p50, p95, p99)
- Memory usage per connection
- Redis Pub/Sub lag
// Prometheus-style metrics (using prom-client)
import { Counter, Gauge, Histogram } from 'prom-client';

const connectionGauge = new Gauge({
  name: 'websocket_connections_total',
  help: 'Total active WebSocket connections'
});

const messageCounter = new Counter({
  name: 'websocket_messages_total',
  help: 'Total messages processed',
  labelNames: ['type', 'channel']
});

const latencyHistogram = new Histogram({
  name: 'websocket_message_latency_ms',
  help: 'Message delivery latency in milliseconds',
  buckets: [5, 10, 25, 50, 100, 250, 500]
});
Lessons Learned
- Sticky sessions are essential: Without them, reconnecting clients land on different servers and lose their session state.
- Graceful degradation: Under heavy load, drop non-critical messages (typing indicators) before critical ones (actual chat messages).
- Client-side reconnection: Implement exponential backoff with jitter to prevent a thundering herd on server recovery.
- Load testing early: Use tools like Artillery or k6 to simulate thousands of connections before production.
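The reconnection point deserves a sketch. "Full jitter" backoff draws each delay uniformly from a window that doubles per attempt, which spreads reconnecting clients out instead of letting them stampede the server in lockstep (the constants here are illustrative, not the values from the original system):

```typescript
// Full-jitter exponential backoff: delay is drawn uniformly from
// [0, min(cap, base * 2^attempt)].
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 30_000;

function reconnectDelay(attempt: number): number {
  const ceiling = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
  return Math.random() * ceiling;
}
```

A client would wait `reconnectDelay(attempt)` milliseconds before each retry, incrementing `attempt` on failure and resetting it to 0 once a connection succeeds.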
Conclusion
Scaling WebSockets isn’t magic—it’s careful architecture, understanding your bottlenecks, and relentless optimization. Start with a simple implementation, measure everything, and iterate based on real data.
The principles here apply whether you’re building a chat system, live sports updates, or collaborative editing. Master them, and real-time features become just another tool in your arsenal.