System Design · Mar 19, 2025 · 14 min read

Designing Scalable Notification Systems

Architecting a real-time WebSocket notification service with Redis Pub/Sub, fan-out queues, and offline user recovery.

websockets · redis · pub-sub

01. The Core Challenge

Notification systems look simple until they're not. When you have 10 users, polling works. At 10,000 concurrent users, you need WebSockets. At 100,000 users across multiple server instances, you need a pub/sub layer so that a notification triggered on Server A reaches a user connected to Server B.

02. WebSocket Connection Management

Each WebSocket connection is stateful: the server must track which user is connected to which socket, and a single user may hold several sockets at once (multiple tabs or devices). Keep these connection mappings in memory on each server instance. For multi-instance deployments, use Redis to publish events to all instances and let each instance fan out to its locally connected clients.

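```typescript
import Redis from 'ioredis';
import { Server, Socket } from 'socket.io';
const io = new Server(3000);
const redisSubscriber = new Redis(); // dedicated connection: a subscribed client can't run other commands
const connectedUsers = new Map<string, Set<Socket>>(); // one user may have several tabs/devices

// On connection (assumes auth middleware has set socket.data.userId)
io.on('connection', (socket) => {
  const userId: string = socket.data.userId;
  const sockets = connectedUsers.get(userId) ?? new Set<Socket>();
  connectedUsers.set(userId, sockets.add(socket));
  socket.on('disconnect', () => { sockets.delete(socket); if (sockets.size === 0) connectedUsers.delete(userId); });
});

// On Redis message received ('notifications' is an illustrative channel name)
redisSubscriber.subscribe('notifications');
redisSubscriber.on('message', (_channel, message) => {
  const { userId, notification } = JSON.parse(message);
  connectedUsers.get(userId)?.forEach((s) => s.emit('notification', notification));
});
```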

03. Redis Pub/Sub for Fan-Out

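When a notification needs to be delivered, publish it to a Redis channel that every server instance subscribes to. Each instance checks whether the target user is connected locally and delivers only if so. This gives you horizontal scalability without a centralized notification router. The trade-off is that every instance receives every message, and Redis Pub/Sub is fire-and-forget: a message published while an instance is disconnected from Redis is never redelivered, which is one more reason to persist notifications as described in the next section.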
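The publish and deliver halves described above can be sketched as follows. This is a minimal sketch, assuming an ioredis-style client whose `publish(channel, message)` resolves to a number; the channel name `notifications` and the helpers `publishNotification` and `deliverLocally` are hypothetical. The Redis client and sockets are passed in as small interfaces so the logic can be exercised without a running server.

```typescript
// Anything with Redis PUBLISH semantics (ioredis's `publish` matches this shape).
interface Publisher {
  publish(channel: string, message: string): Promise<number>;
}

// Anything that can push an event to a client (a Socket.IO socket fits).
interface Emitter {
  emit(event: string, payload: unknown): unknown;
}

// Hypothetical channel shared by all publishers and every server instance.
const CHANNEL = 'notifications';

// Publisher side: any service can call this without knowing which
// instance (if any) holds the user's connection.
function publishNotification(pub: Publisher, userId: string, notification: unknown): Promise<number> {
  return pub.publish(CHANNEL, JSON.stringify({ userId, notification }));
}

// Subscriber side: runs on every instance for every message.
// Returns how many local sockets received it (0 = user not connected here).
function deliverLocally(raw: string, local: Map<string, Set<Emitter>>): number {
  const { userId, notification } = JSON.parse(raw);
  const sockets = local.get(userId);
  if (!sockets) return 0;
  sockets.forEach((s) => s.emit('notification', notification));
  return sockets.size;
}
```

Note that the PUBLISH reply counts the subscriber connections that received the message (roughly, your server instances), not the users actually reached, so it cannot confirm delivery on its own.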

04. Offline User Recovery

What happens if a user isn't connected at the moment a notification is sent? Store every notification in a PostgreSQL table keyed by user_id, with a delivered_at timestamp that stays NULL until delivery. When the user reconnects over WebSocket, query for their undelivered notifications, deliver them in order, and mark each one as delivered.

  • Store each notification in the DB with delivered_at = NULL
  • On WS connect: select rows WHERE user_id = $1 AND delivered_at IS NULL
  • Deliver in chronological order
  • Set delivered_at = NOW() after each delivery
  • Retain notifications for 30 days for the history UI
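The steps above map onto a handful of queries. The sketch below assumes node-postgres, whose `query(text, values)` resolves to an object with a `rows` array, and a hypothetical `notifications` table with `id`, `user_id`, `payload`, `created_at`, and `delivered_at` columns; the table, column, constant, and function names are illustrative, not taken from a specific codebase.

```typescript
// Structural subset of node-postgres's Pool/Client `query(text, values)`.
interface Db {
  query(text: string, values?: unknown[]): Promise<{ rows: any[] }>;
}

interface Emitter {
  emit(event: string, payload: unknown): unknown;
}

// Hypothetical schema:
//   CREATE TABLE notifications (
//     id           bigserial PRIMARY KEY,
//     user_id      text NOT NULL,
//     payload      jsonb NOT NULL,
//     created_at   timestamptz NOT NULL DEFAULT now(),
//     delivered_at timestamptz            -- NULL until delivered
//   );
const INSERT_NOTIFICATION = 'INSERT INTO notifications (user_id, payload) VALUES ($1, $2) RETURNING id';
const SELECT_UNDELIVERED = `SELECT id, payload FROM notifications
  WHERE user_id = $1 AND delivered_at IS NULL
  ORDER BY created_at, id`;
const MARK_DELIVERED = 'UPDATE notifications SET delivered_at = NOW() WHERE id = $1';
// Run periodically (e.g. from a scheduled job) to enforce the 30-day retention window.
const PRUNE_OLD = "DELETE FROM notifications WHERE created_at < NOW() - INTERVAL '30 days'";

// Call after a socket connects: replay the backlog oldest-first and mark
// each row only after it has been emitted, so a crash mid-replay leaves
// the remaining rows pending for the next connection.
async function replayUndelivered(db: Db, userId: string, socket: Emitter): Promise<number> {
  const { rows } = await db.query(SELECT_UNDELIVERED, [userId]);
  for (const row of rows) {
    socket.emit('notification', { ...row.payload, id: row.id });
    await db.query(MARK_DELIVERED, [row.id]);
  }
  return rows.length;
}
```

Because each row is marked after `emit`, a crash between the two can replay a notification on the next connection, so clients should deduplicate on `id`. `emit` also only confirms the message was handed to the socket; Socket.IO acknowledgement callbacks are one way to mark a row only after the client confirms receipt.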

05. Push Notifications for Mobile

For mobile users who may be offline for extended periods, pair WebSockets with FCM (Firebase Cloud Messaging) or APNs (Apple Push Notification service). When a notification can't be delivered over WebSocket, publish it to an SQS queue that a Lambda function consumes to send push notifications. This extends delivery to users whose app isn't currently connected, though FCM and APNs delivery is itself best-effort, so the database record remains the source of truth.
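One way to wire this fallback, sketched under assumptions: a hypothetical presence check (for example, a Redis key that each instance sets on connect and clears on disconnect) decides between the WebSocket path and the queue. In production, `enqueue` could wrap the AWS SDK's `SendMessageCommand`, and `sendPush` could call FCM or APNs after looking up the user's device tokens. Here both are injected functions, and `SqsEvent` is a local subset of the event Lambda receives from an SQS trigger. `routeNotification`, `makeHandler`, and `PushMessage` are illustrative names.

```typescript
// Payload carried to the push worker (field names are illustrative).
interface PushMessage {
  userId: string;
  title: string;
  body: string;
}

// Local subset of the event a Lambda function receives from an SQS trigger.
interface SqsEvent {
  Records: { body: string }[];
}

// Hypothetical presence check, e.g. a Redis key set on connect and cleared on disconnect.
type IsOnline = (userId: string) => Promise<boolean>;

// Producer side: use the WebSocket path when the user appears to be online,
// otherwise hand the message to the push queue.
async function routeNotification(
  msg: PushMessage,
  isOnline: IsOnline,
  publishWs: (msg: PushMessage) => Promise<unknown>, // e.g. the Redis publish from section 03
  enqueue: (body: string) => Promise<unknown>, // e.g. wraps SQS SendMessageCommand
): Promise<'websocket' | 'push'> {
  if (await isOnline(msg.userId)) {
    await publishWs(msg);
    return 'websocket';
  }
  await enqueue(JSON.stringify(msg));
  return 'push';
}

// Consumer side: a Lambda handler factory; `sendPush` would call FCM or APNs.
// Throwing fails the invocation, and SQS makes the batch visible again for retry.
function makeHandler(sendPush: (msg: PushMessage) => Promise<void>) {
  return async (event: SqsEvent): Promise<void> => {
    for (const record of event.Records) {
      await sendPush(JSON.parse(record.body) as PushMessage);
    }
  };
}
```

The presence check can race with a disconnect, which is another reason the stored notification from section 04 stays authoritative. Because a failed invocation returns the whole batch to the queue (unless partial batch responses are enabled), the push sender should tolerate duplicates.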