
Notification System Design

Designing a scalable system to send millions of emails, SMS, and push notifications.

🔔 Notification System: System Design

A notification system is essential for user engagement. It must handle high message volumes across multiple channels while ensuring reliable delivery.

1. Requirements

Functional

  • Multi-channel Support: Push notifications, SMS, Email.
  • Prioritization: Critical alerts (e.g., OTPs) must take precedence over marketing messages.
  • User Preferences: Allow users to opt out of certain notification types.
  • Rate Limiting: Cap how many notifications each user receives to avoid spamming them.
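One way to meet the prioritization requirement is to dispatch from a priority queue, so critical alerts such as OTPs go out ahead of marketing messages that arrived earlier. This is a minimal in-memory sketch: the two priority levels and the `PriorityDispatcher` class are hypothetical, and a production system might instead use separate queues or topics per priority.

```python
from __future__ import annotations

import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical priority levels: a lower number is dispatched first.
PRIORITY = {"critical": 0, "marketing": 1}


@dataclass(order=True)
class QueuedNotification:
    priority: int
    seq: int  # tie-breaker: keeps FIFO order within one priority level
    message: str = field(compare=False)


class PriorityDispatcher:
    def __init__(self) -> None:
        self._heap: list[QueuedNotification] = []
        self._counter = itertools.count()

    def submit(self, message: str, kind: str) -> None:
        item = QueuedNotification(PRIORITY[kind], next(self._counter), message)
        heapq.heappush(self._heap, item)

    def pop_next(self) -> str | None:
        """Return the highest-priority pending message, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap).message


dispatcher = PriorityDispatcher()
dispatcher.submit("Spring sale: 20% off", "marketing")
dispatcher.submit("Your OTP is 482913", "critical")
print(dispatcher.pop_next())  # Your OTP is 482913
```

The sequence counter matters: without it, two messages of equal priority would be compared by content, which breaks FIFO order within a level.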

Non-Functional

  • Scalability: Handle thousands of notifications per second.
  • High Reliability: Messages shouldn't be lost.
  • Extensibility: Easy to add new providers (Twilio, SendGrid, etc.).

2. Architecture Overview

```mermaid
graph TD
    Services[User Services] --> API[Notification API]
    API --> LB[Load Balancer]
    LB --> Buffer[Async Queue - Kafka/RabbitMQ]
    Buffer --> Workers[Notification Workers]
    Workers --> Cache[User Prefs Cache - Redis]
    Workers --> ThirdParty[Providers: FCM, Twilio, SendGrid]
    ThirdParty --> User((User))
```

3. Key Components

Notification API

A collection of endpoints to trigger notifications.

  • Internal Only: Typically called by other internal services, not by end users or external clients.
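A minimal sketch of what a trigger request might carry and how the API could validate it before enqueueing. The field names, the allowed channel and priority values, and the `validate_request` helper are all hypothetical, not a prescribed contract.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical values accepted by the API.
CHANNELS = {"push", "sms", "email"}
PRIORITIES = {"critical", "marketing"}


@dataclass(frozen=True)
class NotificationRequest:
    event_id: str   # caller-supplied ID, reusable later as a dedupe key
    user_id: str
    channel: str
    priority: str
    template: str   # name of a message template, e.g. "otp_code"
    params: dict = field(default_factory=dict)


def validate_request(req: NotificationRequest) -> list[str]:
    """Return validation errors; an empty list means the request can be enqueued."""
    errors = []
    if not req.event_id:
        errors.append("event_id is required")
    if not req.user_id:
        errors.append("user_id is required")
    if req.channel not in CHANNELS:
        errors.append(f"unsupported channel: {req.channel}")
    if req.priority not in PRIORITIES:
        errors.append(f"unsupported priority: {req.priority}")
    return errors


req = NotificationRequest("evt-1", "u42", "sms", "critical", "otp_code", {"code": "482913"})
print(validate_request(req))  # []
```

Rejecting malformed requests at the API keeps them out of the queue, where they would otherwise consume worker time and retries.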

Message Queue

Decouples triggering a notification from delivering it. If a provider is down, the message stays in the queue until delivery succeeds, instead of being lost or blocking the calling service.
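To illustrate the decoupling, here is a minimal in-memory sketch: the producer only enqueues and returns, and the consumer puts a message back on the queue when the provider call fails. `collections.deque` stands in for Kafka or RabbitMQ, and `flaky_provider` is a hypothetical stand-in for a third-party API.

```python
from __future__ import annotations

from collections import deque

queue: deque[str] = deque()


def trigger(message: str) -> None:
    # API side: enqueue and return immediately, without waiting for delivery.
    queue.append(message)


def drain(send, max_rounds: int = 10) -> list[str]:
    """Worker side: deliver queued messages, re-queueing any that fail."""
    delivered = []
    for _ in range(max_rounds):
        if not queue:
            break
        for _ in range(len(queue)):
            msg = queue.popleft()
            if send(msg):
                delivered.append(msg)
            else:
                queue.append(msg)  # provider down: keep the message for a later round
    return delivered


calls = {"n": 0}


def flaky_provider(msg: str) -> bool:
    calls["n"] += 1
    return calls["n"] > 1  # the first call fails, simulating an outage


trigger("welcome email")
trigger("password reset")
print(drain(flaky_provider))  # ['password reset', 'welcome email']
```

A real worker would not re-queue in a tight loop; the wait between attempts is what the retry strategy in the next section controls.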

Notification Workers

The core logic:

  1. Fetch user preferences and contact info.
  2. Filter/Deduplicate messages.
  3. Call the appropriate third-party API.
  4. Log status for analytics.
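The four worker steps above might look like this. Everything here is a hypothetical sketch: `prefs`, `contacts`, `providers`, `seen_event_ids`, and `log` are plain in-memory stand-ins for the Redis preference cache, a contact store, the provider clients, a dedupe store, and the analytics log.

```python
from __future__ import annotations


def process(event: dict, prefs: dict, contacts: dict, providers: dict,
            seen_event_ids: set, log: list) -> str:
    user, channel = event["user_id"], event["channel"]

    # 1. Fetch user preferences (opted-out types) and contact info for this channel.
    opted_out = prefs.get(user, set())
    address = contacts.get(user, {}).get(channel)

    # 2. Filter and deduplicate.
    if event["type"] in opted_out or address is None:
        status = "skipped"
    elif event["event_id"] in seen_event_ids:
        status = "duplicate"
    else:
        # 3. Call the third-party provider registered for this channel.
        ok = providers[channel](address, event["body"])
        status = "sent" if ok else "failed"
        if ok:
            # Only successful sends are remembered, so failed ones can be retried.
            seen_event_ids.add(event["event_id"])

    # 4. Log status for analytics.
    log.append((event["event_id"], channel, status))
    return status
```

Keeping each step a plain function call makes it straightforward to swap a stand-in for a real cache or provider client without changing the control flow.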

4. Reliability & Edge Cases

At-least-once Delivery

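Use persistent queues and acknowledgement mechanisms: a worker acknowledges a message only after the provider has accepted it, so a message that was in flight during a crash is redelivered rather than lost. The trade-off is that a message can occasionally be delivered more than once.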
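A minimal in-memory sketch of the acknowledgement idea: a message handed to a worker moves to an in-flight set and is removed only when the worker acks it. If the worker dies before acking, the message becomes visible again after a timeout, similar in spirit to Amazon SQS visibility timeouts. The `AckQueue` class and its `visibility_timeout` parameter are hypothetical, and the clock is passed in explicitly to keep the example deterministic.

```python
from __future__ import annotations

from collections import deque


class AckQueue:
    def __init__(self, visibility_timeout: float) -> None:
        self.visibility_timeout = visibility_timeout
        self.ready: deque[tuple[int, str]] = deque()
        self.in_flight: dict[int, tuple[str, float]] = {}  # id -> (message, deadline)
        self._next_id = 0

    def publish(self, message: str) -> None:
        self.ready.append((self._next_id, message))
        self._next_id += 1

    def receive(self, now: float) -> tuple[int, str] | None:
        self._requeue_expired(now)
        if not self.ready:
            return None
        msg_id, message = self.ready.popleft()
        self.in_flight[msg_id] = (message, now + self.visibility_timeout)
        return msg_id, message

    def ack(self, msg_id: int) -> None:
        # Called only after the provider has accepted the notification.
        self.in_flight.pop(msg_id, None)

    def _requeue_expired(self, now: float) -> None:
        # A crashed worker never acks, so its messages are redelivered.
        expired = [i for i, (_, deadline) in self.in_flight.items() if deadline <= now]
        for msg_id in expired:
            message, _ = self.in_flight.pop(msg_id)
            self.ready.append((msg_id, message))
```

A slow but healthy worker can also miss the deadline, which is one more way the same message ends up delivered twice.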

Deduplication

Sometimes the same event triggers multiple notifications. Store a dedupe_key or event_id in Redis with a short TTL so a user doesn't receive the same alert twice within that window.
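In Redis, the usual primitive for this is `SET key value NX EX <seconds>`, which atomically sets the key only if it does not already exist and expires it after the TTL. The sketch below is a hypothetical in-memory stand-in with the same semantics for a single process; it is not atomic across workers, and unlike Redis it never evicts expired keys.

```python
from __future__ import annotations


class Deduper:
    """In-memory stand-in for Redis `SET <dedupe_key> 1 NX EX <ttl>`."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._expires_at: dict[str, float] = {}

    def first_time(self, dedupe_key: str, now: float) -> bool:
        """Return True if this key has not been seen within the TTL window."""
        expiry = self._expires_at.get(dedupe_key)
        if expiry is not None and expiry > now:
            return False  # seen recently: drop the duplicate
        self._expires_at[dedupe_key] = now + self.ttl
        return True


d = Deduper(ttl_seconds=600)
key = "u42:evt-9001"  # hypothetical key format: user_id + event_id
print(d.first_time(key, now=0))    # True: send
print(d.first_time(key, now=120))  # False: duplicate within 10 minutes
print(d.first_time(key, now=700))  # True: the window has expired
```

Including the user ID in the key matters when one event fans out to many users: each user should still get one copy.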

Rate Limiting

Limit the number of marketing notifications per hour per user.
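A minimal fixed-window sketch of a per-user hourly cap. In Redis this is commonly done with `INCR` on a per-user, per-hour key followed by `EXPIRE`; the class below is a hypothetical in-memory equivalent, and it assumes the caller applies it only to marketing messages so critical alerts are never throttled.

```python
from __future__ import annotations

from collections import defaultdict


class HourlyRateLimiter:
    """Fixed window: at most `limit` notifications per user per clock hour."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self._counts: defaultdict[tuple[str, int], int] = defaultdict(int)

    def allow(self, user_id: str, now: float) -> bool:
        window = int(now // 3600)  # index of the current hour
        key = (user_id, window)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True
```

A fixed window lets a user receive up to twice the limit around an hour boundary (the end of one window plus the start of the next). A sliding-window counter avoids that burst at the cost of more bookkeeping.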

Retries

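If a call to a third-party provider fails, retry it with exponential backoff, doubling the wait between attempts (e.g., 1s, 2s, 4s) up to a maximum number of attempts.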
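A minimal retry sketch. `ProviderError` is a hypothetical exception type, and the defaults for `max_attempts`, `base_delay`, and `max_delay` are illustrative assumptions. It also adds "full jitter" (sleeping a random time up to the computed delay), a common refinement that is not part of the plain exponential schedule.

```python
from __future__ import annotations

import random
import time


class ProviderError(Exception):
    """Hypothetical error raised by a provider client on a failed call."""


def send_with_backoff(send, max_attempts: int = 5, base_delay: float = 0.5,
                      max_delay: float = 30.0, sleep=time.sleep):
    """Call `send()` until it succeeds, doubling the delay cap after each failure."""
    for attempt in range(max_attempts):
        try:
            return send()
        except ProviderError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure (e.g. mark as Failed)
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: a random wait up to `delay`, so many workers retrying
            # at once do not hit the recovering provider in lockstep.
            sleep(random.uniform(0, delay))
```

Injecting `sleep` keeps the function testable without real waiting; in production it defaults to `time.sleep`.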


5. Metadata Storage

Store delivery status in a database to provide "Notification History" to users.

  • notification_id
  • status (Sent, Delivered, Failed)
  • channel
  • timestamp
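A minimal schema sketch using Python's built-in `sqlite3` as a stand-in for the production database. The `user_id` column is an assumption not in the field list above, added because a per-user history needs it; the channel values and the `record_status`/`history` helpers are hypothetical.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE notification_log (
        notification_id TEXT PRIMARY KEY,
        user_id         TEXT NOT NULL,  -- assumption: needed for per-user history
        channel         TEXT NOT NULL CHECK (channel IN ('push', 'sms', 'email')),
        status          TEXT NOT NULL CHECK (status IN ('Sent', 'Delivered', 'Failed')),
        timestamp       REAL NOT NULL   -- Unix epoch seconds
    )
""")
conn.execute("CREATE INDEX idx_user_time ON notification_log (user_id, timestamp)")


def record_status(notification_id, user_id, channel, status, ts=None):
    # Upsert: a later status (e.g. Delivered) overwrites an earlier one (Sent).
    conn.execute(
        """INSERT INTO notification_log VALUES (?, ?, ?, ?, ?)
           ON CONFLICT(notification_id) DO UPDATE SET
               status = excluded.status, timestamp = excluded.timestamp""",
        (notification_id, user_id, channel, status,
         ts if ts is not None else time.time()),
    )


def history(user_id, limit=20):
    # Newest first; served by the (user_id, timestamp) index.
    return conn.execute(
        "SELECT notification_id, channel, status, timestamp FROM notification_log "
        "WHERE user_id = ? ORDER BY timestamp DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()
```

The upsert syntax requires SQLite 3.24 or newer. Keeping only the latest status per notification is a design choice; storing every status change as its own row would preserve a full audit trail instead.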