🛑 Rate Limiting: Traffic Control

Rate limiting caps the number of requests a user can make to a system within a specific time period.

💡 The Logic (ELI5)

Think of a Free Sample Stand at a grocery store:

They give out free cheese.
If one person stands there and eats the entire block of cheese, it's unfair to everyone else.
Rate Limiting is the rule: "One sample per person every 10 minutes."
This ensures there is enough cheese for everyone and the store doesn't go bankrupt.

🔍 The Deep Dive

Algorithms to Know

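Token Bucket: A bucket holds up to a fixed number of "tokens." Each request takes one token, and tokens refill at a set rate up to that capacity. If the bucket is empty, the request is rejected until tokens refill. (Great for handling bursts: a full bucket lets a client spend its saved tokens all at once.)
Leaky Bucket: Requests enter a bucket at any speed but leave (get processed) at a constant rate, like a bucket with a hole in the bottom. If the bucket is full, new requests overflow and are dropped.
Fixed Window: 100 requests allowed from 12:00 to 12:01, with the counter resetting at the start of each window. (Simple, but has a "burst" problem at the edge of the window: 100 requests at 12:00:59 plus 100 more at 12:01:00 means 200 requests get through in about one second.)
Sliding Window: A more accurate version of Fixed Window that moves with time. It counts requests over the last 60 seconds from the current moment instead of from a fixed clock boundary, which removes the edge burst.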
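
To make the Token Bucket description concrete, here is a minimal single-process sketch. The class name, the parameter names (capacity, refill_rate), and the injectable clock are assumptions made for illustration, and the sketch ignores thread safety, so treat it as a starting point and not a production implementation.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last_refill = clock()

    def allow(self):
        """Return True and spend one token if available, otherwise return False."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With a capacity of 3 and a refill rate of 1 token per second, a client can fire 3 requests back to back, is then rejected, and earns roughly one more request for each second it waits.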

Where to Implement?

Client-side: Good for UX, but easily bypassed.
API Gateway: The most common place. Blocks bad traffic before it hits your services.
Service-side: Specific limits for sensitive operations (e.g., Password Reset).

🎯 Interview Pulse

How to identify the user?

Interviewers will ask: "How do you track who is making the request?" Options:

IP Address (Problem: Many people can share one IP behind a VPN, NAT, or corporate proxy, so one heavy user can get everyone blocked).
User ID (Best for logged-in users).
API Key (Best for external developers).
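
One way to combine these options is to pick the most specific identifier available for each request. The sketch below assumes a priority order of API key, then user ID, then IP address. That order, the function name, and the key prefixes are illustrative choices, not a standard.

```python
def rate_limit_key(ip, user_id=None, api_key=None):
    """Build the key used to count requests, preferring the most specific identity."""
    if api_key:
        return f"key:{api_key}"  # external developers
    if user_id:
        return f"user:{user_id}"  # logged-in users
    return f"ip:{ip}"  # anonymous traffic: the least precise fallback
```

Prefixing the key with its type keeps a user ID and an API key that happen to share the same value from colliding in the counter store.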

Distributed Rate Limiting

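If you have 10 servers, how do they all know how many requests User X has made? Answer: Use a centralized store like Redis, so every server reads and updates the same counter. Redis keeps its data in memory, so the check adds very little latency to each request (typically around a millisecond or less on a local network), and atomic commands such as INCR stop two servers from miscounting when they update the same key at once.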
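
The shared-counter idea can be sketched as a fixed-window check against a central store. To keep the example runnable without a server, the store here is a hypothetical in-memory stand-in whose incr and expire methods are modelled on the Redis INCR and EXPIRE commands. The key format, function name, and expiry of two windows are also assumptions.

```python
class InMemoryStore:
    """Stand-in for a shared store; incr/expire mimic the Redis INCR and EXPIRE commands."""

    def __init__(self):
        self.counts = {}
        self.ttls = {}

    def incr(self, key):
        # Increment the counter, creating it at 1 if it does not exist yet.
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    def expire(self, key, seconds):
        # Only recorded here; a real store would delete the key after `seconds`.
        self.ttls[key] = seconds


def allow_request(store, client_id, limit, window_seconds, now):
    """Fixed-window check: every server calling this shares the same counter."""
    window = int(now // window_seconds)  # index of the current window
    key = f"rate:{client_id}:{window}"
    count = store.incr(key)
    if count == 1:
        # First hit in this window: let the key clean itself up later.
        store.expire(key, window_seconds * 2)
    return count <= limit
```

Because each window gets its own key, counters never need an explicit reset. Old keys simply expire, and all 10 servers agree on the count as long as they talk to the same store.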

HTTP Response

When a user is rate-limited, return HTTP 429 Too Many Requests. You should also include a Retry-After header telling them when they can try again, given either as a number of seconds or as an HTTP date. 🚦
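
As a framework-neutral sketch, the rejection can be built as a plain (status, headers, body) tuple. The function name, the JSON body, and the choice to round the wait up to whole seconds are assumptions for illustration. A real service would return this through its web framework's response object.

```python
import math


def too_many_requests(retry_after_seconds):
    """Build a 429 response whose Retry-After header is a whole number of seconds."""
    # Round up so the client never retries too early; wait at least 1 second.
    seconds = max(1, math.ceil(retry_after_seconds))
    headers = {
        "Retry-After": str(seconds),
        "Content-Type": "application/json",
    }
    body = '{"error": "rate limit exceeded"}'
    return 429, headers, body
```

Calling too_many_requests(12.3) produces status 429 with Retry-After set to "13", which a well-behaved client can use to schedule its next attempt.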