⏱️ Latency vs Throughput

Latency and Throughput are the two primary metrics we use to measure the performance of a distributed system. They are related but distinct.

💡 The Logic (ELI5)

Think of a Water Pipe:

Latency: How long it takes for a single drop of water to travel from one end of the pipe to the other. (Speed/Delay).
Throughput: How much water flows through the pipe every second. (Capacity/Volume).

You can have High Latency but High Throughput (a giant pipe: each drop takes a long time to reach the other end, but tons of water arrives every second) or Low Latency but Low Throughput (a tiny, very short pipe).
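A rough way to put numbers on the pipe analogy, assuming a simple first-order model (an assumption, not a law): a batch is fully delivered after one latency plus the time to push every unit through at the throughput rate. The two pipes and their figures below are hypothetical.

```python
def delivery_time_s(latency_s: float, throughput_per_s: float, units: int) -> float:
    """First-order estimate: wait one latency, then stream every unit at full rate."""
    return latency_s + units / throughput_per_s

# Hypothetical pipes (illustrative numbers, not measurements).
GIANT_PIPE = {"latency_s": 10.0, "throughput_per_s": 1000.0}  # high latency, high throughput
TINY_PIPE = {"latency_s": 0.01, "throughput_per_s": 10.0}     # low latency, low throughput

for units in (1, 10_000):
    giant = delivery_time_s(units=units, **GIANT_PIPE)
    tiny = delivery_time_s(units=units, **TINY_PIPE)
    print(f"{units:>6} units: giant pipe {giant:.2f}s, tiny pipe {tiny:.2f}s")
```

With these made-up numbers the tiny pipe wins for a single unit (about 0.11 s vs 10 s) and the giant pipe wins for 10,000 units (about 20 s vs 1000 s), so which pipe is "faster" depends on the workload.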

🔍 The Deep Dive

Latency (The "Wait")

Latency is the time it takes for a request to travel from the sender to the receiver and for the receiver to process that request.

Measured in: ms (milliseconds).
Goal: Minimize it.
Affected by: Physical distance (signals cannot travel faster than light), network congestion, processing time.
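A minimal sketch of measuring latency in ms, assuming a hypothetical `handle_request` that just sleeps for a few milliseconds to stand in for network and processing time. Each call is timed separately, and the results are summarized as percentiles (p50, p99), a common convention because an average hides the slow tail that users actually notice.

```python
import random
import statistics
import time

def handle_request() -> None:
    """Hypothetical request handler: sleeps 2-6 ms to simulate network + processing time."""
    time.sleep(random.uniform(0.002, 0.006))

def measure_latencies_ms(handler, n: int = 100) -> list[float]:
    """Call `handler` n times, timing each call separately; returns durations in ms."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        handler()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

latencies = measure_latencies_ms(handle_request)
p50 = statistics.median(latencies)
# quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
p99 = statistics.quantiles(latencies, n=100, method="inclusive")[98]
print(f"p50 {p50:.1f} ms, p99 {p99:.1f} ms, max {max(latencies):.1f} ms")
```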

Throughput (The "Work")

Throughput is the number of units of work a system can handle in a given time period.

Measured in: QPS (Queries Per Second) or TPS (Transactions Per Second).
Goal: Maximize it.
Affected by: Hardware limits, parallelism, code efficiency.
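A minimal sketch of how parallelism raises throughput without making any single request faster. It assumes a hypothetical I/O-bound handler, `handle_io_request`, simulated with `time.sleep`, and uses threads as a stand-in for extra workers or servers.

```python
import time
from concurrent.futures import ThreadPoolExecutor

REQUEST_TIME_S = 0.02  # hypothetical: each request spends 20 ms waiting on I/O

def handle_io_request(_: int) -> None:
    """Hypothetical I/O-bound handler; sleeping releases the GIL, so threads overlap."""
    time.sleep(REQUEST_TIME_S)

def measure_qps(workers: int, requests: int = 50) -> float:
    """Push `requests` through a pool of `workers`; return completed requests per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(handle_io_request, range(requests)))  # wait for all to finish
    elapsed = time.perf_counter() - start
    return requests / elapsed

for workers in (1, 5, 10):
    print(f"{workers:>2} workers: ~{measure_qps(workers):.0f} QPS")
```

Each request still takes about 20 ms in every run, so latency is unchanged; what grows is how many requests finish per second (roughly 50, 250 and 500 QPS on an idle machine). For CPU-bound Python code the GIL limits what threads can do, which is one reason real systems scale out with more processes or machines instead.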

🎯 Interview Pulse

The Connection

In a well-designed system, you want to keep Latency stable even as Throughput increases. Common Trap: When Throughput hits a certain point (Saturation), Latency often spikes because requests are waiting in a queue.
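To see why latency spikes near saturation, one textbook model (an assumption here; real systems only approximate it) is the M/M/1 queue: requests arrive randomly at rate λ, a single server completes them at rate μ, and the mean time a request spends in the system is 1/(μ - λ). The 1000 QPS capacity below is a hypothetical number.

```python
def mm1_mean_latency_ms(arrival_qps: float, capacity_qps: float) -> float:
    """Mean time in system (queue wait + service) for an M/M/1 queue, in milliseconds."""
    if arrival_qps >= capacity_qps:
        return float("inf")  # saturated: the queue grows without bound
    return 1000.0 / (capacity_qps - arrival_qps)

CAPACITY_QPS = 1000.0  # hypothetical server that can complete 1000 requests/s

for arrival in (100, 500, 800, 900, 950, 990):
    utilization = arrival / CAPACITY_QPS
    latency = mm1_mean_latency_ms(arrival, CAPACITY_QPS)
    print(f"{utilization:.0%} busy -> mean latency {latency:.1f} ms")
```

Going from 50% to 99% utilization roughly doubles the load but raises mean latency from 2 ms to 100 ms, a 50x jump. The curve is nearly flat at low load and explodes near capacity, which is why systems are usually run with headroom rather than close to 100% busy.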

Design Questions

"How do you reduce Latency?" (Use CDN, Caching, Edge Computing, faster DB queries).
"How do you increase Throughput?" (Horizontal Scaling, Load Balancing, Batch Processing, Asynchronous tasks via Message Queues).

Crucial Note

Don't just say "Performance." Be specific. If the user says their app feels "slow," that's a Latency issue. If the server is crashing under high load, that's a Throughput issue. 🚀