Fault Tolerance

Master the art of system survival. Learn how to design systems that continue to operate even when components fail.

πŸ›‘οΈ Fault Tolerance

Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of one or more of its components.

💡 The Logic (ELI5)

Think of a multi-engine plane:

  1. A plane with one engine is efficient, but that engine is a single point of failure. If it fails, the plane can no longer stay in the air.
  2. A plane with four engines is fault tolerant.
  3. If one engine catches fire, the pilot shuts it down and the plane keeps flying on the other three.
  4. The passengers might feel a little bump (slightly degraded performance), but they are safe.

πŸ” The Deep Dive

High Availability (HA) vs Fault Tolerance

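  • High Availability: Aims to maximize uptime (the site stays accessible). It usually relies on failing over to a backup server, which can cause a brief interruption, often a few seconds.
  • Fault Tolerance: Aims for zero downtime. Even during a failure, the user sees no disruption. This typically requires redundant components that are already running when the failure happens, which makes it much more expensive.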

How to achieve Fault Tolerance?

  1. Redundancy: Having multiple copies of everything (Servers, Databases, Network cables).
  2. Replication: Keeping those copies in sync.
  3. Failover: Automatically switching to a healthy component when one fails.
  4. Graceful Degradation: If the "Search" feature fails, the "Login" and "Post" features should still work.
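
To make failover and graceful degradation concrete, here is a minimal sketch in Python. Every name in it (`call_with_failover`, `render_page`, `broken_replica`, `healthy_replica`) is hypothetical and invented for illustration. It assumes each redundant replica can be modelled as a plain function that either returns an answer or raises an exception.

```python
from typing import Callable, Dict, Sequence


class AllReplicasFailed(Exception):
    """Raised when no redundant replica could serve the request."""


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Failover: try each redundant replica in order, return the first success."""
    failures = 0
    for replica in replicas:
        try:
            return replica(request)
        except Exception:  # a real system would catch narrower error types
            failures += 1
    raise AllReplicasFailed(f"{failures} replica(s) failed")


def render_page(search: Callable[[str], str], query: str) -> Dict[str, str]:
    """Graceful degradation: a broken search must not take down login or post."""
    page = {"login": "ok", "post": "ok"}
    try:
        page["search"] = search(query)
    except Exception:
        page["search"] = "temporarily unavailable"
    return page


def broken_replica(request: str) -> str:
    """Stand-in for a failed component (hypothetical)."""
    raise ConnectionError("replica is down")


def healthy_replica(request: str) -> str:
    """Stand-in for a working component (hypothetical)."""
    return f"result for {request}"
```

Calling `call_with_failover([broken_replica, healthy_replica], "q")` skips the failed replica and returns the healthy one's answer. Calling `render_page(broken_replica, "q")` still returns a page where login and post work and only search is marked unavailable.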

🎯 Interview Pulse

Use Case: Availability Figures (The Nines)

  • 99.9% (Three Nines): ~8.8 hours of downtime per year.
  • 99.999% (Five Nines): ~5.3 minutes of downtime per year. To hit "Five Nines," you need a fault-tolerant design with no single point of failure (SPOF).
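
The downtime figures follow from simple arithmetic, sketched below in Python. The function names are hypothetical. The second function is an added illustration of why redundancy helps: it assumes the copies fail independently and that any single healthy copy can serve traffic, which real systems only approximate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for an availability such as 0.999."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(single: float, copies: int) -> float:
    """Availability of N redundant copies.

    Assumes independent failures and that one healthy copy is enough,
    so the system is down only when every copy is down at once.
    """
    return 1.0 - (1.0 - single) ** copies
```

For example, `downtime_minutes_per_year(0.999)` is about 525.6 minutes (roughly 8.76 hours) and `downtime_minutes_per_year(0.99999)` is about 5.26 minutes. Under the independence assumption, two copies that are each 99% available give `parallel_availability(0.99, 2)` of about 0.9999.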

The "Shared Nothing" Architecture

In a shared-nothing architecture, each node is independent: it has its own memory and storage and shares no state with any other node. If Node A fails, Node B doesn't even notice, because nothing it depends on lived on Node A.
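
A rough sketch of the idea in Python, under the assumption that keys are assigned to nodes by hashing. The `Node` class and `owner` function are hypothetical names, and a real system would also replicate each node's data.

```python
import zlib
from typing import Dict, List


class Node:
    """A node that owns its state privately; nothing is shared with peers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self._data: Dict[str, str] = {}  # private to this node

    def put(self, key: str, value: str) -> None:
        if not self.alive:
            raise ConnectionError(f"node {self.name} is down")
        self._data[key] = value

    def get(self, key: str) -> str:
        if not self.alive:
            raise ConnectionError(f"node {self.name} is down")
        return self._data[key]


def owner(nodes: List[Node], key: str) -> Node:
    """Pick the node responsible for a key using a stable hash (CRC32)."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]
```

If Node A is marked as down, every key owned by Node B is still readable, because Node B never touched Node A's memory. Keys owned by Node A are unavailable until it recovers, which is why shared nothing is normally combined with replication.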

Standard Question

"How do you ensure your database is fault tolerant?"

Answer: Master-slave (primary-replica) replication with automatic failover. If the master dies, a health watcher, or a coordination service such as Apache ZooKeeper used for leader election, automatically promotes a slave to master.
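
The promotion step in that answer can be sketched as follows in Python. This is a simplified, single-process model, not how ZooKeeper or any particular database implements it. `DbNode`, `run_health_check`, and the `replicated_upto` field are hypothetical names. The code calls the master "primary" and a slave "replica", and it assumes the best candidate is the healthy replica that has replicated the most data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DbNode:
    name: str
    role: str                 # "primary", "replica", or "failed"
    healthy: bool = True
    replicated_upto: int = 0  # last replicated log position (hypothetical)


def run_health_check(nodes: List[DbNode]) -> Optional[DbNode]:
    """Return the current primary, promoting a replica if the primary is down."""
    primary = next((n for n in nodes if n.role == "primary"), None)
    if primary is not None and primary.healthy:
        return primary  # nothing to do

    candidates = [n for n in nodes if n.role == "replica" and n.healthy]
    if not candidates:
        return None  # no healthy replica left: the database is unavailable

    if primary is not None:
        primary.role = "failed"  # demote the dead primary

    # Promote the most up-to-date replica to limit lost writes.
    new_primary = max(candidates, key=lambda n: n.replicated_upto)
    new_primary.role = "primary"
    return new_primary
```

A watcher would call `run_health_check` on a timer. In production the hard parts are the ones this sketch leaves out: detecting failure reliably over a network and making sure only one node ever believes it is the primary.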