Consistent Hashing
Master the secret of high-availability distribution. Learn how to add or remove servers with minimal data disruption.
π Consistent Hashing
Consistent hashing is a special kind of hashing such that when a hash table is resized, only $K/n$ keys need to be remapped on average, where $K$ is the number of keys, and $n$ is the number of slots.
π‘ The Logic (ELI5)
Standard Hashing (The Problem)
Think of a Classroom with 3 teachers (Servers).
- You assign students (Data) to teachers using
student_id % 3. - Student 1 -> Teacher 1, Student 2 -> Teacher 2, Student 3 -> Teacher 0.
- The Disaster: Teacher 0 gets sick and leaves. Now you use
student_id % 2. - Everyone has to change their teacher! This causes a massive "Cache Miss" storm in a real system.
Consistent Hashing (The Solution)
Think of a Circle (The Hash Ring):
- You place the 3 teachers at random points on a circle.
- You place students at random points on the same circle.
- Every student just walks Clockwise until they hit the first teacher.
- The Fix: If a teacher leaves, only the students who were assigned to that teacher have to walk a little further to the next one.
- 90% of the other students stay with their original teacher!
π The Deep Dive
How it Works
- Map both data keys and server nodes to a common range (the ring).
- Each data key is assigned to the first server found while moving clockwise on the ring.
- Adding a server: Only a small subset of keys move from an existing server to the new one.
- Removing a server: Only the keys assigned to the removed server move to its neighbor.
Virtual Nodes (The Upgrade)
To ensure the load is balanced perfectly (avoiding one teacher getting 0 students and another getting 100), we use Virtual Nodes. Each physical server is placed on the ring in multiple spots (e.g., Server A-1, Server A-2, Server A-3).
π― Interview Pulse
Use Case
Always mention Consistent Hashing when talking about Distributed Caching (Redis clusters) or Distributed Databases (Amazon DynamoDB, Cassandra).
The Primary Benefit
The goal is to Minimize Data Movement when the cluster changes size. This is crucial for performance and availability.
Keywords
- Hash Ring: The conceptual circle.
- Virtual Nodes: Helping with "Skewed Load" or hotspots.
- Failover: How the ring handles node crashes. π