Core Module
12 min forge
Database Sharding
Master horizontal scaling for data. Learn how to split massive databases across multiple servers using Shard Keys.
πͺ Database Sharding: Partitioning Data
Sharding is the process of breaking up a large database into smaller, more manageable chunks called shards, and distributing them across multiple servers.
π‘ The Logic (ELI5)
Think of an Encyclopedia:
- You have one giant book with every word from A to Z. It's too heavy to carry.
- Sharding is splitting that book into 26 volumes.
- Volume 1 has 'A', Volume 2 has 'B', etc.
- Now, if you want to look up "Apple," you only need to grab Volume 1.
- If 26 people want to look up different words, they can all grab a different volume at the same time!
π The Deep Dive
Why Shard?
- Speed: Smaller databases are faster to search.
- Capacity: No single server can hold all the data for sites like Facebook or Google.
- Fail-safety: If one shard crashes, only a fraction of the users are affected.
Common Sharding Schemes
- Key-based (Hash) Sharding: Take the user ID, hash it, and use the result to pick a shard. (e.g.,
user_id % 3). - Range-based Sharding: Users with IDs 1-1000 go to Shard A, 1001-2000 go to Shard B.
- Directory-based Sharding: A separate service (Lookup Table) keeps track of which data is on which shard.
π― Interview Pulse
Use Case: The "Shard Key"
The most important question in a sharding interview is: "What is your Shard Key?" Choosing a bad key (e.g., "Country" for a localized app) can lead to Hotspots, where one shard is doing 99% of the work while others are empty.
The Downside of Sharding
- Complexity: Your application code now needs to know which shard to talk to.
- Joins: Joining data across two different shards is almost impossible (or very slow).
- Resharding: If your shards get too full, moving data to a new shard is a nightmare.
Top Tip
Sharding is a last resort. Before sharding, always try:
- Optimizing your queries.
- Adding Indexes.
- Implementing Caching.
- Vertical Scaling. ποΈ