🔪 Database Sharding: Partitioning Data

Sharding is the process of breaking up a large database into smaller, more manageable chunks called shards, and distributing them across multiple servers.

💡 The Logic (ELI5)

Think of an Encyclopedia:

You have one giant book with every word from A to Z. It's too heavy to carry.
Sharding is splitting that book into 26 volumes.
Volume 1 has 'A', Volume 2 has 'B', etc.
Now, if you want to look up "Apple," you only need to grab Volume 1.
If 26 people want to look up different words, they can all grab a different volume at the same time!

🔍 The Deep Dive

Why Shard?

Speed: Smaller databases are faster to search.
Capacity: No single server can hold all the data for sites like Facebook or Google.
Fail-safety: If one shard crashes, only a fraction of the users are affected.

Common Sharding Schemes

Key-based (Hash) Sharding: Take the user ID, hash it, and use the result to pick a shard. (e.g., user_id % 3).
Range-based Sharding: Users with IDs 1-1000 go to Shard A, 1001-2000 go to Shard B.
Directory-based Sharding: A separate service (Lookup Table) keeps track of which data is on which shard.

🎯 Interview Pulse

Use Case: The "Shard Key"

The most important question in a sharding interview is: "What is your Shard Key?" Choosing a bad key (e.g., "Country" for a localized app) can lead to Hotspots, where one shard is doing 99% of the work while others are empty.

The Downside of Sharding

Complexity: Your application code now needs to know which shard to talk to.
Joins: Joining data across two different shards is almost impossible (or very slow).
Resharding: If your shards get too full, moving data to a new shard is a nightmare.

Top Tip

Sharding is a last resort. Before sharding, always try:

Optimizing your queries.
Adding Indexes.
Implementing Caching.
Vertical Scaling. 🏔️