String Hashing
Learn how to convert strings into unique numerical values to perform constant-time comparisons and search.
ποΈ String Hashing: Digital Fingerprints
String hashing is a technique that transforms a string into a unique numerical value (a hash). This allows us to compare two strings in $O(1)$ time by comparing their hash values, instead of comparing them character-by-character in $O(n)$ time.
π‘ The Logic (ELI5)
Imagine you have two huge books and you want to know if they are identical:
- Approach 1: You sit down and read them side-by-side, word for word. This takes hours.
- Approach 2 (Hashing): You put each book through a "magic blender" (the hash function) that spits out a single number (the fingerprint).
- If the numbers are different, the books are definitely different.
- If the numbers are the same, the books are almost certainly identical.
π The Deep Dive
Polynomial Rolling Hash
The most common way to hash a string is the Polynomial Rolling Hash. We treat the string as a number in a specific base ($P$) and take the modulo ($M$) to keep it within integer limits.
Formula: $Hash(S) = (S[0] \cdot P^0 + S[1] \cdot P^1 + ... + S[n-1] \cdot P^{n-1}) \pmod M$
- P: Usually a prime number close to the size of the alphabet (e.g., 31 for lowercase English).
- M: A large prime number (e.g., $10^9 + 7$ or $10^9 + 9$).
Rolling Hash (The "Window" Trick)
When searching for a pattern in text, you can update the hash of a window in $O(1)$ by "subtracting" the character that leaves and "adding" the character that enters. This is the heart of the Rabin-Karp algorithm.
π οΈ Code Forge: Simple String Hashing
javascript Standard/** * Simple Polynomial Rolling Hash */ function computeHash(str) { const P = 31; // Base for lowercase letters const M = 1e9 + 9; // Large prime mod let hashValue = 0; let p_pow = 1; for (let i = 0; i < str.length; i++) { // Convert char to number (a=1, b=2...) const charCode = str.charCodeAt(i) - 'a'.charCodeAt(0) + 1; // Update hash hashValue = (hashValue + charCode * p_pow) % M; // Update power of P p_pow = (p_pow * P) % M; } return hashValue; } // Example console.log(computeHash("kunal")); // 4324233... console.log(computeHash("kunal")); // Same fingerprint!
π― Interview Pulse
Collision Handling
A Collision occurs when two different strings produce the same hash.
- In competitive programming or high-tier interviews, we use Double Hashing (computing two hashes with different bases/mods) to make the probability of a collision almost zero.
Interview Scenario
If an interviewer asks "How can you check if any two strings in an array are duplicates?", mention hashing. Using a Set (which uses internal hashing) solves this in $O(N \times L)$ time where $L$ is string length.
Advanced Topics
- Rolling Hash (Rabin-Karp).
- Suffix Automaton.
- Suffix Arrays.