Core Module
12 min forge

Pattern Matching

Master the algorithms used by text editors and search engines to find snippets of text within large documents.

πŸ” Pattern Matching: Searching Text

Pattern matching is the problem of finding occurrences of a Pattern ($P$) within a larger Text ($T$). This is the core logic behind CTRL+F and search engines.

πŸ’‘ The Logic (ELI5)

Imagine you're searching for the word "cat" in a long book:

  • Approach 1 (Brute Force): You point at every single character in the book, one by one. If you see a 'c', you check the next letter for 'a'. If not, you move forward by only one letter and try again.
  • Approach 2 (Optimized): You realize that the word "cat" has no repeating parts. If you fail at 't', you don't need to try starting at 'a' or 't', you can skip ahead!

πŸ” The Deep Dive

High-Level Algorithms

  1. Brute Force (NaΓ―ve): Compares every possible starting point.
    • Complexity: $O(n \times m)$ where $n=Text$, $m=Pattern$.
  2. KMP (Knuth-Morris-Pratt): Uses a pre-computed "LPS" (Longest Prefix Suffix) table to skip redundant comparisons.
    • Complexity: $O(n + m)$.
  3. Rabin-Karp: Uses Hashing to quickly filter out mismatches. Only performs full comparison if the hash values match.
    • Complexity: Average $O(n + m)$.
  4. Z-Algorithm: Computes a $Z$-array where $Z[i]$ is the length of the longest common prefix starting at $T[i]$ and $P$.
    • Complexity: $O(n + m)$.

While KMP is the gold standard, the NaΓ―ve approach is the starting point for all search logic.

javascript Standard
/** * NaΓ―ve Search implementation * Returns all starting indices where the pattern occurs in the text. */ function searchPattern(text, pattern) { const n = text.length; const m = pattern.length; const result = []; // Slide the pattern over the text one by one for (let i = 0; i <= n - m; i++) { let j; // Check for pattern match at current index 'i' for (j = 0; j < m; j++) { if (text[i + j] !== pattern[j]) { break; // Mismatch found } } // If we reached the end of the pattern, we found a match if (j === m) { result.push(i); } } return result; } // Example const T = "AABAACAADAABAABA"; const P = "AABA"; console.log(searchPattern(T, P)); // [0, 9, 12]

🎯 Interview Pulse

Which one to use?

  • KMP: Best when the alphabet is small and many prefixes match suffixes (e.g., DNA sequences like AAAAAA).
  • Rabin-Karp: Best for finding multiple patterns simultaneously.
  • NaΓ―ve: Usually acceptable for small patterns or when time is limited.

Patterns to Watch

  • Wildcard Matching: Searching with * or ?.
  • Finding the Minimum Window: Finding the smallest substring of $T$ that contains all characters of $P$.
  • String Permutation: Checking if any permutation of $P$ exists in $T$.