KMP Algorithm

Understanding the Knuth-Morris-Pratt algorithm for linear-time pattern searching.

Key Notes

The Knuth-Morris-Pratt (KMP) algorithm is a classic and highly efficient string searching algorithm. Its main purpose is to find all occurrences of a pattern `P` within a text `T` in O(m + n) time, where `m` is the length of the pattern and `n` is the length of the text. This is a significant improvement over the naive O(m*n) approach. The genius of KMP lies in its preprocessing step. When a mismatch occurs between the text and the pattern, the naive algorithm simply shifts the pattern one character to the right and starts comparing again from the beginning of the pattern. KMP is smarter; it uses information from the pattern itself to know how many characters it can safely shift without missing a potential match. This is achieved by pre-calculating a Longest Proper Prefix Suffix (LPS) array, also known as a failure function. The `lps[i]` value for the pattern stores the length of the longest proper prefix of `P[0...i]` which is also a suffix of `P[0...i]`. When a mismatch occurs at `text[i]` and `pattern[j]`, we don't need to go back in the text. Instead, we consult `lps[j-1]` to find the new position in the pattern to continue the comparison from. This avoids redundant comparisons of characters in the text that we already know will match.

Back to Strings

KMP Algorithm

Understanding the Knuth-Morris-Pratt algorithm for linear-time pattern searching.

Key Notes