Z-Algorithm

Constructing and using a Z-array for efficient pattern matching.

Key Notes

The Z-algorithm is a powerful and elegant linear-time string processing algorithm. For a given string `S` of length `n`, it produces an array, called the Z-array, also of length `n`. `Z[i]` is defined as the length of the longest substring starting from `S[i]` which is also a prefix of `S`. The first element, `Z[0]`, is conventionally set to 0. For example, if `S = "ababcababc"`, its Z-array would be `[0, 0, 3, 0, 1, 5, 0, 3, 0, 1]`. `Z[2]` is 3 because `S[2...4]` (`abc`) is the same as `S[0...2]` (`aba`). `Z[5]` is 5 because `S[5...9]` (`ababc`) is the same as `S[0...4]` (`ababc`). The Z-array can be constructed naively in O(n^2) time, but the core of the Z-algorithm is a clever method to build it in O(n) time. It does this by maintaining a 'Z-box' `[L, R]`, which is the interval that corresponds to the prefix-substring with the rightmost endpoint `R` found so far. This information is used to efficiently initialize the Z-values for indices inside this box. Once the Z-array is built, it can be used for pattern searching. By creating a new string `P + '$' + T` (where `P` is the pattern, `T` is the text, and `$` is a character not in `P` or `T`), we can compute its Z-array. Any index `i` in this new array where `Z[i]` is equal to the length of `P` corresponds to a match of `P` in `T`.

Back to Strings

Z-Algorithm

Constructing and using a Z-array for efficient pattern matching.

Key Notes