To solve this problem, the usual enumeration will not work, because its asymptotics will be
O(t*s). Therefore, for substring search tasks, the KMP (Knuth-Morris-Pratt) algorithm is used.
To implement this algorithm, the string prefix function is used, this is an array of numbers of length
n (strong>s length), in which each element is the maximum length of the largest own suffix of the substring
s, matching its prefix. For example:
Line |
Prefix function |
Note |
abababcab |
0 0 1 2 3 4 0 1 2 |
|
abcabcd |
0 0 0 1 2 3 0 |
|
aabaaab |
0 1 0 1 2 2 3 |
|
Trivial prefix function algorithm with O(n^3) asymptotics:
vector<int> prefix_function (string s) {
int n = (int ) s.length();
vector<int>pi(n);
for (int i=0; i<n; ++i)
for (int k=0; k<=i; ++k)
if (s.substr(0,k) == s .substr(i-k+1,k))
pi[i] = k;
returnpi;
}
And now we need to make a prefix function for a string of the form: & nbsp;
t + # +
s, where # is a delimiter character that is clearly not in the text. Analyzing the values of the prefix function after the corresponding separator character, it should be noted that if the resulting value is equal to the length of the desired substring, then its occurrence is found. For example, for the string "abababcab" and the desired substring "abab", the prefix function will be:
0 0 1 2 0 1 2 3 4 3 4 0 1 2 we need to parse the elements of the corresponding string s:
1 2 3 4 3 4 0 1 2 - there are two cases where the value is 4 ( 4th and 6th ), i.e. length
t, as a result, the answer must be written: 4 - 4 (length t) & nbsp; = 0 and 6 - 4 = 2. It can be seen that these are the correct answers and the string "abab" is actually a substring in the string "abababcab" in positions 0 and 2.