KMP思想
参考
The Knuth-Morris-Pratt Algorithm in my own words
部分匹配表 对于KMP算法而言,最重要的就是部分匹配表。例如abababca
,其部分匹配表如下
1 2 3 char: |a |b |a |b |a |b |c |a | index:|0 |1 |2 |3 |4 |5 |6 |7 | value:|0 |0 |1 |2 |3 |4 |0 |1 |
对于匹配的某一位来说,我只是关心它前面的字符,不必去关心后面的字符。例如:匹配到index = 5
时,我只是关心前面从0到4位字符串。
两个重要的概念:
适宜前缀(Proper prefix):去除一位或多位结尾的所有字符。例如,对于Snape
来说,它的适宜前缀就是S
Sn
Sna
Snap
.
Proper prefix: All the characters in a string, with one or more cut off the end.
适宜后缀(Proper suffix):去除一位或多位开始的所有字符。例如,对于Snape
来说,它的适宜后缀就是e
pe
ape
nape
.
Proper suffix: All the characters in a string, with one or more cut off the beginning.
对于部分匹配表中的值我们是这样定义的:在子串中最长的适宜前缀匹配适宜后缀的长度.举个例子:对于字符串abababca
其中的第4位来说,其子串为ababa
,其中适宜前缀包括:a
ab
aba
abab
,适宜后缀包括:a
ba
aba
baba
,那么最长的匹配串就是aba
,其长度为3.
The length of the longest proper prefix in the (sub)pattern that matches a proper suffix in the same (sub)pattern.
匹配过程 匹配需要两个条件:匹配上且该匹配位上的值大于1。
滑动距离:匹配位置长度减去该位的值。
举个例子:被匹配串为bacbababaabcbab
,匹配串为abababca
,当匹配到如下情景时:
1 2 3 bacbababaabcbab ||||| abababca
这时候匹配子串长度为5,该位置上的值为3,那么就应该滑动 5-3=2
,滑动结果如下:
1 2 3 bacbababaabcbab * * ||| abababca
If a partial match of length partial_match_length is found and table[partial_match_length] > 1, we may skip ahead partial_match_length - table[partial_match_length - 1] characters.
Java实现 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 private int [] partialMatchTable(String needle){ int [] partialMatchTable = new int [needle.length ()]; for (int i = 0 ; i < needle.length (); i++){ String substring = needle.substring(0 ,i + 1 ); int length = substring.length () - 1 ; while (length > 0 ) { if (!substring.substring(0 ,substring.length () - 1 ).contains(String.valueOf(substring.charAt(substring.length () - 1 )))){ // 当匹配串的前length -1 的子串中不包含length 位字符直接不匹配 break ; } String properPrefix = substring.substring(0 , length ); String properSuffix = substring.substring(substring.length () - length , substring.length ()); if (properPrefix.equals(properSuffix)) { partialMatchTable[i] = length ; break ; } length --; } } return partialMatchTable; } public int kmp(String haystack, String needle){ int [] partialMatchTable = this.partialMatchTable(needle); int haystackindex = 0 ; int needleindex = 0 ; while (haystackindex < haystack.length () && needleindex < needle.length ()){ if (haystack.charAt(haystackindex) == needle.charAt(needleindex)){ haystackindex++; needleindex++; }else { if (needleindex > 0 && partialMatchTable[needleindex] != 0 ){ haystackindex = haystackindex - partialMatchTable[needleindex] + 1 ; }else { haystackindex = haystackindex - needleindex + 1 ; } needleindex = 0 ; } } if (needleindex > needle.length () - 1 ) { return haystackindex - needle.length (); } else { return -1 ; } }