2007年3月13日星期二

Knuth-Pratt-Morris算法

简称KPM算法(或KMP或MKP或...),是一种快速的字符串匹配算法,据说大部分的子串搜索函数都是用该算法实现的。算法如下:

第一步,根据模板字符串P,构建匹配表T,其中P的字符数为 n,T的size也为 n
  1. Set T[ 0] = 0, and let P have n characters.
  2. Set i = 1, j = 0.
  3. If i >= n then we are done; terminate the algorithm. Otherwise, compare P[i] with P[j].
    • If they are equal, set T[i] = j + 1, j = j + 1, and i = i + 1.
    • If not, and if j > 0, set j = T[j − 1].
    • Otherwise, set T[i] = 0, i = i + 1.
  4. Return to step 3.
第二步,使用构造好的表,执行算法,在字符串S搜索P,其中S的字符数为 l ,P的字符数为 n
  1. Let i = m = 0; say P has n characters, and S, l characters.
  2. If m + i = l, then we exit; there is no match. Otherwise, compare P[i] versus S[m + i].
    • If they are equal, set i = i + 1. If i = n then we have found a full match; terminate the algorithm and return m as the start of the match.
    • If they are unequal, let e = T[i − 1]. Set m = m + ie and if i > 0, set i = e.
  3. Return to step 2.
构造匹配表的时间是O( n),查找的时间是O( l ),速度相当快。

没有评论: