|
|
Fast Approximate String Matching for Chinese Text |
CHEN Kai-qu,ZHAO Jie,PENG Zhi-wei |
ZTE Corporation |
|
|
Abstract For now there are two effective methods to improve approximate string matching : bit-vector method and filter method. Since Chinese alphabet has many characters , it needs much computer memory for bit-vector method. This would be a problem for some little computer which has a small memory , such as embedded system. We present a new bit-vector method which needs only about 5% computer memory of original bit-vector method. And , we also utilize the fact that Chinese alphabet is very large and develop a new filter method , BPM-BM , for approximate string matching of Chinese text . It runs at least 14% faster than the known fasted algorithms. In most cases , our algorithm is even 1.5~2 times faster.
|
Received: 02 March 2003
|
|
|
|
|
[1] Sellers , P. . The theory and computation of evolutionary distance : pattern recognition. Journal of Algorithms[J] , 1980 ,1 :359 - 373. [2] Baeza-Yates , R. A. ,Gonnet , G. H. : A new approach to text searching , Communications of the ACM [J] . 35 (10) :74 - 82. [3] Wu , S. , Manber , U. . Fast text searching allowing errors , Communications of the ACM[J] . 35 (10) : 83 - 91. [4] Baeza-Yates , R. A. , Navarro , G. . Faster approximate string matching. Algorithmica [J] , 23 (2) : 1999 ,127 - 158. [5] Myers , G. : A fast bit-vector algorithm for approximate string matching based on dynamic programming. Journal of the ACM[J] , 46 (3) :1999 ,395 - 415. [6] Chang , W. , Marr , T. . Approximate string matching and local similarity[A] . In : Proc. 5th Combinational Pattern Matching (CPM'94) [C] , LNCS 807 , pages 1994 ,259 - 271. [7] Navarro , G. , Baeza-Yates , R. A. . Very fast and simple approximate string matching. Information Processing Letters[J] , 1999 ,72 :65 - 70. [8] Navarro , G. , Raffinot , M. . Fast and flexible string matching by combining bit-parallelism and suffix automata. ACM Journal of Experimental Algorithmics (JEA) [J] , 2000 ,5 (4) . [9] Sutinen , E. , Tarhio , J. . On using q-gram locations in approximate string matching[A] . In : Proc. European Symposium on Algorithms (ESA'95) [C] , LNCS 979 , 1995 ,327 - 340. [10] Tarhio , J. , Ukkonen , E. . Approximate Boyer-Moore string matching. SIAM Journal on computing [J] , 1993 ,22 (2) :243 - 260. [11] Hyyr?, H. , Navarro , G. . Faster bit-parallel approximate string matching[A] . In : Proc. 13th Combination Pattern Matching (CPM'02) [C] , LNCS 2002 ,2373. [12] Boyer , R. S. , Moore , J. S. . A fast string searching algorithm. Communications of the ACM[J] . 1977 ,20 :762 - 772. [13] Ukkonen , E. . Algorithms for approximate string matching. Information and Contro [J] , 1985 ,64 : 100 - 118. |
|
|
|