PATRICIA-tree-based Dictionary Mechanism for Chinese Word Segmentation

YANG Wen-feng, CHEN Guang-ying, LI Xing

Journal of Chinese Information Processing, 2001, Vol. 15, Issue 3: 45-50.

Abstract

The segmentation dictionary is a basic component of Chinese information processing systems, and the efficiency with which it can be queried and updated directly affects the performance of those systems. This paper uses the PATRICIA tree data structure to design a segmentation dictionary mechanism that supports fast lookup and update of dictionary entries, and gives a preliminary theoretical analysis of its performance. Experiments then compare its time efficiency with that of the binary-seek-by-characters dictionary mechanism. The results show that the PATRICIA-tree-based dictionary mechanism achieves higher query speed and update efficiency, and can meet the needs of large-scale, open-text processing systems.
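The abstract does not give implementation details, so the following is only an illustrative sketch of how a path-compressed trie can serve as a segmentation dictionary with fast lookup and update. It assumes a character-level radix trie, which is a simplified relative of the bit-level PATRICIA tree and is not the authors' actual structure. All names (`Node`, `RadixDict`, `longest_match`, `segment`) are hypothetical, and the forward-maximum-matching loop is one common way to use such a dictionary, not necessarily the one used in the paper.

```python
from typing import Dict, List, Tuple


class Node:
    """One node of a character-level radix trie (hypothetical name)."""

    def __init__(self, is_word: bool = False) -> None:
        # Maps the first character of an edge to (edge label, child node).
        self.children: Dict[str, Tuple[str, "Node"]] = {}
        self.is_word = is_word


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


class RadixDict:
    """Assumed dictionary interface: insert, membership test, longest match."""

    def __init__(self) -> None:
        self.root = Node()

    def insert(self, word: str) -> None:
        if not word:
            return
        node = self.root
        while word:
            entry = node.children.get(word[0])
            if entry is None:
                # No edge starts with this character: add the rest as one edge.
                node.children[word[0]] = (word, Node(is_word=True))
                return
            label, child = entry
            k = common_prefix_len(label, word)
            if k < len(label):
                # Split the edge where the label and the new word diverge.
                mid = Node()
                mid.children[label[k]] = (label[k:], child)
                node.children[word[0]] = (label[:k], mid)
                child = mid
            node, word = child, word[k:]
        node.is_word = True

    def longest_match(self, text: str, start: int = 0) -> int:
        """Length of the longest dictionary word beginning at text[start], or 0."""
        node, pos, best = self.root, start, 0
        while pos < len(text):
            entry = node.children.get(text[pos])
            if entry is None:
                break
            label, child = entry
            if not text.startswith(label, pos):
                break
            pos += len(label)
            node = child
            if node.is_word:
                best = pos - start
        return best

    def __contains__(self, word: str) -> bool:
        return bool(word) and self.longest_match(word) == len(word)


def segment(dictionary: RadixDict, text: str) -> List[str]:
    """Forward maximum matching; an unmatched character is emitted alone."""
    out, i = [], 0
    while i < len(text):
        n = dictionary.longest_match(text, i) or 1
        out.append(text[i:i + n])
        i += n
    return out
```

In this sketch an update only touches the nodes along one root-to-leaf path, and a lookup compares whole edge labels rather than one character per node, which is the general property that makes path-compressed tries attractive for large, frequently updated dictionaries.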

Key words

information retrieval / PATRICIA tree / Chinese word segmentation

Cite this article

YANG Wen-feng, CHEN Guang-ying, LI Xing. PATRICIA-tree-based Dictionary Mechanism for Chinese Word Segmentation. Journal of Chinese Information Processing. 2001, 15(3): 45-50

Funding

863 Program (863-306-ZD02-02-7)