Abstract
Selection of a part-of-speech (POS) tagset has so far relied mainly on experts' experience, with no automatic or semi-automatic method to assist the process. This paper proposes a new method that uses a genetic algorithm (GA) to search for an optimized tagset. The method automatically searches a set of candidate tagsets for an optimal or near-optimal one, and its parameters can be adjusted to suit the requirements of a specific application. Experiments show that GA offers a systematic and effective aid for choosing a tagset.
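The abstract does not give the chromosome encoding or the fitness function, so the following is only a minimal sketch of how a GA search over candidate tagsets might look, not the authors' implementation. It assumes a hypothetical encoding in which each bit says whether a fine-grained tag is kept distinct (1) or merged into a coarser class (0), and a made-up fitness that trades informativeness against tagging error. The tag names, the `INFO_GAIN` and `ERROR_COST` scores, and the `weight` parameter are all illustrative assumptions.

```python
import random

# Hypothetical fine-grained tags; bit i = 1 keeps tag i distinct, 0 merges it.
FINE_TAGS = ["n_common", "n_proper", "v_trans", "v_intrans", "adj", "adv"]
# Made-up scores for illustration only (not from the paper):
INFO_GAIN = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2]   # value of keeping the distinction
ERROR_COST = [0.2, 0.3, 0.5, 0.6, 0.1, 0.4]  # tagging errors it would cause


def fitness(chromosome, weight=1.0):
    """Reward informative distinctions, penalize error-prone ones.

    `weight` stands in for a user-adjustable, task-specific parameter.
    """
    return sum(bit * (gain - weight * cost)
               for bit, gain, cost in zip(chromosome, INFO_GAIN, ERROR_COST))


def tournament(pop, rng, weight, k=3):
    """Pick the fittest of k randomly sampled chromosomes."""
    return max(rng.sample(pop, k), key=lambda ch: fitness(ch, weight))


def crossover(a, b, rng):
    """One-point crossover of two parent chromosomes."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(chromosome, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def search_tagset(weight=1.0, pop_size=20, generations=40,
                  mutation_rate=0.05, seed=0):
    """Run a simple generational GA with elitism and return the best chromosome."""
    rng = random.Random(seed)
    n = len(FINE_TAGS)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=lambda ch: fitness(ch, weight))
    for _ in range(generations):
        children = [best]  # elitism: the best candidate always survives
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng, weight),
                              tournament(pop, rng, weight), rng)
            children.append(mutate(child, rng, mutation_rate))
        pop = children
        best = max(pop, key=lambda ch: fitness(ch, weight))
    return best


best = search_tagset()
kept = [tag for tag, bit in zip(FINE_TAGS, best) if bit]
print("distinctions kept:", kept, "fitness:", round(fitness(best), 3))
```

Raising `weight` in this toy setup pushes the search toward smaller, more reliably taggable tagsets, while lowering it favors finer distinctions. In a real setting the fitness would presumably come from evaluating a tagger on a corpus rather than from fixed scores.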
Key words
POS tagging /
word class /
POS tagset /
genetic algorithm
Funding
973 Program (G1998030507-4); National Natural Science Foundation of China (69973005); Hong Kong Polytechnic University research fund