Using Genetic Algorithms for Optimizing Part-of-Speech Tagset

SUN Hong-lin, LU Qin, YU Shi-wen

Journal of Chinese Information Processing, 2001, Vol. 15, Issue (1): 19-27.
Review

Abstract

In the past, part-of-speech (POS) tagsets were chosen mainly by experts relying on their own experience and knowledge, with no automatic or semi-automatic method to assist the process. This paper proposes a new method that uses genetic algorithms (GA) to search for an optimized POS tagset. The method automatically searches a set of candidate tagsets for an optimal or near-optimal one, and its parameters can be adjusted to suit the requirements of a particular application. Experiments show that GA offers a systematic and effective aid for selecting a tagset.
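
The kind of search the abstract describes can be illustrated with a minimal sketch of a generic genetic algorithm. Everything specific here is an assumption made for illustration and is not taken from the paper: a candidate tagset is encoded as a bit string in which each bit switches one hypothetical optional tag distinction on or off, the `GAIN` and `COST` numbers are made-up toy values, and `alpha` stands in for an application-dependent parameter that weights benefit against cost. The paper's actual encoding, fitness function, and operators may differ.

```python
import random

# Hypothetical toy data (not from the paper): for each optional tag
# distinction, an assumed benefit and an assumed cost of including it.
GAIN = [0.9, 0.2, 0.6, 0.1, 0.7, 0.3]
COST = [0.3, 0.5, 0.2, 0.4, 0.6, 0.1]


def make_fitness(alpha):
    """Build a toy fitness function; alpha weights benefit against cost."""
    def fitness(bits):
        return sum(b * (alpha * g - (1 - alpha) * c)
                   for b, g, c in zip(bits, GAIN, COST))
    return fitness


def evolve(fitness, n_bits, pop_size=20, generations=40,
           crossover_rate=0.7, mutation_rate=0.02, seed=0):
    """Generic GA: tournament selection, one-point crossover,
    bit-flip mutation, and elitism. Requires n_bits >= 2."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best individual forward
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Flip each bit independently with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best


if __name__ == "__main__":
    fitness = make_fitness(alpha=0.5)
    best = evolve(fitness, n_bits=len(GAIN))
    print(best, fitness(best))
```

Changing `alpha` shifts which candidate scores highest, which loosely mirrors the idea of adjusting parameters to fit a particular application.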

Key words

POS tagging / word class / POS tagset / genetic algorithm

Cite this article

SUN Hong-lin, LU Qin, YU Shi-wen. Using Genetic Algorithms for Optimizing Part-of-Speech Tagset. Journal of Chinese Information Processing. 2001, 15(1): 19-27

Funding

973 Program (G1998030507-4); National Natural Science Foundation of China (69973005); Hong Kong Polytechnic University research grant