SUN Hong-lin, LU Qin, YU Shi-wen. Using Genetic Algorithms for Optimizing Part-of-Speech Tagset [J]. Journal of Chinese Information Processing, 2001, 15(1): 19-27.
Using Genetic Algorithms for Optimizing Part-of-Speech Tagset
SUN Hong-lin1,3, LU Qin2, YU Shi-wen1
1. Institute of Computational Linguistics, Peking University 2. Department of Computing, Hong Kong Polytechnic University 3. Center for Language Information Processing, Beijing Language & Culture University
Abstract: Until now, POS tagsets have mainly been selected by hand, with experts relying on their own knowledge, because no automatic or semi-automatic method has been available to assist the selection process. This paper proposes a novel method that searches for an optimal POS tagset using genetic algorithms (GA). The experiment shows that GA optimizes the POS tagset efficiently and allows parameters to be adjusted according to user requirements. It offers a systematic way to help people make an informed choice of tagset.