Existing part-of-speech (POS) tagging approaches each exploit linguistic knowledge described from only one perspective. Based on a study of four POS tagging methods, namely transformation-based error-driven learning (TBED), decision trees (DT), hidden Markov models (HMM), and maximum entropy (ME), we propose a novel data fusion strategy for POS tagging: the correlation voting method. Experimental results show that data fusion describes the linguistic knowledge used in POS tagging more comprehensively, and that correlation voting outperforms the other fusion methods, with an average decrease of 27.85% in the tagging error rate.
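To make voting-based data fusion concrete, the following is a minimal Python sketch of two generic voting schemes for combining the per-token outputs of several taggers: simple majority voting and accuracy-weighted voting. It is not the correlation voting method, whose weighting this abstract does not specify. All function names, the example tags, and the accuracy weights are hypothetical and chosen only for illustration. The sketch assumes every tagger tokenizes the sentence identically, so their outputs align token by token.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence


def majority_vote(tags: Sequence[str]) -> str:
    """Return the tag proposed by the most taggers.

    max() returns the first maximal item, so ties go to the earliest tagger.
    """
    if not tags:
        raise ValueError("no tags given")
    counts = Counter(tags)
    return max(tags, key=lambda t: counts[t])


def weighted_vote(tags: Sequence[str], weights: Sequence[float]) -> str:
    """Each tagger votes with its weight (here: a hypothetical held-out accuracy)."""
    if not tags:
        raise ValueError("no tags given")
    scores: Dict[str, float] = defaultdict(float)
    for tag, weight in zip(tags, weights):
        scores[tag] += weight
    return max(tags, key=lambda t: scores[t])


def fuse(outputs: List[List[str]], weights: Optional[Sequence[float]] = None) -> List[str]:
    """Fuse aligned tag sequences (one per tagger) into a single sequence."""
    if len({len(seq) for seq in outputs}) != 1:
        raise ValueError("tagger outputs must be non-empty and of equal length")
    fused = []
    for column in zip(*outputs):  # the tags every tagger assigned to one token
        if weights is None:
            fused.append(majority_vote(column))
        else:
            fused.append(weighted_vote(column, weights))
    return fused


def relative_error_reduction(baseline_error: float, fused_error: float) -> float:
    """Fraction by which fusion lowers the error rate relative to a baseline."""
    return (baseline_error - fused_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical outputs of four taggers for "Time flies like an arrow".
    outputs = {
        "TBED": ["NN", "VBZ", "IN", "DT", "NN"],
        "DT": ["NN", "NNS", "VBP", "DT", "NN"],
        "HMM": ["NN", "VBZ", "IN", "DT", "NN"],
        "ME": ["VB", "NNS", "IN", "DT", "NN"],
    }
    accuracies = [0.95, 0.96, 0.965, 0.97]  # hypothetical, one per tagger
    print(fuse(list(outputs.values())))              # ['NN', 'VBZ', 'IN', 'DT', 'NN']
    print(fuse(list(outputs.values()), accuracies))  # ['NN', 'NNS', 'IN', 'DT', 'NN']
```

On the second token the four taggers split 2-2: majority_vote falls back to the first tagger listed, while weighted_vote lets the pair with higher combined accuracy win. relative_error_reduction assumes a reported figure such as 27.85% is a relative, not absolute, decrease in error rate.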
GUO Yong-hui, WU Bao-min, WANG Bing-xi.
Correlation Voting Fusion Strategy Used for Part of Speech Tagging. Journal of Chinese Information Processing. 2007, 21(2): 9-13