A Correlation Voting Fusion Strategy for Part-of-Speech Tagging

GUO Yong-hui, WU Bao-min, WANG Bing-xi

Journal of Chinese Information Processing ›› 2007, Vol. 21 ›› Issue (2): 9-13.
Review

Abstract

Every part-of-speech (POS) tagging method draws on linguistic knowledge described from only one perspective; once the training corpus reaches a certain size and the model has been refined to a certain degree, tagging accuracy becomes hard to improve further. Building on a study of four corpus-based POS tagging methods (TBED, DT, HMM, and ME), this paper proposes a new fusion strategy for POS tagging: correlation voting. The advantages of the method are analyzed theoretically, and it is compared experimentally with other fusion strategies. The results show that applying a fusion strategy describes POS tagging knowledge more comprehensively and therefore performs the tagging task better. Among the fusion strategies compared, correlation voting performs best, reducing the average tagging error rate by 27.85%.
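
To make the idea of fusing several taggers concrete, here is a minimal sketch of plain per-token majority voting, a common baseline fusion strategy. It is not the correlation voting method proposed in the paper, whose details are not given in this abstract. The function name `majority_vote`, the tie-breaking rule (prefer the tag from the earliest-listed tagger), and the toy tag sequences are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(tagger_outputs: Sequence[Sequence[str]]) -> List[str]:
    """Fuse tag sequences from several taggers by per-token majority vote.

    Ties are broken in favour of the tagger listed first (an assumption,
    not something stated in the source).
    """
    if not tagger_outputs:
        return []
    length = len(tagger_outputs[0])
    if any(len(seq) != length for seq in tagger_outputs):
        raise ValueError("all taggers must tag the same number of tokens")

    fused = []
    # zip(*...) groups the candidate tags proposed for each token position.
    for candidates in zip(*tagger_outputs):
        counts = Counter(candidates)
        best = max(counts.values())
        # The first candidate (in tagger order) that reaches the top count wins.
        fused.append(next(tag for tag in candidates if counts[tag] == best))
    return fused


# Hypothetical outputs of four taggers for the tokens "the", "can", "rusts".
outputs = [
    ["DT", "NN", "VBZ"],
    ["DT", "MD", "VBZ"],
    ["DT", "NN", "NNS"],
    ["DT", "NN", "NNS"],
]
print(majority_vote(outputs))
```

In this toy input the third token is a two-against-two tie, so the result depends entirely on the assumed tie-breaking rule. Cases like this are where a simple vote carries little information and a more refined fusion strategy has room to help.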

Key words

artificial intelligence / natural language processing / part-of-speech tagging / fusion strategy / correlation voting

Cite this article

GUO Yong-hui, WU Bao-min, WANG Bing-xi. Correlation Voting Fusion Strategy Used for Part of Speech Tagging. Journal of Chinese Information Processing. 2007, 21(2): 9-13

Funding

Supported by the National Natural Science Foundation of China (60372038)