ZHANG Yingjie1, LI Bin1,2, CHEN Jiajun1, CHEN Xiaohe2. A Study in Dictionary-Based All-word Word Sense Disambiguation for Pre-Qin Chinese[J]. Journal of Chinese Information Processing, 2012, 26(3): 65-72.
A Study in Dictionary-Based All-word Word Sense Disambiguation for Pre-Qin Chinese
ZHANG Yingjie1, LI Bin1,2, CHEN Jiajun1, CHEN Xiaohe2
1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210093, China; 2. Research Center for Language Informatics, Nanjing Normal University, Nanjing, Jiangsu 210097, China
Abstract: Word sense disambiguation (WSD) is a basic task in natural language processing, including the processing of ancient Chinese documents. This paper focuses on analyzing pre-Qin ancient Chinese documents. Because training data and semantic resources are scarce, we employ a semi-supervised machine learning method to perform all-word WSD on the Zuo Zhuan, using the Chinese Dictionary (Hanyu Da Cidian) v2.0 as the knowledge resource. To evaluate the method, we randomly selected 22 words with different frequencies and numbers of senses. On these words, our method achieves an average accuracy of 67%, significantly higher than the baseline of always selecting the most frequent sense. The method is promising for sense tagging of ancient Chinese documents when no training data is available. It also produces preliminary sense tags that annotators can correct, which can help enrich traditional dictionaries whose sense entries are often incomplete.
Key words: word sense disambiguation; sense tagging; ancient Chinese; natural language processing
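The evaluation above compares average accuracy on the selected target words against a most-frequent-sense (MFS) baseline. The sketch below shows one way such a comparison could be computed; it is not the authors' evaluation code. It assumes (1) the MFS of each word is estimated from the gold-annotated occurrences themselves, and (2) "average accuracy" means the unweighted mean of per-word accuracies. The function names and the toy sense labels for 兵 and 国 are hypothetical illustrations.

```python
from collections import Counter, defaultdict

# Each annotated occurrence is a (word, sense) pair; gold and predicted lists are aligned.

def mfs_baseline(gold):
    """Tag every occurrence with its word's most frequent gold sense
    (assumption: MFS is estimated from the evaluation annotations)."""
    counts = defaultdict(Counter)
    for word, sense in gold:
        counts[word][sense] += 1
    mfs = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return [(word, mfs[word]) for word, _ in gold]

def per_word_accuracy(gold, predicted):
    """Fraction of correctly tagged occurrences for each target word."""
    correct, total = Counter(), Counter()
    for (word, g), (pword, p) in zip(gold, predicted):
        assert word == pword, "gold and predicted occurrences must be aligned"
        total[word] += 1
        if g == p:
            correct[word] += 1
    return {w: correct[w] / total[w] for w in total}

def macro_average(acc):
    """Unweighted mean over target words (assumed averaging scheme)."""
    return sum(acc.values()) / len(acc)

# Hypothetical toy annotations, not data from the paper.
gold = [("兵", "weapon"), ("兵", "weapon"), ("兵", "soldier"), ("兵", "soldier"),
        ("兵", "soldier"), ("国", "state"), ("国", "state"), ("国", "capital")]
system = [("兵", "weapon"), ("兵", "weapon"), ("兵", "soldier"), ("兵", "soldier"),
          ("兵", "weapon"), ("国", "state"), ("国", "capital"), ("国", "capital")]

print(round(macro_average(per_word_accuracy(gold, system)), 3))              # 0.733
print(round(macro_average(per_word_accuracy(gold, mfs_baseline(gold))), 3))  # 0.633
```

Averaging per word rather than per occurrence keeps high-frequency words from dominating the score, which matters when the target words are chosen to span different frequencies; a per-occurrence (micro) average would be an equally plausible reading of the abstract.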