Chinese Message Structures Disambiguation Based on HowNet
ZHANG Ruixia1, ZHUANG Jinlin1, YANG Guozeng2
1. Department of Information Engineering, North China University of Water Conservancy and Electric Power, Zhengzhou, Henan 450011, China; 2. Department of Mathematics, Zhengzhou Teachers College, Zhengzhou, Henan 450044, China
Abstract: The Chinese Message Structure Database, an important component of HowNet, can be treated as a rule base for Chinese semantic analysis. Disambiguating Chinese message structures is the first step in bringing this rule base into practical application. In this paper, the Chinese message structures are first formalized and then divided into different priority levels. Four disambiguation approaches are then proposed: syntax list judgment, graph compatibility matching, graph compatibility computation, and example-based semantic similarity computation. Finally, a separate disambiguation process is designed for each priority level. Experimental results show that the disambiguation accuracy exceeds 90%.
Key words: HowNet; Chinese message structure; disambiguation; graph compatibility; semantic similarity