Chinese Message Structures Disambiguation Based on HowNet

ZHANG Ruixia, ZHUANG Jinlin, YANG Guozeng

Journal of Chinese Information Processing, 2012, Vol. 26, Issue 4: 43-50.
Review


Abstract

The Chinese Message Structure Database is an important component of HowNet and can serve as a rule base for Chinese semantic analysis. Disambiguating its message structures is one of the foundations for putting the database into practical use. This paper first gives a formal description of Chinese message structures and then divides them into priority levels. Based on how the structures are composed, four disambiguation methods are proposed: part-of-speech sequence disambiguation, graph compatibility matching, graph compatibility degree computation, and example-based semantic similarity computation. Finally, a separate disambiguation process is designed for the set of message structures at each priority level. Experimental results show that the disambiguation accuracy exceeds 90%.
Key words

HowNet / Chinese message structure / disambiguation / graph compatibility / semantic similarity

Cite this article

ZHANG Ruixia, ZHUANG Jinlin, YANG Guozeng. Chinese Message Structures Disambiguation Based on HowNet. Journal of Chinese Information Processing. 2012, 26(4): 43-50


Funding

Basic Research Project of the Henan Provincial Department of Science and Technology (082300410140)