Chinese Sentence Similarity Computing Based on Semantic Roles Annotation

TIAN Kun; KE Yonghong; SUI Zhifang

Journal of Chinese Information Processing, 2016, Vol. 30, Issue 6: 126-132.
Section: Review


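Abstract

During semantic role annotation, annotators frequently need to retrieve similar sentences that have already been annotated, for reference and analysis. Existing methods do not make full use of the verb and the constituents it governs, and therefore cannot meet this retrieval need. To address this, the paper proposes a new method for computing Chinese sentence similarity. The method builds on a corpus already annotated with semantic roles and takes the verb as the core of the analysis. It measures the semantic similarity between sentences in three steps: semantic role analysis, similarity matching of annotated sentence patterns, and similarity computation between the matched patterns. To improve results, the paper also compares several word-similarity algorithms, including ones based on HowNet and on word vectors, and applies the best-performing one to the sentence-similarity computation. Experiments show that the semantic-role-based method achieves better test results than traditional methods.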
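The abstract names the ingredients of the method (verb-centred analysis, matching of role-annotated sentence patterns, word similarity from word vectors) but not its formulas. The following is a minimal sketch of that general idea under loudly stated assumptions, not the authors' algorithm: each sentence is represented as one or more predicate frames using PropBank-style role labels such as ARG0, ARG1 and ARGM-TMP (as in the Chinese Proposition Bank); word similarity is the cosine of word vectors, with made-up 3-dimensional vectors standing in for a trained word2vec model; and the names `Frame`, `word_sim`, `frame_sim`, `sentence_sim`, the 0.5 verb weight, the zero score for unmatched roles and the best-match averaging over frames are all hypothetical choices.

```python
import math
from dataclasses import dataclass, field


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def word_sim(w1, w2, vectors):
    """1.0 for identical words, vector cosine otherwise, and 0.0 if either
    word is out of vocabulary (an assumed fallback)."""
    if w1 == w2:
        return 1.0
    if w1 not in vectors or w2 not in vectors:
        return 0.0
    return cosine(vectors[w1], vectors[w2])


@dataclass
class Frame:
    """Hypothetical representation: one predicate and its role-labeled
    arguments (role label -> head word)."""
    verb: str
    args: dict = field(default_factory=dict)


def frame_sim(f1, f2, vectors, verb_weight=0.5):
    """Hypothetical verb-centred score: a weighted mix of verb similarity and
    the mean argument similarity over the union of role labels. A role filled
    in only one frame scores 0 but still counts in the denominator."""
    verb_score = word_sim(f1.verb, f2.verb, vectors)
    roles = set(f1.args) | set(f2.args)
    if not roles:
        return verb_score
    matched = sum(
        word_sim(f1.args[r], f2.args[r], vectors)
        for r in roles
        if r in f1.args and r in f2.args
    )
    return verb_weight * verb_score + (1 - verb_weight) * matched / len(roles)


def sentence_sim(frames1, frames2, vectors):
    """Symmetric best-match average over the predicate frames of two sentences."""
    if not frames1 or not frames2:
        return 0.0

    def directed(a, b):
        return sum(max(frame_sim(fa, fb, vectors) for fb in b) for fa in a) / len(a)

    return (directed(frames1, frames2) + directed(frames2, frames1)) / 2


# Made-up vectors standing in for a trained word2vec model.
VECTORS = {
    "购买": [0.9, 0.1, 0.0],  # "purchase"
    "买": [0.8, 0.2, 0.0],    # "buy"
    "书": [0.0, 1.0, 0.0],    # "book"
    "书籍": [0.0, 0.9, 0.1],  # "books"
}

s1 = Frame("购买", {"ARG0": "他", "ARG1": "书"})                     # "He purchases books"
s2 = Frame("买", {"ARG0": "他", "ARG1": "书籍", "ARGM-TMP": "昨天"})  # "He bought books yesterday"
print(round(sentence_sim([s1], [s2], VECTORS), 3))
```

Because a role filled in only one sentence scores zero while still counting in the denominator, the extra ARGM-TMP (time) argument in the second sentence pulls the score down. This is one plausible way to reward matching argument structure; the abstract does not say how the paper treats unmatched roles.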

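The abstract does not say which HowNet-based word-similarity measure was compared. One widely used option (Liu Qun and Li Sujian, 2002) scores two sememes by their path distance d in the sememe hierarchy as Sim = α / (d + α). The following minimal sketch implements only that sememe-level term, under stated assumptions: the sememe tree is a made-up toy fragment, each word sense is reduced to a single primary sememe, α defaults to 1.6 (the value usually quoted for this formula; treat it as tunable), and sememes with no common ancestor score 0. The full Liu and Li measure also combines other sememe and relation components with weights, which are omitted here; all function names are hypothetical.

```python
# Hypothetical toy fragment of a sememe hypernym tree (child -> parent).
# These names are placeholders, not entries copied from HowNet.
PARENT = {
    "entity": None,
    "physical": "entity",
    "animate": "physical",
    "inanimate": "physical",
    "human": "animate",
    "animal": "animate",
    "artifact": "inanimate",
}


def path_to_root(sememe, parent):
    """The sememe followed by its ancestors, nearest first."""
    path = []
    while sememe is not None:
        path.append(sememe)
        sememe = parent.get(sememe)
    return path


def sememe_distance(s1, s2, parent):
    """Edges between two sememes through their lowest common ancestor,
    or None when they share no ancestor."""
    steps_from_s2 = {s: i for i, s in enumerate(path_to_root(s2, parent))}
    for i, s in enumerate(path_to_root(s1, parent)):
        if s in steps_from_s2:
            return i + steps_from_s2[s]
    return None


def sememe_sim(s1, s2, parent, alpha=1.6):
    """Sim = alpha / (d + alpha); unrelated sememes score 0.0 (an assumption)."""
    d = sememe_distance(s1, s2, parent)
    return 0.0 if d is None else alpha / (d + alpha)


def word_sim_hownet(senses1, senses2, parent, alpha=1.6):
    """Max over all sense pairs, each sense reduced to its primary sememe."""
    return max(
        (sememe_sim(a, b, parent, alpha) for a in senses1 for b in senses2),
        default=0.0,
    )


print(round(word_sim_hownet(["human"], ["animal", "artifact"], PARENT), 3))
```

In practice the sememe tree and each word's senses would be loaded from the HowNet dictionary files; this sketch hard-codes them so that it runs on its own.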

Keywords

semantic roles annotation / word similarity / HowNet / word vector / annotated sentence pattern matching


Cite this article

TIAN Kun; KE Yonghong; SUI Zhifang. Chinese Sentence Similarity Computing Based on Semantic Roles Annotation. Journal of Chinese Information Processing, 2016, 30(6): 126-132.


Funding

National Basic Research Program of China ("973" Program) (2014CB340504)