Entity Disambiguation Based on Context Word Vector and Topic Models

WANG Rui, LI Bicheng, DU Wenqian

Journal of Chinese Information Processing, 2019, Vol. 33, Issue 11: 46-56.
Language Analysis and Computation

Abstract

Traditional word vector training models consider only word co-occurrence and ignore word order, which weakens their semantic expressiveness. In addition, existing entity disambiguation methods do not take the local features of entities into account. To combine the global and local features of entities, this paper proposes an entity disambiguation method based on context word vectors and topic models. First, a context direction vector is added to the traditional word vector model to represent word order, and this model is used together with a topic model to train topic word vectors. Second, three similarities are computed: the entity context similarity, the category topic similarity based on the topics of the entity context, and the entity topic similarity based on the topic word vectors. Finally, the three similarities are fused, and the candidate entity with the highest fused similarity is selected as the final disambiguated entity. Experimental results show that the new method is effective compared with existing mainstream disambiguation methods.
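The final step of the abstract (fuse three similarities, keep the best-scoring candidate) can be illustrated with a minimal Python sketch. The abstract does not state how the similarities are computed or fused, so everything here is an assumption made for illustration: cosine similarity as the vector comparison, a weighted sum as the fusion rule, equal default weights, and the hypothetical function names `cosine`, `fuse` and `disambiguate`.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure).

    Returns 0.0 when either vector has zero norm.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fuse(context_sim, category_sim, topic_sim, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three similarities (assumed fusion rule)."""
    w_context, w_category, w_topic = weights
    return (w_context * context_sim
            + w_category * category_sim
            + w_topic * topic_sim)


def disambiguate(candidates, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return the candidate entity with the highest fused similarity.

    candidates maps an entity name to a tuple
    (context_sim, category_sim, topic_sim).
    """
    return max(candidates, key=lambda name: fuse(*candidates[name], weights=weights))


# Made-up scores for two candidate entities of the mention "Apple".
scores = {
    "Apple Inc.": (0.82, 0.70, 0.75),
    "Apple (fruit)": (0.35, 0.20, 0.40),
}
best = disambiguate(scores)  # "Apple Inc."
```

The scores in the usage lines are invented for the example and are not results from the paper. In practice the weights would be tuned on held-out data rather than fixed at equal values.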

Key words

context word vector / entity disambiguation / knowledge base / topic vector / topic models

Cite this article

WANG Rui, LI Bicheng, DU Wenqian. Entity Disambiguation Based on Context Word Vector and Topic Models. Journal of Chinese Information Processing. 2019, 33(11): 46-56

Funding

Fujian Provincial Social Science Planning Project (FJ2017B073); Huaqiao University Scientific Research Start-up Project (600005-Z16Y0005)