Abstract
Cultural entities associated with cross-border ethnic groups often appear under different names for the same entity, so mainstream text retrieval methods suffer from semantic sparsity on cross-border ethnic culture datasets. This paper proposes a cross-border ethnic culture retrieval method based on entity semantic expansion. The method uses a cross-border ethnic culture knowledge graph to link entities across cultures in the form of knowledge triples and adds entity category labels, which alleviates the lack of semantic information in cross-border ethnic culture entities. The TransH model is used to embed the entities and their expanded semantic information as vectors, which are fused into the query text for semantic enhancement to improve the accuracy of cross-border ethnic culture text retrieval. Experimental results show that the proposed method improves on the baseline model by 5.4%.
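As a rough illustration of the TransH step mentioned in the abstract, the sketch below computes the standard TransH dissimilarity score for one triple: the head and tail embeddings are projected onto a relation-specific hyperplane and compared after a translation within that hyperplane. The function names (`project`, `normalize`, `transh_score`) and the toy vectors are assumptions made for this example only; they are not taken from the paper, and the way the paper fuses the resulting vectors into the query is not shown here.

```python
import math


def normalize(w):
    """Scale w to unit length so it can serve as a hyperplane normal."""
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w]


def project(v, w):
    """Project v onto the hyperplane whose unit normal is w: v - (w.v) w."""
    dot = sum(vi * wi for vi, wi in zip(v, w))
    return [vi - dot * wi for vi, wi in zip(v, w)]


def transh_score(h, t, w_r, d_r):
    """TransH dissimilarity ||h_perp + d_r - t_perp||^2 (lower means more plausible)."""
    w = normalize(w_r)
    h_perp = project(h, w)
    t_perp = project(t, w)
    return sum((hp + d - tp) ** 2 for hp, d, tp in zip(h_perp, d_r, t_perp))


# Hypothetical toy embeddings, chosen only to make the arithmetic easy to follow.
head = [1.0, 0.0, 2.0]
tail = [1.0, 1.0, 5.0]
w_r = [0.0, 0.0, 1.0]  # normal vector of the relation hyperplane
d_r = [0.0, 1.0, 0.0]  # translation vector lying in the hyperplane

print(transh_score(head, tail, w_r, d_r))
```

In this toy setup the two entities differ along the normal direction, which the projection discards, so the triple scores as a perfect fit. That is the property that lets one entity take different positions under different relations.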
Keywords
text retrieval /
cross-border ethnic culture /
knowledge graph /
entity semantic expansion
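Funding
National Natural Science Foundation of China (61732005, 61866019, 61761026, 61972186); Key Project of the Yunnan Applied Basic Research Program (2019FA023); Yunnan Reserve Talent Program for Young and Middle-aged Academic and Technical Leaders (2019HB006); Yunnan Major Science and Technology Special Program (202103AA080015, 202002AD080001)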