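Extracting entities and the relations between them from unstructured text is an important research topic in natural language processing, and it also provides key support for applications such as knowledge graph construction and question answering systems. Joint-model entity relation extraction performs entity recognition and relation extraction at the same time, which avoids the error accumulation of the traditional pipeline, where the entities in a sentence are recognized first and the relations between them are classified afterwards. To address the scarcity of Tibetan corpora and the low accuracy of Tibetan entity recognition, this paper proposes a joint-model method for extracting Tibetan entity relations. For the Tibetan entity relation extraction task, the following scheme is proposed: ① Because Tibetan word segmentation is not very accurate, Tibetan text is preprocessed at both the character level and the word level, and comparative experiments are reported; the results show that character-level processing performs better than word-level processing. ② Tibetan is a language with fairly strict grammatical rules, in which nouns, case particles, and similar elements clearly indicate the syntactic and semantic relations among the chunks of a sentence; the paper therefore adds part-of-speech (POS) tag features to the Tibetan character and word vectors, and the experimental results confirm the effectiveness of this approach. ③ Drawing on the advantages of joint models, the paper adopts an end-to-end BiLSTM framework that turns Tibetan entity relation extraction into a Tibetan sequence labelling problem; the experimental results show that the method improves accuracy by 30%~40% over traditional Tibetan processing methods such as the SVM and LR algorithms.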
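To illustrate how relation extraction can be recast as sequence labelling, here is a minimal sketch. The abstract does not specify the paper's actual tag set, so the `position-relation-role` tag format (`B`/`I`/`E`/`S` plus `O`), the function names, and the relation name `BornIn` are all assumptions made for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) over sequence positions, end exclusive


def span_tags(length: int, relation: str, role: int) -> List[str]:
    """Tag one entity span with B/I/E/S position markers (assumed scheme)."""
    if length < 1:
        raise ValueError("an entity span must cover at least one unit")
    label = f"{relation}-{role}"
    if length == 1:
        return [f"S-{label}"]
    return [f"B-{label}"] + [f"I-{label}"] * (length - 2) + [f"E-{label}"]


def joint_tags(n_units: int, head: Span, tail: Span, relation: str) -> List[str]:
    """Encode one (head entity, relation, tail entity) triple as a tag sequence.

    Units outside both entities get "O"; role 1 marks the head entity and
    role 2 marks the tail entity.
    """
    tags = ["O"] * n_units
    for (start, end), role in ((head, 1), (tail, 2)):
        tags[start:end] = span_tags(end - start, relation, role)
    return tags


# Hypothetical 6-unit sentence: units 0-1 form the head entity, unit 3 the tail.
example = joint_tags(6, head=(0, 2), tail=(3, 4), relation="BornIn")
```

With tags of this kind, a single sequence labeller such as a BiLSTM predicts entity boundaries and the relation in one pass, so no separate relation classifier has to consume possibly wrong entity predictions.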
Abstract
Extracting entities and the relations between them from unstructured text is a challenging problem. This paper applies a joint model to Tibetan to perform entity identification and relation extraction at the same time. An end-to-end BiLSTM sequence labelling framework is adopted, and POS information is integrated to enhance performance. It is also demonstrated that character-level processing is more effective for Tibetan than word-level processing. The experimental results show that the method improves accuracy by 30%~40% compared with SVM and LR.
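The character-level preprocessing and POS integration described above might look like the following sketch. It assumes that "character level" means Tibetan syllables delimited by the tsheg mark (U+0F0B), that each syllable inherits the POS tag of the word containing it, and that POS information is added by concatenating a POS embedding to the unit embedding. The POS tag names and the toy embedding values are hypothetical; the abstract does not give these details.

```python
from typing import Dict, List, Tuple

TSHEG = "\u0f0b"  # Tibetan inter-syllable mark


def split_syllables(text: str) -> List[str]:
    """Split Tibetan text into syllable units on the tsheg mark."""
    return [syllable for syllable in text.split(TSHEG) if syllable]


def syllables_with_pos(tagged_words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Expand (word, POS) pairs so every syllable carries its word's POS tag."""
    pairs = []
    for word, pos in tagged_words:
        pairs.extend((syllable, pos) for syllable in split_syllables(word))
    return pairs


def build_inputs(
    pairs: List[Tuple[str, str]],
    unit_emb: Dict[str, List[float]],
    pos_emb: Dict[str, List[float]],
) -> List[List[float]]:
    """Concatenate each unit embedding with its POS embedding."""
    return [unit_emb[unit] + pos_emb[pos] for unit, pos in pairs]


# One two-syllable word with a hypothetical noun tag "n".
word = "\u0f56\u0f7c\u0f51\u0f0b\u0f61\u0f72\u0f42\u0f0b"
pairs = syllables_with_pos([(word, "n")])

# Toy lookup tables; real embeddings would be learned.
unit_emb = {unit: [0.1, 0.2, 0.3] for unit, _ in pairs}
pos_emb = {"n": [1.0, 0.0]}
inputs = build_inputs(pairs, unit_emb, pos_emb)
```

Each row of `inputs` is the vector a sequence labeller would receive for one syllable. Working at the syllable level avoids depending on a word segmenter's boundary decisions, which is one plausible reading of why the character-level variant performed better.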
Keywords
joint model /
Tibetan entity relation /
POS tagging
Key words
joint model /
Tibetan entity relation /
POS tagging
Funding
National Natural Science Foundation of China (61501529, 61331013); State Language Commission project (YB125-139, ZDI125-36)