Abstract
Coreference resolution is a fundamental problem in natural language processing. Drawing on the semantic features of Uyghur, this paper proposes a deep-learning method for Uyghur personal pronoun anaphora resolution. The model is a deep belief network (DBN) built by stacking several unsupervised restricted Boltzmann machine (RBM) layers and one supervised back-propagation (BP) layer. The RBM layers preserve as much information as possible as feature vectors are mapped from one layer to the next, and the BP layer classifies the output vector of the last RBM layer, which yields the anaphora resolution decision. On a Uyghur coreference resolution corpus, the method achieves an F-score of 83.81%, 2.88% higher than the SVM method. The results indicate that, under the same conditions, the method effectively improves the accuracy of Uyghur personal pronoun resolution and can support further research on Uyghur anaphora resolution.
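The stacked architecture described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the four binary features, the layer sizes, the learning rates, and every function name below are hypothetical, the RBM layers are trained with one-step contrastive divergence (CD-1), and the supervised stage is simplified to a single logistic output unit trained by gradient descent on the top-layer features, without fine-tuning the RBM weights.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class RBM:
    """Binary restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible
        self.b_h = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_h[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h):
        return [sigmoid(self.b_v[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_v))]

    def cd1_step(self, v0, lr):
        # Positive phase, one Gibbs step, negative phase, parameter update.
        h0 = self.hidden_probs(v0)
        h_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h_sample)
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_v[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_h[j] += lr * (h0[j] - h1[j])


def pretrain(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise unsupervised pretraining of a stack of RBMs."""
    rng = random.Random(seed)
    rbms, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(inputs[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_step(v, lr)
        # The hidden activations of this layer feed the next layer.
        inputs = [rbm.hidden_probs(v) for v in inputs]
        rbms.append(rbm)
    return rbms


def encode(rbms, v):
    """Map an input vector through every RBM layer in turn."""
    for rbm in rbms:
        v = rbm.hidden_probs(v)
    return v


def train_output_layer(features, labels, epochs=500, lr=0.5):
    """Supervised stage (simplified): one logistic unit on the top features."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(rbms, w, b, v):
    """Probability that a (pronoun, candidate antecedent) pair corefers."""
    x = encode(rbms, v)
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


if __name__ == "__main__":
    # Hypothetical pair features: gender agreement, number agreement,
    # same sentence, candidate is subject. Label 1 means coreferent.
    pairs = [[1, 1, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]
    labels = [1, 1, 0, 0]
    rbms = pretrain(pairs, layer_sizes=[3, 2])
    w, b = train_output_layer([encode(rbms, v) for v in pairs], labels)
    print([round(predict(rbms, w, b, v), 3) for v in pairs])
```

In this sketch each pronoun and candidate antecedent pair is treated as one training instance, which is one common way to cast anaphora resolution as binary classification; the abstract does not say how the paper builds its instances or which features it uses.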
Key words
Uyghur /
personal pronoun /
anaphora resolution /
deep learning /
deep belief network
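Funding
National Natural Science Foundation of China (61563051, 61662074); National Natural Science Foundation of China (61262064); National Natural Science Foundation of China (61331011); Xinjiang Autonomous Region Science and Technology Talent Training Project (QN2016YX0051)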