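Anaphoricity Determination of Uyghur Noun Phrases

TAO Doudou, YU Long, TIAN Shengwei, ZHAO Jianguo, Turgun·Ibrahim, Askar·Hamdulla

Journal of Chinese Information Processing ›› 2017, Vol. 31 ›› Issue (5): 92-98, 113.
Information Processing of Minority Languages and Neighboring Languages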

Abstract

For the task of anaphoricity determination of Uyghur noun phrases, this paper proposes a method based on semantic features that uses a Stacked Nonnegative Constrained Autoencoder (SNCAE). To improve the sparsity of the autoencoder's hidden-layer activations and the quality of the reconstructed data, the NCAE nonnegativity-constraint algorithm is used to impose a nonnegativity constraint on the connection weights. Through an analysis of anaphora in Uyghur noun phrases, 15 features are extracted; SNCAE then derives deep semantic features from them, and a Softmax classifier completes the anaphoricity determination task. On this task, the positive-class and negative-class accuracy of the method are 8.259% and 4.158% higher than those of a Support Vector Machine (SVM), and 1.884% and 1.590% higher than those of a Stacked Autoencoder (SAE). This indicates that the SNCAE-based method is better suited than SVM and SAE to anaphoricity determination in Uyghur text.
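The abstract does not state the objective function, so the following is only a minimal sketch of how a nonnegativity constraint on autoencoder weights is commonly written: a quadratic penalty on negative weights, added to a reconstruction term and a KL sparsity term. The function names, the hidden size of 8, and the values of `alpha`, `beta`, and `rho` are illustrative assumptions, not the authors' settings; only the input width of 15 features and the Softmax output come from the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def negative_weight_penalty(w):
    """Sum of squared negative entries of w; nonnegative entries cost nothing."""
    return np.sum(np.minimum(w, 0.0) ** 2)

def ncae_loss(x, w1, b1, w2, b2, alpha=0.003, beta=3.0, rho=0.05):
    """Loss of one autoencoder layer with a nonnegativity penalty (assumed form).

    alpha, beta and rho are hypothetical hyperparameters chosen for illustration.
    """
    h = sigmoid(x @ w1 + b1)          # hidden activations
    x_hat = sigmoid(h @ w2 + b2)      # reconstruction of the input
    recon = 0.5 * np.mean(np.sum((x_hat - x) ** 2, axis=1))
    # KL divergence between target sparsity rho and mean hidden activation
    rho_hat = np.clip(h.mean(axis=0), 1e-8, 1.0 - 1e-8)
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
    # Penalize only the negative weights, pushing them toward zero
    nonneg = 0.5 * alpha * (negative_weight_penalty(w1)
                            + negative_weight_penalty(w2))
    return recon + beta * kl + nonneg

def softmax(z):
    """Row-wise softmax, as used by the final classifier."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy data: 4 candidate noun phrases, each described by 15 features in [0, 1]
rng = np.random.default_rng(0)
x = rng.random((4, 15))
w1, b1 = rng.normal(0.0, 0.1, (15, 8)), np.zeros(8)
w2, b2 = rng.normal(0.0, 0.1, (8, 15)), np.zeros(15)
loss = ncae_loss(x, w1, b1, w2, b2)

# Two-way Softmax (anaphoric vs. non-anaphoric) on the hidden representation
w_out = rng.normal(0.0, 0.1, (8, 2))
probs = softmax(sigmoid(x @ w1 + b1) @ w_out)
```

In a stacked setup, each layer would be pretrained with a loss of this kind on the previous layer's hidden activations before the Softmax layer is trained on top; the training loop itself is omitted here.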

Key words

anaphoricity determination / Uyghur / nonnegativity-constraint algorithm (NCAE) / stacked autoencoder (SAE) / support vector machine (SVM)

Cite this article

TAO Doudou, YU Long, TIAN Shengwei, ZHAO Jianguo, Turgun·Ibrahim, Askar·Hamdulla. Anaphoricity Determination of Uyghur Noun Phrases. Journal of Chinese Information Processing. 2017, 31(5): 92-98,113

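Funding

National Natural Science Foundation of China (61563051, 61662074); National Natural Science Foundation of China (61262064); National Natural Science Foundation of China (61331011); Autonomous Region Science and Technology Talent Training Project (QN2016YX0051)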