Abstract
Focusing on anaphoricity determination for Uyghur noun phrases, this paper proposes a semantic-feature-based anaphoricity determination method built on a Stacked Nonnegative Constrained Autoencoder (SNCAE). To improve the sparsity of the autoencoder's hidden-layer activations and the quality of the reconstructed data, the NCAE nonnegativity-constraint algorithm is used to impose nonnegativity constraints on the connection weights. An analysis of anaphora phenomena in Uyghur noun phrases yields 15 features. The SNCAE then learns deep semantic features from them, and a Softmax classifier performs the final anaphoricity determination. Compared with a Support Vector Machine (SVM), the method improves positive-example accuracy and negative-example accuracy by 8.259% and 4.158%, respectively. Compared with a Stacked Autoencoder (SAE), it improves them by 1.884% and 1.590%, respectively. These results indicate that the SNCAE-based method is better suited than SVM and SAE to anaphoricity determination for Uyghur noun phrases.
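The abstract describes a three-step pipeline: per-noun-phrase feature vectors (15 features), greedy layer-wise training of autoencoders whose weights are pushed toward nonnegativity, and a Softmax classifier on the top-layer codes. The NumPy sketch below illustrates that pipeline under stated assumptions. It is not the authors' implementation. It assumes the nonnegativity constraint takes the form of a quadratic penalty α/2·Σ min(w, 0)² on negative weights. It also assumes sigmoid units, hypothetical layer sizes 15 → 10 → 6, and hypothetical values for α, the learning rates, and the epoch counts. The input is random synthetic data, not the paper's corpus. The KL sparsity term and supervised fine-tuning of the whole stack are omitted for brevity.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_ncae(X, n_hidden, alpha=0.003, lr=0.5, epochs=300, seed=0):
    """Train one nonnegativity-constrained autoencoder layer (assumed simplified form).

    J = 1/(2m) * ||sigmoid(sigmoid(X W1 + b1) W2 + b2) - X||^2
        + alpha/2 * sum(min(W, 0)^2) over W1 and W2
    alpha, lr and epochs are hypothetical defaults. Returns the encoder (W1, b1).
    """
    rng = np.random.default_rng(seed)
    m, n_in = X.shape
    W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
    b1, b2 = np.zeros(n_hidden), np.zeros(n_in)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                  # hidden activations
        R = sigmoid(H @ W2 + b2)                  # reconstruction of X
        d_out = (R - X) * R * (1.0 - R) / m       # dJ/d(decoder pre-activation)
        d_hid = (d_out @ W2.T) * H * (1.0 - H)    # dJ/d(encoder pre-activation)
        # The penalty gradient alpha * min(W, 0) is nonzero only for negative weights.
        gW2 = H.T @ d_out + alpha * np.minimum(W2, 0.0)
        gW1 = X.T @ d_hid + alpha * np.minimum(W1, 0.0)
        W2 -= lr * gW2
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * d_hid.sum(axis=0)
    return W1, b1


def encode(X, layers):
    """Pass X through the trained encoders in order (the 'stacked' part)."""
    for W, b in layers:
        X = sigmoid(X @ W + b)
    return X


def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)          # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def train_softmax(H, y, n_classes=2, lr=0.5, epochs=500, decay=1e-4):
    """Multinomial logistic regression on the top-layer codes."""
    m, d = H.shape
    W, b = np.zeros((d, n_classes)), np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                      # one-hot labels
    for _ in range(epochs):
        G = (softmax(H @ W + b) - Y) / m          # cross-entropy gradient
        W -= lr * (H.T @ G + decay * W)
        b -= lr * G.sum(axis=0)
    return W, b


# Synthetic stand-in for the 15 per-noun-phrase features (NOT the paper's data).
rng = np.random.default_rng(42)
X = rng.random((200, 15))
y = (X[:, :3].sum(axis=1) > 1.5).astype(int)      # hypothetical label: 1 = anaphoric

# Greedy layer-wise pretraining, 15 -> 10 -> 6 (layer sizes are assumptions).
layer1 = train_ncae(X, 10, seed=1)
layer2 = train_ncae(encode(X, [layer1]), 6, seed=2)
H = encode(X, [layer1, layer2])

Ws, bs = train_softmax(H, y)
probs = softmax(H @ Ws + bs)
pred = probs.argmax(axis=1)
print("training accuracy:", (pred == y).mean())
```

Only negative weights are penalized, so positive weights train as in an ordinary autoencoder. Setting `alpha=0` in this sketch gives an unconstrained stacked autoencoder, loosely analogous to the SAE baseline in the comparison.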
Key words
anaphoricity determination /
Uyghur /
nonnegative constraint algorithm (NCAE) /
stacked autoencoder (SAE) /
support vector machine (SVM)
Funding
National Natural Science Foundation of China (61563051, 61662074, 61262064, 61331011); Autonomous Region Science and Technology Talent Training Program (QN2016YX0051)