Abstract
In reviews on e-commerce websites, identifying omitted evaluation objects and attributes plays an important role in sentiment analysis. To address the omission of evaluation objects and attributes in such reviews, this paper proposes a method based on Conditional Random Fields (CRFs) for identifying the omitted items. A sentiment lexicon is first used to identify opinion sentences, and the identification of omitted items is then cast as a sequence labeling problem. Combining lexical features with dependency-syntax features, a CRF model is trained and then used to label the opinion sentences in the test set; the positions of the omitted items are determined from the labeling results. Experimental results show that the method achieves relatively high accuracy and recall, which supports its effectiveness.
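The pipeline summarized in the abstract can be illustrated with a minimal Python sketch of the data-preparation side: filtering opinion sentences with a sentiment lexicon, building per-token lexical and dependency features, and reading omission positions off a label sequence. Everything concrete here is an assumption for illustration rather than the paper's actual setup: the toy lexicon, the `(word, POS, dependency relation)` token layout, the feature names, and the labeling scheme in which a token tagged `E` marks the position immediately before which an omitted object or attribute would be restored. The CRF training step itself is not shown.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy sentiment lexicon; the dictionary used in the paper is not given here.
SENTIMENT_LEXICON: Set[str] = {"好", "不错", "差", "满意", "清晰"}

# Assumed token layout: (word, part-of-speech tag, dependency relation to the head).
Token = Tuple[str, str, str]


def is_opinion_sentence(tokens: List[Token],
                        lexicon: Set[str] = SENTIMENT_LEXICON) -> bool:
    """Treat a sentence as an opinion sentence if it contains any lexicon word."""
    return any(word in lexicon for word, _, _ in tokens)


def token_features(tokens: List[Token], i: int) -> Dict[str, str]:
    """Build lexical and dependency features for the token at index i."""
    word, pos, dep = tokens[i]
    return {
        "word": word,
        "pos": pos,
        "dep": dep,
        "is_sentiment": str(word in SENTIMENT_LEXICON),
        # Context features; BOS/EOS mark the sentence boundaries.
        "prev_pos": tokens[i - 1][1] if i > 0 else "BOS",
        "next_pos": tokens[i + 1][1] if i < len(tokens) - 1 else "EOS",
    }


def sentence_features(tokens: List[Token]) -> List[Dict[str, str]]:
    """One feature dictionary per token, the usual input shape for CRF toolkits."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def omission_positions(labels: List[str], gap_label: str = "E") -> List[int]:
    """Indices of tokens tagged with the (assumed) omission label."""
    return [i for i, label in enumerate(labels) if label == gap_label]


# Example review fragment "很 清晰" ("very clear"), where the evaluated
# object (for instance, the screen) is left unstated. Tags are illustrative.
example: List[Token] = [("很", "d", "ADV"), ("清晰", "a", "HED")]
example_labels = ["E", "O"]  # hypothetical gold labels under the assumed scheme
```

Feature dictionaries of this shape, paired with label sequences, are what a CRF toolkit would typically consume for training; at prediction time, `omission_positions` turns the predicted labels back into positions of omitted items.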
Key words
Conditional Random Fields (CRFs) /
evaluation object /
ellipsis identification /
sequence labeling
Funding
National Natural Science Foundation of China (61462073)