Identification of Default Comment Objects Based on Conditional Random Fields
TANG Wenwu (1); GUO Yi (1, 2); XU Yongbin (1); FANG Xu (1)
1. Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China;
2. School of Information Science and Technology, Shihezi University, Shihezi, Xinjiang 832003, China
Abstract: Identifying the default (i.e., omitted) objects and attributes in a comment is important for sentiment analysis of reviews on e-commerce websites. To recover these default comment objects and attributes, this paper proposes an identification method based on Conditional Random Fields (CRF). After applying an emotion dictionary to locate the opinion comments, we treat the task as a sequence labeling problem and use lexical and dependency parsing elements as features. The evaluation results show that the proposed method achieves reasonably good accuracy and recall.
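The pipeline outlined in the abstract (dictionary-based filtering, then per-token features for a sequence labeler) might be sketched roughly as follows. This is only an illustration under stated assumptions: the lexicon contents, the tag names, the feature names, and the helper functions `is_opinion_sentence`, `token_features`, and `sentence_features` are all hypothetical and are not taken from the paper. The sketch stops at feature extraction; the resulting per-token feature dictionaries are the kind of input a CRF toolkit would typically consume.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon; the paper's actual emotion dictionary is not reproduced here.
SENTIMENT_LEXICON: Set[str] = {"good", "great", "bad", "slow", "cheap"}

# A token is (word, POS tag, dependency relation to its head).
# The tag and relation names below are illustrative assumptions.
Token = Tuple[str, str, str]


def is_opinion_sentence(tokens: List[Token]) -> bool:
    """Return True if the sentence contains at least one lexicon word."""
    return any(word.lower() in SENTIMENT_LEXICON for word, _, _ in tokens)


def token_features(tokens: List[Token], i: int) -> Dict[str, str]:
    """Lexical and dependency features for token i with a one-token context window."""
    word, pos, dep = tokens[i]
    feats = {
        "word": word.lower(),
        "pos": pos,
        "dep": dep,
        "is_sentiment": str(word.lower() in SENTIMENT_LEXICON),
    }
    if i > 0:
        feats["prev_word"] = tokens[i - 1][0].lower()
        feats["prev_pos"] = tokens[i - 1][1]
    else:
        feats["BOS"] = "True"  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1][0].lower()
        feats["next_pos"] = tokens[i + 1][1]
    else:
        feats["EOS"] = "True"  # end of sentence
    return feats


def sentence_features(tokens: List[Token]) -> List[Dict[str, str]]:
    """One feature dictionary per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# "Very cheap" expresses an opinion but names no object or attribute.
example: List[Token] = [("Very", "AD", "advmod"), ("cheap", "VA", "root")]

if is_opinion_sentence(example):
    for feats in sentence_features(example):
        print(feats)
```

In a full system, each sentence's feature list would be paired with a label sequence marking where an object or attribute has been omitted, and a CRF would be trained on those pairs; the label scheme is left open here because the abstract does not specify it.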