Abstract: In recent years, Chinese semantic role labeling (SRL) has attracted intensive attention. Many SRL systems are built on parse trees, in which the constituents of the sentence structure are first identified and then classified. In contrast, this paper proposes a method based on semantic chunking, which changes the SRL task from the traditional “parsing → semantic role identification → semantic role classification” process into a simpler “semantic chunk identification → semantic chunk classification” pipeline. Semantic chunking, named by analogy with syntactic chunking, identifies the semantic chunks of a sentence, namely the arguments of its verbs. With semantic chunking, Chinese SRL becomes a sequence labeling problem rather than a classification problem. We apply conditional random fields to this problem and obtain better performance. Because the parsing stage is removed, the SRL task no longer depends on parsing, which is always the bottleneck in both speed and precision. Experiments show that our approach outperforms the previously best-reported methods on Chinese SRL, with a substantial reduction in running time. We also show that the proposed method works much better on gold-standard word segmentation and POS tagging than on automatically produced ones.
Key words: computer application; Chinese information processing; semantic role labeling; semantic chunking; conditional random fields; sequence labeling
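To illustrate how semantic chunking turns SRL into sequence labeling, the following minimal sketch encodes argument spans as per-token tags and decodes them back. The B/I/O tagging scheme, the function names `spans_to_iob` and `iob_to_spans`, the example sentence, and the PropBank-style role labels are all assumptions made for illustration; the abstract does not specify the tag set or chunk representation actually used.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, role) -- hypothetical representation


def spans_to_iob(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping argument spans as one B-/I-/O tag per token."""
    tags = ["O"] * n_tokens
    for start, end, role in spans:
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"span out of range: {(start, end, role)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, role)}")
        tags[start] = f"B-{role}"          # first token of the chunk
        for i in range(start + 1, end):
            tags[i] = f"I-{role}"          # remaining tokens of the chunk
    return tags


def iob_to_spans(tags: List[str]) -> List[Span]:
    """Decode a tag sequence back into (start, end_exclusive, role) chunks."""
    spans: List[Span] = []
    start, role = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing chunk
        # Close the open chunk on O, on a new B- tag, or on a role change.
        if start is not None and (
            tag == "O" or tag.startswith("B-") or tag[2:] != role
        ):
            spans.append((start, i, role))
            start, role = None, None
        if tag != "O" and start is None:
            start, role = i, tag[2:]
    return spans


# Hypothetical segmented sentence: "police / currently / investigate / accident / cause"
tokens = ["警察", "正在", "调查", "事故", "原因"]
# Assumed arguments of the predicate 调查 (index 2), with illustrative labels.
gold_spans = [(0, 1, "A0"), (1, 2, "ARGM-ADV"), (3, 5, "A1")]

tags = spans_to_iob(len(tokens), gold_spans)
for token, tag in zip(tokens, tags):
    print(token, tag)
```

Under this encoding, a sequence model such as a conditional random field would be trained to predict one tag per word, and the predicted tag sequence would be decoded back into labeled argument chunks with `iob_to_spans`, without consulting a parse tree.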