QIU Likun, SHI Linlin, WANG Houfeng. Construction of Multi-Domain Chinese Dependency Treebanks and A Study on Factors Influencing the Statistical Parsing[J]. Journal of Chinese Information Processing, 2015, 29(5): 69-76.
Construction of Multi-Domain Chinese Dependency Treebanks and A Study on Factors Influencing the Statistical Parsing
QIU Likun¹, SHI Linlin¹, WANG Houfeng²
1. School of Chinese Language and Literature, Ludong University, Yantai, Shandong 264025, China; 2. Institute of Computational Linguistics, Peking University, Beijing 100871, China
Abstract: To improve Chinese dependency parsing and analyze the factors that influence it, we construct a large-scale general treebank and several medium-scale treebanks for specific domains. We then run experiments to evaluate how parsing accuracy is affected by the quality and scale of the dependency treebank and by domain differences. The results show that both treebank quality and scale are positively correlated with parsing accuracy, with quality being the more influential factor. The experiments also demonstrate that general treebanks and domain treebanks are complementary, and that whether a general treebank and a domain treebank should be used together depends on how much they differ.