An Efficient Approach to Ancient Chinese Treebank Construction Based on “Word or POS” Match (基于“词—词性”匹配模式获取的古汉语树库快速构建方法)

HE Jing, SONG Tianbao, PENG Weiming, ZHU Shuqin, SONG Jihua

Journal of Chinese Information Processing, 2017, Vol. 31, Issue 4: 114-121.
Language Resource Construction



Abstract

Ancient Chinese texts are small, their sentences are short, and their structure is strongly patterned. Exploiting these characteristics, this paper proposes an efficient treebank construction method based on acquiring “word or POS” match patterns. The method reduces syntactic annotation to four steps: 1) acquiring candidate match patterns; 2) writing syntactic transformation rules; 3) automatically generating syntax trees; and 4) final manual verification. The method can greatly reduce the manual annotation workload and the engineering cost of treebank construction, and the match rules acquired in the process have practical value for the teaching and study of ancient Chinese.
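The four steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the POS tags (`n`, `v`, `r`, `d`, `y`), the set of words kept literal, the bracketed template format, and the labels `SBJ`, `PRD` and `OBJ` are all assumptions made for illustration, since this page does not give the paper's pattern or rule format.

```python
from collections import Counter

def candidate_patterns(clauses, keep_words):
    """Step 1 (sketch): abstract each tagged clause into a 'word or POS'
    pattern and count how often each pattern occurs.
    Words in keep_words (assumed here to be function words) stay literal;
    every other word is replaced by its POS tag."""
    counts = Counter()
    for clause in clauses:
        pattern = tuple(w if w in keep_words else pos for w, pos in clause)
        counts[pattern] += 1
    return counts

def matches(pattern, clause):
    """A pattern slot matches a token if it equals the token's word or its
    POS tag. This assumes words and tag names never collide."""
    return len(pattern) == len(clause) and all(
        slot == w or slot == pos for slot, (w, pos) in zip(pattern, clause)
    )

def parse(clause, rules):
    """Steps 2-3 (sketch): rules map a pattern to a hand-written bracketed
    template in which {i} stands for the i-th word of the clause.
    Returns None when no rule matches, leaving the clause for manual
    annotation (step 4)."""
    for pattern, template in rules.items():
        if matches(pattern, clause):
            return template.format(*(w for w, _ in clause))
    return None

# Toy data: (word, POS) pairs with hypothetical tags.
clauses = [
    [("仁者", "n"), ("爱", "v"), ("人", "n")],
    [("智者", "n"), ("乐", "v"), ("水", "n")],
    [("吾", "r"), ("不", "d"), ("知", "v"), ("也", "y")],
]
counts = candidate_patterns(clauses, keep_words={"不", "也"})

# One hypothetical transformation rule, written for the most frequent pattern.
rules = {("n", "v", "n"): "(S (SBJ {0}) (PRD {1} (OBJ {2})))"}
trees = [parse(c, rules) for c in clauses]
```

In this sketch an annotator writes one rule per frequent pattern instead of annotating every clause, and clauses that match no rule fall through to manual annotation.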

Key words

ancient Chinese / treebank construction / pattern acquisition

Cite this article

HE Jing, SONG Tianbao, PENG Weiming, ZHU Shuqin, SONG Jihua. An Efficient Approach to Ancient Chinese Treebank Construction Based on “Word or POS” Match. Journal of Chinese Information Processing. 2017, 31(4): 114-121.

Funding

Young Teachers Fund of Beijing Normal University (2014NT39)