Morphological, Syntactic and Semantic Analysis/Application
GUO Zhen, ZHANG Yujie, SU Chen, XU Jinan
2014, 28(6): 1-8.
Recent work on joint Chinese word segmentation, POS tagging, and dependency parsing faces two key problems. First, character-based word segmentation and word-based dependency parsing are not well combined in the transition-based framework. Second, current joint models suffer from a shortage of annotated corpora. To address the first problem, we use the internal structure of words to transform conventional word-based dependency trees into character-based dependency trees, and then propose a novel character-level joint model for the three tasks. For Chinese word segmentation, we design four transition actions (Shift_S, Shift_B, Shift_M and Shift_E), through which the features used in previous research can also be integrated into the model. To address the second problem, we propose a novel semi-supervised joint model that exploits n-gram features and dependency subtree features from a partially annotated corpus. Experimental results on the Chinese Treebank show that our joint model achieves F1-scores of 98.31%, 94.84% and 81.71% for Chinese word segmentation, POS tagging, and dependency parsing, respectively, outperforming the pipeline model on the three tasks by 0.92%, 1.77% and 3.95%, respectively. In particular, the F1-scores for word segmentation and POS tagging are the best among published results to date.
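The abstract does not define the four shift actions or say how each word's internal structure is chosen. The sketch below illustrates the two data transformations under stated assumptions, not the authors' actual method. It assumes the common BMES reading (Shift_S for a single-character word; Shift_B, Shift_M, Shift_E for the beginning, middle and end characters of a multi-character word). As a hypothetical simplification, it also treats each word's last character as its internal head. The function names `shift_actions` and `char_level_heads` are illustrative only. The sketch covers the label mapping and tree conversion, not the transition-based decoder or POS tagging.

```python
from typing import List, Tuple


def shift_actions(words: List[str]) -> List[Tuple[str, str]]:
    """Map a segmented sentence to (character, shift action) pairs.

    Assumption: BMES-style semantics for Shift_S/B/M/E.
    """
    pairs = []
    for word in words:
        if len(word) == 1:
            pairs.append((word, "Shift_S"))  # single-character word
            continue
        for i, ch in enumerate(word):
            if i == 0:
                action = "Shift_B"  # first character of a word
            elif i == len(word) - 1:
                action = "Shift_E"  # last character of a word
            else:
                action = "Shift_M"  # word-internal character
            pairs.append((ch, action))
    return pairs


def char_level_heads(words: List[str], word_heads: List[int]) -> List[int]:
    """Turn a word-level dependency tree into a character-level one.

    word_heads[i] is the 1-based head of word i (0 = root). Returns
    1-based character heads (0 = root). Hypothetical simplification:
    a word's last character is its internal head, the other characters
    attach to it, and the word's own arc is carried by that character.
    """
    # 1-based position of each word's last character
    last_char = []
    pos = 0
    for w in words:
        pos += len(w)
        last_char.append(pos)

    heads = []
    for i, w in enumerate(words):
        start = last_char[i] - len(w) + 1
        for _ in range(start, last_char[i]):
            heads.append(last_char[i])  # intra-word arc
        h = word_heads[i]
        heads.append(0 if h == 0 else last_char[h - 1])  # inter-word arc
    return heads


if __name__ == "__main__":
    print(shift_actions(["中国", "的", "经济"]))
    print(char_level_heads(["中国", "经济", "发展"], [2, 3, 0]))
```

For example, with the word tree 中国 → 经济 → 发展 (root), the characters 中国经济发展 receive heads `[2, 4, 4, 6, 6, 0]`: each word's first character points to its last character, and the last characters carry the original word-level arcs. A different internal-structure choice (e.g. first-character heads) would change only the `last_char` bookkeeping.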