Review
LI Wen1,2, LI Miao1, LIANG Qing3, ZHU Hai1,2, YING Yulong1,2, Wudabala1
2011, 25(4): 122-129.
This paper presents a Mongolian morphological segmentation approach based on statistical machine translation and a minimum constituent-context cost model. Phrase-based statistical machine translation handles the segmentation of in-vocabulary words, and the minimum constituent-context cost model handles out-of-vocabulary words. Three features commonly used in phrase-based statistical machine translation are selected for segmentation: the phrase translation probability, the lexical translation probability, and the language model score. The minimum constituent-context cost model considers the unigram morpheme context and the N-gram suffix context. Experiments show that the morphological segmentation system achieves a precision of 96.94% and that the translation quality of the statistical machine translation system improves noticeably.
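The abstract does not say how the three features are combined. Phrase-based systems conventionally combine them in a log-linear model, so the following is a minimal sketch under that assumption, not the authors' implementation. The function names, the weights, the placeholder segmentations, and all probability values are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical feature weights; a real system would tune these on held-out data.
WEIGHTS = {"phrase": 1.0, "lexical": 1.0, "lm": 1.0}


def loglinear_score(features, weights=WEIGHTS):
    """Weighted sum of log feature values (the usual log-linear form)."""
    return sum(weights[name] * math.log(features[name]) for name in weights)


def best_segmentation(candidates, weights=WEIGHTS):
    """Return the candidate segmentation with the highest log-linear score."""
    return max(candidates, key=lambda c: loglinear_score(c["features"], weights))


# Toy candidates for one placeholder word; every number here is invented.
candidates = [
    {
        "segmentation": "stem +sfx1 +sfx2",
        "features": {"phrase": 0.6, "lexical": 0.5, "lm": 0.4},
    },
    {
        "segmentation": "stem +sfx12",
        "features": {"phrase": 0.3, "lexical": 0.4, "lm": 0.2},
    },
]

best = best_segmentation(candidates)
print(best["segmentation"])
```

With equal weights the score reduces to the log of the product of the three feature values, so the candidate with the larger product is selected. Changing the weights shifts how much each feature influences the choice.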
Key words: morphology; morphological segmentation; machine translation; statistical model