DUAN Lei, HAN Fang, SONG Jihua. A Comparative Study on the Automatic Extraction of Two-character Word from Ancient Chinese[J]. Journal of Chinese Information Processing, 2012, 26(4): 34-43.
A Comparative Study on the Automatic Extraction of Two-character Word from Ancient Chinese
DUAN Lei, HAN Fang, SONG Jihua
College of Computer Science, Beijing Normal University, Beijing 100875, China
Abstract: Word extraction is of great importance in natural language generation, computational lexicography, parsing, corpus linguistics, and related fields. To address the automatic extraction of two-character words from ancient Chinese, this paper takes the “Records of the Grand Historian” corpus as an example and applies statistical methods based on frequency, mutual information, and hypothesis testing, respectively, to extract two-character words. It then compares and analyzes the results of each method in detail against manually annotated results. This lays the groundwork for designing two-character word extraction schemes for ancient Chinese in different applications.
Key words: Chinese information processing; Ancient Chinese; Records of the Grand Historian; two-character word; statistical model
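As a rough illustration of the three families of statistics named in the abstract, the following Python sketch scores every adjacent character pair by raw frequency, pointwise mutual information, and a t-score. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `score_bigrams`, the choice of the t-test as the hypothesis test, the restriction to the basic CJK ideograph range, and the rule that pairs never span punctuation are all illustrative choices that the abstract does not specify.

```python
import math
import re
from collections import Counter


def score_bigrams(text):
    """Score adjacent character pairs (hypothetical helper, not from the paper)."""
    # Assumption: split on anything outside the basic CJK ideograph range,
    # so that candidate pairs never span punctuation or whitespace.
    segments = re.findall(r"[\u4e00-\u9fff]+", text)

    unigrams = Counter(ch for seg in segments for ch in seg)
    bigrams = Counter(
        seg[i:i + 2] for seg in segments for i in range(len(seg) - 1)
    )
    n_chars = sum(unigrams.values())
    n_pairs = sum(bigrams.values())

    scores = {}
    for pair, f_xy in bigrams.items():
        p_xy = f_xy / n_pairs
        p_x = unigrams[pair[0]] / n_chars
        p_y = unigrams[pair[1]] / n_chars
        # Pointwise mutual information: log2 of observed over expected probability.
        pmi = math.log2(p_xy / (p_x * p_y))
        # t-score: observed minus expected probability under independence,
        # divided by the standard error, approximating the variance by p_xy.
        t = (p_xy - p_x * p_y) / math.sqrt(p_xy / n_pairs)
        scores[pair] = {"freq": f_xy, "pmi": pmi, "t": t}
    return scores
```

In such a sketch, candidates would typically be ranked by one of the three scores and cut off at a threshold, then compared against a manually annotated word list to estimate precision and recall; the thresholds themselves would have to be tuned on the corpus.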