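This paper divides the rules for composing classical Chinese couplets into hard rules and soft rules, and further divides the soft rules into character correspondence and context correspondence. Guided by the soft rules, it builds a directed probabilistic graphical model for couplet generation and estimates the model parameters with the EM (Expectation-Maximization) algorithm. The hard rules are added to the search used to solve the model. Together these steps give a complete method for automatically composing the second line of a couplet. Experimental results show that the candidate-character lists obtained after parameter learning are more reasonable than lists based on frequency counts alone, because learning partly discounts the influence of context correspondence. The method can also distinguish between well-matched and poorly matched couplets in the training corpus during learning. A classical couplet generation program built on this method achieves a reasonable level of performance.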
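The abstract does not give the model's equations. The sketch below shows one way EM could separate character correspondence from context correspondence, under loudly stated assumptions. Each character y_i of the second line is assumed to come from a two-component mixture. One component is a character-correspondence table t(y_i | x_i) conditioned on the aligned first-line character. The other is a context table b(y_i | y_{i-1}) conditioned on the previous second-line character. The mixing weight is lam. The names `train`, `t`, `b`, `lam` and `START` are hypothetical, and the paper's actual graphical model may differ. Both tables start from raw co-occurrence frequencies, which corresponds to a frequency-only baseline. EM then shifts credit for a character pair toward the context table whenever context explains it well.

```python
from collections import defaultdict

START = "<s>"  # hypothetical start-of-line symbol used by the context component


def normalize(counts):
    """Turn nested (expected) counts {condition: {char: c}} into conditional probabilities."""
    probs = {}
    for cond, row in counts.items():
        total = sum(row.values())
        probs[cond] = {y: c / total for y, c in row.items()} if total > 0 else {}
    return probs


def positions(pairs):
    """Yield (x_i, y_{i-1}, y_i) for every aligned position of every couplet."""
    for first, second in pairs:
        prev = START
        for x, y in zip(first, second):  # assumes both lines have equal length
            yield x, prev, y
            prev = y


def train(pairs, iters=20, lam=0.5):
    """EM for a hypothetical two-component mixture:
    P(y_i | x_i, y_{i-1}) = lam * t(y_i | x_i) + (1 - lam) * b(y_i | y_{i-1}).
    """
    # Initialise both tables from raw co-occurrence counts (frequency-only baseline).
    t_counts = defaultdict(lambda: defaultdict(float))
    b_counts = defaultdict(lambda: defaultdict(float))
    for x, prev, y in positions(pairs):
        t_counts[x][y] += 1.0
        b_counts[prev][y] += 1.0
    t, b = normalize(t_counts), normalize(b_counts)

    for _ in range(iters):
        t_counts = defaultdict(lambda: defaultdict(float))
        b_counts = defaultdict(lambda: defaultdict(float))
        resp_total, n = 0.0, 0
        for x, prev, y in positions(pairs):
            # E-step: posterior probability that y_i was produced by
            # character correspondence rather than by context.
            pt = lam * t[x].get(y, 0.0)
            pb = (1.0 - lam) * b[prev].get(y, 0.0)
            r = pt / (pt + pb) if pt + pb > 0 else 0.5
            t_counts[x][y] += r
            b_counts[prev][y] += 1.0 - r
            resp_total += r
            n += 1
        # M-step: renormalise expected counts and re-estimate the mixing weight.
        t, b = normalize(t_counts), normalize(b_counts)
        lam = resp_total / n
    return t, b, lam
```

After training, sorting `t[x].items()` by probability gives a learned candidate list for the first-line character x. This list can be compared with the list from the initial raw counts. The sketch assumes a non-empty corpus of equal-length line pairs.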
Abstract
This paper presents an approach to the computer generation of Chinese couplets. It divides the rules of couplet composition into hard rules and soft rules, and further shows that the soft rules consist of character correspondence and context correspondence. Based on the soft rules, it proposes a probabilistic graphical model for couplet generation, whose parameters are estimated with the EM (Expectation-Maximization) algorithm. The hard rules are integrated into decoding as heuristics. Experimental results show that the candidate characters produced by this model are better than those ranked by frequency alone. The model can also learn its parameters from a data set that contains some poorly matched couplets. A couplet generation program implementing this approach achieves acceptable performance.
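The abstract states only that hard rules are integrated into decoding. The sketch below shows one way this could work. It assumes a beam search whose score at each position is lam * t(y|x) + (1 - lam) * b(y|prev). Here t is a hypothetical character-correspondence table, b is a hypothetical context table conditioned on the previous output character, and lam is a mixing weight, for example one estimated by EM. Hard rules act as a filter that removes candidates before they enter the beam. The two rules implemented in `hard_rules_ok` are illustrative assumptions, not the paper's list. Tone rules are left out because they would need a tone dictionary.

```python
import math

START = "<s>"  # hypothetical start-of-line symbol for the context table


def hard_rules_ok(first, partial, y):
    """Illustrative hard rules for appending y at position len(partial).

    Assumed rules (not taken from the paper):
    1. A character repeated in the first line is repeated at the same
       positions in the second line, and nowhere else.
    2. The second line does not reuse characters from the first line.
    """
    i = len(partial)
    x = first[i]
    for j in range(i):
        # Position j must repeat y exactly when the first line repeats x there.
        if (first[j] == x) != (partial[j] == y):
            return False
    return y not in first


def decode(first, t, b, lam, beam=5):
    """Beam search over lam * t(y|x) + (1 - lam) * b(y|prev), pruned by hard rules."""
    beams = [(0.0, "")]  # (log-probability, partial second line)
    for x in first:
        expanded = []
        for score, partial in beams:
            prev = partial[-1] if partial else START
            t_row, b_row = t.get(x, {}), b.get(prev, {})
            for y in set(t_row) | set(b_row):
                if not hard_rules_ok(first, partial, y):
                    continue  # hard rules filter the search space
                p = lam * t_row.get(y, 0.0) + (1.0 - lam) * b_row.get(y, 0.0)
                if p > 0:
                    expanded.append((score + math.log(p), partial + y))
        beams = sorted(expanded, reverse=True)[:beam]
        if not beams:
            return None  # no candidate satisfies the hard rules
    return beams[0][1]
```

Applying the hard rules as a filter keeps the probabilistic scores untouched and only narrows the search. Soft preferences can still outrank each other freely, but rule-breaking candidates never reach the beam.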
Key words
computer application /
Chinese information processing /
Chinese couplet generation /
maximum entropy Markov model
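Funding
Supported by the National High Technology Research and Development Program of China (863 Program) (2007AA01Z148) and the National Natural Science Foundation of China (60321002)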