Abstract
Modern Chinese contains many ambiguous phrase structures, and part-of-speech tags alone are not enough to determine the correct collocation relations between the words in a sentence. Drawing on a large number of ambiguous phrase instances, this paper analyzes the boundary ambiguity and structural-relation ambiguity that arise when Chinese phrase structures are processed by computer. Building on existing phrase structure rules, it identifies seven structural ambiguity patterns, argues that the key to analyzing them is judging four basic types of collocation information, and implements a disambiguation algorithm that uses both semantic and collocation knowledge. In an experiment on 887 ambiguous phrases, the accuracy of phrase structure processing rose from 82.30% to 87.18%.
Key words
computer application /
Chinese information processing /
Modern Chinese semantic knowledge base /
collocation dictionary /
disambiguation of ambiguous phrases
Funding
National 863 High-Tech Program of China (2001AA114210)