Word Identification in Source-Language Analysis for Chinese-English Machine Translation

Fu Aiping (傅爱平)

Journal of Chinese Information Processing, 1999, Vol. 13, Issue 5: 8-14.
Review

Chinese Sentence Tokenization in a Chinese-English MT System

  • Fu Aiping

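Abstract (translated from the Chinese)

The first problem encountered in source-language analysis for Chinese-English MT is word identification. The "word" in Chinese has no clear definition, and there are no sharp boundaries between morpheme and word, between word and phrase, or between phrase and sentence. Under the approach of segmenting first and parsing afterwards, segmentation runs into the difficulty that word-formation problems and syntactic problems are intertwined. The author argues that the character can be taken as the starting point of source-language syntactic analysis, so that the identification of words and phrases proceeds together with syntactic analysis. This paper describes this view and its implementation, and uses the handling of separable words (离合词) as an example to illustrate the basic method of identification.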

Abstract

The first problem encountered in source-language analysis in a Chinese-English MT system is Chinese sentence tokenization, since written Chinese has no explicit word delimiter. Finding token boundaries in a character string is often interlaced with syntactic parsing, or even with semantic relations. This paper presents an approach that combines sentence tokenization with syntactic-semantic analysis. Instead of producing a tokenized word string before sentence parsing, the tokenizing component is built into the parser; i.e., syntactic and semantic information can be used to recognize words when necessary during parsing, which is supported by a dictionary with descriptions of individual usage and a set of common rules.
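
As a rough illustration of the idea (not the author's implementation), the sketch below starts from characters rather than from a pre-segmented word string. It collects every dictionary match as a candidate, then applies one toy rule to recognize a separable word whose two halves are split by inserted material. The lexicon entries, the tag names, the `SEPARABLE` and `INSERTABLE` tables, the function names, and the example 洗澡 ("take a bath") are all assumptions made for this illustration. A real system would leave the choice among candidates to the parser.

```python
# Toy sketch: character-based candidate generation plus one separable-word rule.
# All entries, tags, and rules below are hypothetical, not taken from the paper.

LEXICON = {
    "他": "PRON", "洗": "V", "澡": "N", "洗澡": "V",
    "了": "ASP", "一": "NUM", "个": "CL",
}

# Hypothetical separable words: (first character, second character) -> whole word.
SEPARABLE = {("洗", "澡"): "洗澡"}

# Assumed rule: only material with these tags may stand between the two halves.
INSERTABLE = {"ASP", "NUM", "CL"}


def candidates(chars):
    """Return every lexicon match as a (start, end, form, tag) edge."""
    edges = []
    n = len(chars)
    for i in range(n):
        for j in range(i + 1, n + 1):
            form = chars[i:j]
            if form in LEXICON:
                edges.append((i, j, form, LEXICON[form]))
    return edges


def find_separable(chars):
    """Find separable words whose halves are split by insertable characters.

    Returns (word, head_index, tail_index, inserted_text) tuples.
    Contiguous occurrences are already covered by candidates().
    """
    # Tag of each single-character lexicon match, keyed by position.
    single = {i: tag for (i, j, _form, tag) in candidates(chars) if j - i == 1}
    found = []
    for (head, tail), word in SEPARABLE.items():
        for i, ch in enumerate(chars):
            if ch != head:
                continue
            k = i + 1
            # Skip over characters that the toy rule allows in between.
            while k < len(chars) and single.get(k) in INSERTABLE:
                k += 1
            if k > i + 1 and k < len(chars) and chars[k] == tail:
                found.append((word, i, k, chars[i + 1:k]))
    return found


if __name__ == "__main__":
    print(candidates("他洗澡"))
    print(find_separable("他洗了一个澡"))
```

For the contiguous string 他洗澡, `candidates` proposes 洗澡 as a whole word alongside the single characters 洗 and 澡. For 他洗了一个澡, no contiguous match for 洗澡 exists, and only the separable-word rule links 洗 and 澡 across 了一个. This is the kind of case where a segment-first pipeline has nothing to offer the parser.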

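Keywords (translated from the Chinese)

Machine translation / automatic analysis of Chinese / automatic identification of Chinese words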

Key words

Machine translation / Chinese language parsing / Chinese tokenization

Cite this article

傅爱平. 汉英机器翻译源语分析中词的识别. 中文信息学报. 1999, 13(5): 8-14
Fu Aiping. Chinese Sentence Tokenization in a Chinese-English MT System. Journal of Chinese Information Processing. 1999, 13(5): 8-14
