Journal of Chinese Information Processing

The Improvement of Automatic Machine Translation Evaluation

ZHANG Jian, WU Ji, ZHOU Ming

2003, 17(6): 2-9.

Evaluation plays a critical role in machine translation, and automatic machine translation evaluation is an urgent need for natural language processing researchers and developers. This paper briefly describes the background of machine translation evaluation and two important automatic evaluation techniques: the BLEU and NIST metrics. We then present an improvement to these metrics, called the TFIDF-weighted metric, which borrows ideas from text retrieval. This method avoids a shortcoming of the BLEU metric and achieves a higher F-ratio value, so it can considerably improve the automatic evaluation of machine translation. We also describe an evaluation platform that makes evaluation more convenient for researchers and developers.
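For reference, the standard BLEU computation can be sketched as follows: clipped n-gram precisions, a uniform geometric mean, and a brevity penalty. The optional `weight` argument is a hypothetical hook for a TFIDF-style variant (for example, an IDF lookup table); it is an assumption for illustration, not the weighting formula of the paper above, and all function names are illustrative.

```python
import math
from collections import Counter
from typing import Callable, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: Sequence[str],
         references: Sequence[Sequence[str]],
         max_n: int = 4,
         weight: Callable[[tuple], float] = lambda g: 1.0) -> float:
    """Sentence-level BLEU with an optional per-n-gram weight.

    With the default weight of 1.0 this is standard BLEU. A non-uniform
    `weight` (e.g. an IDF table) is a hypothetical TFIDF-style variant.
    """
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref: Counter = Counter()
        for ref in references:
            for g, c in ngrams(ref, n).items():
                max_ref[g] = max(max_ref[g], c)
        matched = sum(weight(g) * min(c, max_ref[g]) for g, c in cand.items())
        total = sum(weight(g) * c for g, c in cand.items())
        if total == 0 or matched == 0:
            return 0.0  # any zero precision makes the geometric mean zero
        log_precision_sum += math.log(matched / total)

    c = len(candidate)
    # Effective reference length: the reference length closest to c.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)  # brevity penalty
    return bp * math.exp(log_precision_sum / max_n)
```

With `weight` left at its default, a candidate identical to a reference of four or more tokens scores 1.0. A weight that favors rarer n-grams makes matches on content words count for more than matches on function words, which is roughly the intuition TF-IDF brings from text retrieval.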

Structure Analysis and Extraction for the Definitions of Chinese Terms

ZHANG Yan, ZONG Cheng-qing, XU Bo

2003, 17(6): 10-17.

The work presented in this paper is an application of Chinese syntactic parsing, and it also offers a theoretical discussion of how terms are defined. Term definitions provide patterns and structures for term concepts and form the data basis for knowledge discovery. The structures of term definitions can also serve as a grammatical knowledge system for a specialized domain. In this paper, corpora from the electronics and computer domains are first segmented and tagged with parts of speech. Two parsers are then applied to obtain the structures and phrases of sentences. Based on the syntactic structures of Chinese sentences, we summarize the structural characteristics of term definitions and automatically extract definition patterns. Finally, we describe an algorithm that defines a new term using the resulting knowledge base.

Generating Japanese from the Case Relation Representation of Chinese

DAI Xin-yu, CHEN Jia-jun, WANG Qi-xiang

2003, 17(6): 18-25.

This paper presents a Japanese generation subsystem used in a transfer-based Chinese-Japanese machine translation system. We first introduce the Chinese parse tree, a dependency tree based on case grammar whose nodes combine syntactic, semantic, and case information. Then, based on the characteristics of Japanese, we discuss several difficult issues in Japanese generation, such as word selection, word inflection, and the generation of accompanying particles. We present the architecture of the rule-based Japanese generation system and describe its rule system in detail. Finally, we give some example rules and translations and discuss future work on the translation system.
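The word-selection, particle-generation, and head-final reordering steps can be illustrated with a toy sketch. The bilingual lexicon, the role names (`agent`, `patient`, `location`, ...), and the one-particle-per-role table below are hypothetical simplifications; the system described above uses a much richer rule set, and real generation must also choose between particles such as が and は and inflect verbs, which this sketch does not do.

```python
from typing import Dict, List, Tuple

# Hypothetical Chinese -> Japanese lexicon (word selection).
LEXICON: Dict[str, str] = {
    "我": "私", "读": "読む", "书": "本",
    "学习": "勉強する", "图书馆": "図書館",
}

# Hypothetical rule table: one case particle per case role.
CASE_PARTICLES: Dict[str, str] = {
    "agent": "が", "patient": "を", "goal": "に",
    "location": "で", "source": "から",
}


def generate(head: str, deps: List[Tuple[str, str]]) -> str:
    """Generate a Japanese clause from a one-level Chinese case-relation tree.

    head: the Chinese predicate; deps: (case role, Chinese word) pairs in
    source order. Japanese is head-final, so the predicate goes last.
    """
    # Translate each dependent and attach the particle for its case role.
    parts = [LEXICON[word] + CASE_PARTICLES[role] for role, word in deps]
    parts.append(LEXICON[head])  # dictionary form only; no inflection here
    return "".join(parts)
```

For 我读书 (agent 我, patient 书), `generate("读", [("agent", "我"), ("patient", "书")])` yields 私が本を読む: the object moves before the verb and each argument receives its particle.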

Study on Intelligent Retrieval of Event Relevant Documents Based on Event Frame

WU Ping-bo, CHEN Qun-xiu, MA Liang

2003, 17(6): 26-31,60.

The ability of a retrieval system to find documents relevant to an event is limited by the differentiation and shifting of the event topic and by interference from other, similar events. This paper presents a retrieval method based on event frame knowledge and event body information, in which the event relevance evaluation function is modified. First, frame knowledge is gathered from an event corpus and event body information is collected from event documents; this knowledge and information are then converted into vectors; finally, the retrieval system's relevance evaluation function is modified using these vectors. Experimental results indicate that the method is feasible and effective for retrieving event-relevant documents.
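One way to picture modifying the relevance evaluation function with event vectors is a linear interpolation between an ordinary query score and the document's similarity to the event-frame and event-body vectors. The equal weighting of frame and body, the interpolation weight `lam`, and the function names are assumptions made for illustration, not the formula from the paper.

```python
import math
from typing import Dict

Vector = Dict[str, float]  # sparse term -> weight vector


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def event_relevance(doc: Vector, query: Vector, frame: Vector, body: Vector,
                    lam: float = 0.5) -> float:
    """Hypothetical relevance: query score interpolated with event similarity."""
    base = cosine(doc, query)
    # Equal (assumed) weighting of event-frame and event-body similarity.
    event = 0.5 * cosine(doc, frame) + 0.5 * cosine(doc, body)
    return (1.0 - lam) * base + lam * event
```

A document that matches the query keywords but shares little with the event frame or body vectors scores lower than one that matches both, which is how such a combination could suppress interference from similar but different events.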

Automatic Paraphrasing of Chinese Utterances

ZHANG Yu-jie, Kazuhide Yamamoto

2003, 17(6): 32-39.

One of the key issues in spoken language translation is how to deal with unrestricted expressions in spontaneous utterances. This research centers on the development of a Chinese paraphraser that automatically paraphrases utterances prior to transfer in Chinese-Japanese spoken language translation. In this paper, we propose a pattern-matching approach to paraphrasing that requires only morphological analysis. We also describe a pattern construction method through which paraphrasing patterns can be learned efficiently from a paraphrase corpus and human experience. Using the implemented paraphraser and the resulting patterns, we conducted a paraphrasing experiment and evaluated the results.
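A minimal sketch of pattern-matching paraphrasing over morphologically segmented input follows. Each pattern rewrites a token sequence, and `$`-prefixed slots bind single tokens. The two patterns (normalizing 什么 地方 to 哪里 and dropping the polite prefix 请问) and the function names are hypothetical examples, not patterns or code from the paper.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Pattern = Tuple[List[str], List[str]]  # (left-hand side, right-hand side)

# Hypothetical paraphrasing patterns, applied in list order.
PATTERNS: List[Pattern] = [
    (["$X", "在", "什么", "地方"], ["$X", "在", "哪里"]),
    (["请问", "$X", "在", "哪里"], ["$X", "在", "哪里"]),
]


def match_at(tokens: Sequence[str], lhs: List[str], i: int) -> Optional[Dict[str, str]]:
    """Try to match lhs at position i; return slot bindings or None."""
    if i + len(lhs) > len(tokens):
        return None
    binding: Dict[str, str] = {}
    for pat, tok in zip(lhs, tokens[i:i + len(lhs)]):
        if pat.startswith("$"):
            # A slot binds one token; a repeated slot must bind the same token.
            if binding.setdefault(pat, tok) != tok:
                return None
        elif pat != tok:
            return None
    return binding


def paraphrase(tokens: Sequence[str]) -> List[str]:
    """Apply each pattern left to right over the whole token sequence."""
    out = list(tokens)
    for lhs, rhs in PATTERNS:
        i = 0
        while i < len(out):
            binding = match_at(out, lhs, i)
            if binding is None:
                i += 1
                continue
            # Instantiate the right-hand side with the bound slot values.
            new = [binding.get(t, t) for t in rhs]
            out[i:i + len(lhs)] = new
            i += len(new)
    return out
```

Because patterns are applied in order, an earlier rewrite can feed a later one: 请问 车站 在 什么 地方 becomes 车站 在 哪里.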

Multi-Layer Structure MLLR Adaptation Algorithm Based on Target-Driven

MU Xiang-yu, JIA Lei, ZHANG Shu-wu, XU Bo

2003, 17(6): 40-47.

In this paper, we propose a new algorithm called target-driven multi-layer maximum likelihood linear regression (TMLLR) for model adaptation in speech recognition. The algorithm can be regarded as an improvement on maximum likelihood linear regression (MLLR) with regression class trees. Unlike conventional MLLR, TMLLR generates its regression classes dynamically, based on the increase in the target function and a multi-layer feedback mechanism. Because of its multi-layer structure, TMLLR avoids some redundant computation, which yields much faster adaptation. The target-driven strategy aims to increase the likelihood, which is consistent with the criterion used in speech recognition, so the system can achieve higher recognition accuracy. Compared with conventional MLLR using regression class trees, TMLLR achieved a further 10% reduction in word error rate with only about half the computation time in supervised adaptation experiments.
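For context, conventional MLLR adapts each Gaussian mean with an affine transform shared by all Gaussians m in a regression class r, estimated to maximize the likelihood of the adaptation data O under the model λ. The split test in the last line is only one plausible reading of regression classes generated dynamically from the increase in the target function: here L(r) denotes the maximized log-likelihood of the adaptation data assigned to class r, and the threshold θ is an assumption, not a parameter from the paper.

```latex
\hat{\boldsymbol{\mu}}_m = \mathbf{A}_r \boldsymbol{\mu}_m + \mathbf{b}_r = \mathbf{W}_r \boldsymbol{\xi}_m,
\qquad \boldsymbol{\xi}_m = \begin{bmatrix} 1 \\ \boldsymbol{\mu}_m \end{bmatrix},
\qquad \mathbf{W}_r = \begin{bmatrix} \mathbf{b}_r & \mathbf{A}_r \end{bmatrix}

\mathbf{W}_r = \arg\max_{\mathbf{W}} \; \log p(\mathbf{O} \mid \lambda, \mathbf{W})

\text{split class } r \text{ into } r_1, r_2 \text{ if } \; \mathcal{L}(r_1) + \mathcal{L}(r_2) - \mathcal{L}(r) > \theta
```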

Multi-Font Printed Tibetan Character Recognition

WANG Hua, DING Xiao-qing

2003, 17(6): 48-53.

Tibetan character recognition is a significant module of Chinese multilingual information processing systems; however, little research has been done on it so far. We propose a comprehensive method based on statistical pattern recognition for multi-font printed Tibetan character recognition. First, directional line element features are extracted from the contour of the input character. After the feature dimension is reduced by Linear Discriminant Analysis (LDA) to form a compact feature vector, a two-stage classification strategy based on confidence values is adopted to determine the category of the input character. Euclidean Distance with Deviation (EDD) is designed for effective rough classification, while the Modified Quadratic Discriminant Function (MQDF) is employed for fine classification. With classifier parameters selected experimentally, a recognition accuracy of 99.79% is achieved on a test set of 177,600 characters (300 samples per category). The experimental results demonstrate the validity of the proposed method.
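The fine-classification stage uses the MQDF. A pure-Python sketch of its standard form follows, together with a two-stage wrapper. Because the EDD formula is not given here, stage one uses plain Euclidean distance to the class means as a stand-in, and the confidence-based decision is omitted; the dictionary layout, `n_candidates`, and all function names are assumptions.

```python
import math
from typing import Dict, List, Sequence


def mqdf(x: Sequence[float], mean: Sequence[float], eigvals: Sequence[float],
         eigvecs: Sequence[Sequence[float]], delta: float) -> float:
    """MQDF distance from x to one class; smaller means more likely.

    eigvals/eigvecs: the k largest eigenvalues of the class covariance matrix
    and their unit eigenvectors. delta replaces the remaining d - k eigenvalues.
    """
    d, k = len(x), len(eigvals)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    dist2 = sum(v * v for v in diff)
    # Along each principal axis, swap the 1/delta weighting of the squared
    # projection for 1/lambda_j.
    correction = 0.0
    for lam, phi in zip(eigvals, eigvecs):
        proj = sum(p * v for p, v in zip(phi, diff))
        correction += (1.0 - delta / lam) * proj * proj
    return ((dist2 - correction) / delta
            + sum(math.log(lam) for lam in eigvals)
            + (d - k) * math.log(delta))


def classify(x: Sequence[float], classes: Dict[str, dict], n_candidates: int = 10) -> str:
    """Two-stage classification: cheap rough ranking, then MQDF on the top candidates."""
    # Stage 1 (stand-in for EDD): squared Euclidean distance to each class mean.
    def rough(name: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, classes[name]["mean"]))

    candidates: List[str] = sorted(classes, key=rough)[:n_candidates]
    # Stage 2: fine classification with MQDF among the surviving candidates.
    return min(candidates, key=lambda name: mqdf(x, **classes[name]))
```

Each entry of `classes` holds the keyword arguments of `mqdf` (`mean`, `eigvals`, `eigvecs`, `delta`). The rough stage keeps MQDF, the expensive step, confined to a short candidate list.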

Research on Intelligence-based Tutoring System

GAO Guang-lai, WANG Yu-feng

2003, 17(6): 54-60.

Tutoring systems are very important to Web-based education. Current tutoring systems only match the keywords in users' input against the questions in a question database, so their query precision and user interfaces cannot meet users' needs. To solve this problem, this paper presents an agent-based intelligent tutoring system that applies semantic-network principles. We discuss the need for an intelligent tutoring system and present the tutoring model and its technical approach, which are based on restricted domains. The system's functions are realized using two college computer courses as the source of the knowledge base. Experimental results indicate that the proposed method effectively improves query precision and that the user interface is friendly and convenient.

On Some Issues of the Establishment of Ancient Chinese Font

ZHANG Zai-xing

2003, 17(6): 61-66.

With the development of research on the computerization of ancient texts, the establishment of a standard ancient Chinese font deserves immediate attention. This paper explains four noteworthy issues in establishing an ancient Chinese font. First, a complete collection of glyphs should be built by establishing a database of ancient Chinese characters and sorting glyphs thoroughly, and the authenticity of the glyphs should be ensured by scanning rubbings. Second, the complicated relationships between glyphs and characters, such as different usages, different interpretations, and variant forms, must be considered. Third, the categorization of glyphs must follow the principles of standardization and differentiation. Fourth, the classification of characters must follow the principles of character frequency and glyph frequency.

2003 Volume 17 Issue 6 Published: 15 December 2003