2003 Volume 17 Issue 6 Published: 15 December 2003
  

  • ZHANG Jian, WU Ji, ZHOU Ming
    2003, 17(6): 2-9.
    Evaluation plays a critical role in machine translation. Research on automatic machine translation evaluation is urgently needed by natural language processing researchers and developers. This paper briefly describes the background of machine translation evaluation and two important automatic evaluation metrics: BLEU and NIST. We then present improvements to these metrics that draw on ideas from text retrieval, yielding what we call the TFIDF-weighted metric. This method avoids the shortcomings of the BLEU metric and achieves a higher F-ratio value. As a result, it can noticeably improve the automatic evaluation of machine translation. We also describe an evaluation platform that makes evaluation more convenient for researchers and developers.
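The abstract does not give the formula for the TFIDF-weighted metric, so the following is only a minimal sketch of the general idea: a BLEU-style clipped n-gram precision in which each n-gram counts with an inverse-document-frequency weight instead of counting equally. The function names, the smoothed IDF formula, and the toy sentences are assumptions made for illustration, not the authors' definition.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def idf_table(documents, n):
    """Smoothed inverse document frequency of each n-gram (assumed formula)."""
    df = Counter()
    for doc in documents:
        df.update(set(ngrams(doc, n)))  # count each n-gram once per document
    total = len(documents)
    return {g: math.log((1 + total) / (1 + c)) + 1.0 for g, c in df.items()}

def weighted_precision(candidate, reference, idf, n, default_idf=1.0):
    """Clipped n-gram precision where every n-gram counts with its IDF weight."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    matched = sum(min(c, ref[g]) * idf.get(g, default_idf) for g, c in cand.items())
    total = sum(c * idf.get(g, default_idf) for g, c in cand.items())
    return matched / total if total else 0.0

# Hypothetical toy data: two tokenized reference translations.
references = [["the", "cat", "sat"], ["the", "dog", "ran"]]
idf = idf_table(references, 1)
score = weighted_precision(["the", "cat", "ran"], references[0], idf, 1)
```

In this toy case the unmatched word "ran" is rarer than the matched word "the", so the weighted score falls below the unweighted precision of 2/3. That is the intended effect of the weighting: errors on informative words cost more than errors on common ones.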
  • ZHANG Yan, ZONG Cheng-qing, XU Bo
    2003, 17(6): 10-17.
    The work presented in this paper is an application of Chinese syntactic parsing. It is a theoretical discussion of how to define term names. Term definitions provide patterns and structures for term concepts and form the data basis for knowledge discovery. The structures of term definitions can also serve as a grammar knowledge system for a special domain. In this paper, corpora from the electronics and computer domains are first segmented and tagged with parts of speech. Two parsers are then applied to obtain the structures and phrases of sentences. Based on the syntactic structures of Chinese sentences, we summarize the structural characteristics of term definitions and automatically extract definition patterns. Finally, we describe an algorithm for defining a new term from the knowledge thus built.
  • DAI Xin-yu, CHEN Jia-jun, WANG Qi-xiang
    2003, 17(6): 18-25.
    This paper presents a Japanese generation subsystem used in a transfer-based Chinese-Japanese machine translation system. The Chinese parse tree is introduced first. It is a dependency tree based on case grammar, and syntactic, semantic, and case information are combined in its nodes. Then, in light of the characteristics of Japanese, we discuss some difficult issues in Japanese generation, such as word selection, word inflection, and the generation of accompanying particles. The architecture of the rule-based Japanese generation system is presented, and the rule system for generation is described in detail. Finally, some rule examples and translation examples are given, and future work on the translation system is discussed.
  • WU Ping-bo, CHEN Qun-xiu, MA Liang
    2003, 17(6): 26-31,60.
    The ability of a retrieval system to retrieve event-relevant documents is restricted by the differentiation and shifting of the event topic and by interference from other similar events. This paper presents a retrieval method based on event frame knowledge and event body information, in which the evaluation function for event relevance is modified. First, frame knowledge is gathered from an event corpus and event body information is collected from event documents; then this knowledge and information are converted into vectors; finally, the relevance evaluation function of the retrieval system is modified with the vectors. Experimental results indicate that the method is feasible and effective for retrieving event-relevant documents.
  • ZHANG Yu-jie, Kazuhide Yamamoto
    2003, 17(6): 32-39.
    One of the key issues in spoken language translation is how to deal with unrestricted expressions in spontaneous utterances. This research is centered on the development of a Chinese paraphraser that automatically paraphrases utterances prior to transfer in Chinese-Japanese spoken language translation. In this paper, a pattern-matching approach to paraphrasing is proposed for which only morphological analysis is required. In addition, a pattern construction method is described through which paraphrasing patterns can be efficiently learned from a paraphrase corpus and human experience. Using the implemented paraphraser and the obtained patterns, a paraphrasing experiment was conducted and the results were evaluated.
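As a rough illustration of pattern-matching paraphrasing that needs only morphological analysis, the sketch below rewrites a sequence of (word, part-of-speech) tokens using rules made of literal words and typed slots. The rule notation (`<NAME:POS>` in patterns, `<NAME>` in templates), the function names, and the English toy utterance are all hypothetical; the paper's actual pattern format and its Chinese patterns are not described in the abstract.

```python
def is_slot(element):
    """A slot is written <NAME:POS> in patterns and <NAME> in templates."""
    return element.startswith("<") and element.endswith(">")

def match_at(pattern, tokens, start):
    """Match pattern against tokens[start:]; return slot bindings or None."""
    if start + len(pattern) > len(tokens):
        return None
    bindings = {}
    for element, (word, pos) in zip(pattern, tokens[start:]):
        if is_slot(element):
            name, wanted_pos = element[1:-1].split(":")
            if pos != wanted_pos:
                return None
            bindings[name] = word
        elif element != word:
            return None
    return bindings

def paraphrase(tokens, rules):
    """Scan left to right, applying the first rule that matches at each position."""
    out, i = [], 0
    while i < len(tokens):
        for pattern, template in rules:
            bindings = match_at(pattern, tokens, i)
            if bindings is not None:
                out.extend(bindings[e[1:-1]] if is_slot(e) else e for e in template)
                i += len(pattern)
                break
        else:
            out.append(tokens[i][0])  # no rule matched: copy the word unchanged
            i += 1
    return out

# Hypothetical rule: simplify a polite request to a plain imperative.
rules = [(["could", "you", "please", "<V:VB>"], ["please", "<V>"])]
utterance = [("could", "MD"), ("you", "PRP"), ("please", "UH"),
             ("open", "VB"), ("it", "PRP")]
result = paraphrase(utterance, rules)  # ['please', 'open', 'it']
```

Because the rules only inspect surface words and part-of-speech tags, no syntactic parse is needed, which is the property the abstract highlights.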
  • MU Xiang-yu, JIA Lei, ZHANG Shu-wu, XU Bo
    2003, 17(6): 40-47.
    In this paper, a new algorithm called target-driven multiple-layer maximum likelihood linear regression (TMLLR) is proposed for model adaptation in speech recognition. The algorithm can be regarded as an improvement on maximum likelihood linear regression (MLLR) that uses generated regression class trees for model adaptation. Unlike conventional MLLR, TMLLR generates its regression classes dynamically, based on the increment of the target function and a multi-layer feedback mechanism. Because of the special multi-layer structure of TMLLR, some redundant computation is avoided, which results in much faster adaptation. The target-driven strategy aims at increasing the likelihood, which is consistent with the measure used in speech recognition, so higher recognition accuracy can be achieved. Compared with conventional MLLR using regression class tree generation, TMLLR achieved a further 10% reduction in word error rate and required only about half the computation time in supervised adaptation experiments.
  • WANG Hua, DING Xiao-qing
    2003, 17(6): 48-53.
    Tibetan character recognition is a significant module of a Chinese multi-language information processing system; however, hardly any research has been undertaken on it so far. A comprehensive method based on a statistical pattern recognition approach is proposed for multi-font printed Tibetan character recognition. First, directional line element features are extracted from the contour of the input character. After feature dimension reduction by Linear Discriminant Analysis (LDA) to form a compact feature vector, a two-stage classification strategy based on confidence values is adopted to decide the category of the input character. Euclidean Distance with Deviation (EDD) is designed for effective rough classification, while the Modified Quadratic Discriminant Function (MQDF) is employed for fine classification. With proper classifier parameters selected through experiment, a recognition accuracy of 99.79% is achieved on a test set containing 177,600 characters (300 samples per category). The experimental results show the validity of the proposed method.
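The coarse-to-fine idea behind two-stage classification can be sketched as follows. This is not the paper's EDD or MQDF: as stand-ins, the rough stage uses plain squared Euclidean distance to class means, and the fine stage uses a diagonal-covariance Gaussian quadratic discriminant. The confidence-based decision mentioned in the abstract is not modeled, and all names and the toy class models are assumptions.

```python
import math

def euclidean_sq(x, mean):
    """Squared Euclidean distance from a feature vector to a class mean."""
    return sum((a - m) ** 2 for a, m in zip(x, mean))

def quadratic_score(x, mean, var):
    """Diagonal-Gaussian negative log-likelihood, up to an additive constant."""
    return sum((a - m) ** 2 / v + math.log(v) for a, m, v in zip(x, mean, var))

def classify_two_stage(x, models, shortlist_size=2):
    """models maps label -> (mean, variance). Rough stage shortlists, fine stage decides."""
    # Rough classification: cheap distance keeps only the nearest few classes.
    ranked = sorted(models, key=lambda label: euclidean_sq(x, models[label][0]))
    shortlist = ranked[:shortlist_size]
    # Fine classification: costlier quadratic discriminant on the shortlist only.
    return min(shortlist, key=lambda label: quadratic_score(x, *models[label]))

# Hypothetical class models: (mean vector, per-dimension variance).
models = {
    "a": ([0.0, 0.0], [1.0, 1.0]),
    "b": ([3.0, 0.0], [9.0, 9.0]),
    "c": ([10.0, 10.0], [1.0, 1.0]),
}
label = classify_two_stage([1.6, 0.0], models)
```

For the point `[1.6, 0.0]` the nearest mean belongs to class "b", but the fine stage selects "a" once the variances are taken into account. The rough stage only has to keep the correct class in the shortlist, which is why a cheap distance measure suffices there.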
  • GAO Guang-lai, WANG Yu-feng
    2003, 17(6): 54-60.
    Tutoring systems are very important to Web-based education. Current tutoring systems only match keywords from the user's input against the questions in a question database, so their query precision and user interface cannot meet users' needs. To solve this problem, this paper presents an agent-based intelligent tutoring system that applies the semantic-net principle. The necessity of establishing an intelligent tutoring system is discussed, and the tutoring model and its technical route, based on constrained fields, are given. The system's functions are implemented using two college computer courses as the source of the knowledge base. Experimental results indicate that the proposed method effectively improves query precision and that the user interface is friendly and convenient.
  • ZHANG Zai-xing
    2003, 17(6): 61-66.
    With the development of research on the computerization of ancient writings, the establishment of a standard ancient Chinese font deserves immediate attention. This paper explains four noteworthy issues in establishing an ancient Chinese font. First, a complete collection of glyphs should be built by establishing a database of ancient Chinese characters and sorting glyphs thoroughly, with the authenticity of glyphs ensured by scanning rubbings. Second, complicated relationships between glyphs and characters, such as different usages, different interpretations, and variant forms, must be considered. Third, the categorization of glyphs must follow the principles of standardization and differentiation. Fourth, the classification of characters must follow the principles of character frequency and glyph frequency.