2001 Volume 15 Issue 4 Published: 15 August 2001
  

  • LIU Ying
    2001, 15(4): 2-7.
    Ambiguities in part-of-speech tagging, syntactic analysis, and semantic analysis are resolved using statistical methods. The maximum likelihood principle is commonly used for disambiguation, but it does not give the right result under all conditions. This paper uses a robust learning algorithm to select the correct result from among all candidates. When the score of the correct candidate is not the maximum, the robust learning algorithm adjusts the scores so that the correct candidate's score becomes the maximum and the scores of the wrong candidates are reduced. Moreover, the training set and the test set differ: the error rate on the training set may be minimal while the error rate on the test set is not. When there is a statistical difference between the training set and the test set, a robust learning algorithm should be used.
  • CUI Zong-jun,TANG Shi-wei,YANG Dong-qing
    2001, 15(4): 8-14.
    This paper puts forward a computational model of an ER-model-based restricted-Chinese query language for relational databases. The model simulates the human language-processing mechanism and divides natural-language communication into four mutually dependent and interleaved steps: word segmentation, parsing, semantic processing, and SQL transformation. A new grammar, GWERSC (Grammar with ER Semantic Characteristics), is introduced; with the help of its embedded ER model, it supports syntactic parsing and simplifies semantic understanding.
  • YU Hai-yan,ZHANG Zhong-yi
    2001, 15(4): 15-20,28.
    This paper discusses the optimization of a full-text retrieval system based on single-Chinese-character indexing from three aspects: compression of the inverted index file using Golomb coding, a bidirectional binary-search intersection algorithm, and parallel computing with a double-buffer cache. Experiments show that these optimizations reduce the system's storage cost and improve its performance.
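    The Golomb coding of inverted-index postings mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice to encode gaps between sorted document IDs, the fixed parameter `m`, and the use of a text bit string instead of packed bytes are all assumptions made for readability.

```python
def golomb_encode(n: int, m: int) -> str:
    """Return the Golomb code of non-negative integer n with parameter m as a bit string."""
    if n < 0 or m < 1:
        raise ValueError("n must be >= 0 and m must be >= 1")
    q, r = divmod(n, m)
    bits = "1" * q + "0"              # quotient in unary, terminated by a 0
    if m == 1:
        return bits                   # no remainder bits are needed
    b = (m - 1).bit_length()          # ceil(log2(m)) for m >= 2
    cutoff = (1 << b) - m             # remainders below cutoff use b - 1 bits
    if r < cutoff:
        bits += format(r, f"0{b - 1}b")
    else:
        bits += format(r + cutoff, f"0{b}b")
    return bits


def golomb_decode(bits: str, m: int, count: int) -> list[int]:
    """Decode `count` Golomb-coded integers from a bit string."""
    b = (m - 1).bit_length()
    cutoff = (1 << b) - m
    out, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":       # read the unary quotient
            q += 1
            pos += 1
        pos += 1                      # skip the terminating 0
        r = 0
        if m > 1:
            if cutoff > 0:
                # truncated binary: read b - 1 bits, then one more if needed
                r = int(bits[pos:pos + b - 1], 2)
                pos += b - 1
                if r >= cutoff:
                    r = ((r << 1) | int(bits[pos])) - cutoff
                    pos += 1
            else:
                # m is a power of two: the remainder is plain b-bit binary
                r = int(bits[pos:pos + b], 2)
                pos += b
        out.append(q * m + r)
    return out


def compress_postings(doc_ids: list[int], m: int) -> str:
    """Encode a strictly increasing list of document IDs as Golomb-coded gaps (hypothetical helper)."""
    bits, prev = [], -1
    for doc_id in doc_ids:
        bits.append(golomb_encode(doc_id - prev - 1, m))  # gap minus one is >= 0
        prev = doc_id
    return "".join(bits)


def decompress_postings(bits: str, m: int, count: int) -> list[int]:
    """Invert compress_postings (hypothetical helper)."""
    doc_ids, prev = [], -1
    for gap in golomb_decode(bits, m, count):
        prev = prev + gap + 1
        doc_ids.append(prev)
    return doc_ids
```

    Storing small gaps between consecutive document IDs rather than the IDs themselves is what makes the short codes pay off; in practice `m` would be tuned to the average gap of each posting list, which this sketch leaves to the caller.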
  • HAN Ke-song,WANG Yong-cheng,SHEN Zhou,WU Fang-fang
    2001, 15(4): 21-28.
    To meet the requirements of the Internet and large-scale text processing, this paper describes how to automatically extract the subject of Chinese texts. The subject is extracted at three levels: subject words, subject concepts, and subject sentences. The emphasis is on how to build the weighting system and obtain the empirical coefficient values. The performance is briefly analyzed on the basis of experimental results on news articles.
  • WU Hua,HUANG Tai-yi
    2001, 15(4): 29-35.
    In a question-answering system, if the system has a picture of the domain knowledge that the user has mastered, it can generate answers that are both informative and understandable to the user, which improves the interaction between human and computer. Based on a flower knowledge retrieval system, this paper discusses the effect of the user model on the generated content and the relationship between the user model and the text planner. Experiments show that the user model affects not only the generated content but also its style. The generation system uses two generation strategies, schema and process; the combination of these two methods is also discussed.
  • HUANG Yin-fei,ZHENG Fang,YAN Peng-ju,XU Ming-xing,WU Wen-hu
    2001, 15(4): 36-41.
    This paper presents the design and implementation of EasyNav, a Chinese spoken language dialogue system for Tsinghua University campus navigation. By analyzing the features and requirements of spoken language dialogue systems, we design a rule-based language understanding procedure suited to them. The syntactic parser applies the GLR algorithm to a context-free grammar (CFG) in order to extract features of syntactic structure for use by the semantic parser. The syntactic grammar is designed as a trade-off between coverage and accuracy. The semantic parser matches sentence templates under syntactic constraints to find the speaker's intention, and it resolves the ambiguity introduced by the syntactic parser. The advantage of this design is that the system is easy to construct and extend.
  • WANG Hao-jun,ZHAO Nan-yuan,DENG Gang-yi
    2001, 15(4): 42-47,53.
    This paper presents a stroke segment extraction algorithm for Tibetan characters. Based on the geometrical features and topological structure of Tibetan characters, the method uses contour information to extract their stroke segments. First, contour points are extracted by chain-code following; then feature points are detected and used to separate strokes; finally, contour lines rather than skeleton lines are used to represent the strokes. Experimental results show that the proposed algorithm extracts the strokes of printed Tibetan characters in agreement with human perception. In addition, compared with methods based on thinning algorithms, the proposed algorithm is more robust and faster.
  • LIU Jia-feng,HUANG Jian-hua,TANG Xiang-long
    2001, 15(4): 48-53.
    This paper describes the design and implementation of an on-line Chinese character recognition system based on Hidden Markov Models (HMMs). The strokes of an on-line Chinese character are regarded as the input observation sequence, and a multi-cross left-right model structure is employed to eliminate the influence of redundant or missing strokes. Training the HMMs is also an important problem for this system; to keep the training process from falling into a local minimum, an improved training approach is proposed. After sufficient training, the system achieves satisfactory results for both ordinary and free-style handwritten characters.
  • WU Jian,SUN Yu-fang,LI Guo-hua,LI Xiang-kai
    2001, 15(4): 54-59.
    As computer applications deepen and the Internet becomes more popular, the 6763 Chinese characters defined in GB 2312-80 can no longer meet users' needs. The ISO 10646 standard provides a square-built code space for developing Chinese platforms that support a large Chinese character set. We have studied techniques for implementing such Chinese platforms. Our Chinese platform supports the large CJK Chinese character set of the ISO 10646 standard and multiple internal codes; it is compatible with existing Chinese platforms, is independent of the English version, and follows international and Chinese national standards. This paper describes the design goals and module structure of this Chinese platform.
  • S·Soyoitu
    2001, 15(4): 60-66.
    The author simulates various forms of the construction mechanism of traditional Mongolian words and proposes several mathematical models for whole-word construction on computer. Based on these models, the author investigates accuracy, time complexity, and space complexity, the three key elements of an optimal word construction theory for the traditional Mongolian written language on computer. The paper also studies the computational structure, the parallel knowledge processing method, and the unified computation of whole-word complex characteristics, all of which should be carefully considered in the optimal word construction process. Finally, the author proves that the mathematical model "B - J - T = W" is the optimum construction pattern for Mongolian words on computer.