Journal of Chinese Information Processing

Robust Learning Algorithm

LIU Ying

2001, 15(4): 2-7.

Ambiguities in part-of-speech tagging and in syntactic and semantic analysis are resolved using statistical methods. The maximum likelihood principle is commonly used for disambiguation, but it does not give the correct result under all conditions. This paper uses a robust learning algorithm to select the correct result among all candidates. When the score of the correct candidate is not the maximum, the robust learning algorithm adjusts the scores so that the score of the correct candidate becomes the maximum and the scores of the wrong candidates are reduced. Moreover, the training set and the test set differ: a minimal error rate on the training set does not imply a minimal error rate on the test set. When there is a statistical difference between the training set and the test set, a robust learning algorithm should be used.
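
The score adjustment described in this abstract can be illustrated with a minimal sketch. It assumes a linear score (the sum of feature weights) and a perceptron-style update with a margin. The abstract does not give the paper's actual scoring function or update rule, so the names `score`, `robust_update`, `margin`, and `step`, as well as the toy features, are hypothetical.

```python
from collections import defaultdict


def score(weights, features):
    """Score a candidate as the sum of the weights of its features."""
    return sum(weights[f] for f in features)


def robust_update(weights, candidates, correct, margin=1.0, step=0.1):
    """Adjust weights once if the correct candidate does not win by `margin`.

    `candidates` maps a candidate label to its list of feature names.
    Returns True if the weights were changed (assumed update rule, not the paper's).
    """
    correct_score = score(weights, candidates[correct])
    rivals = {c: score(weights, f) for c, f in candidates.items() if c != correct}
    best_rival = max(rivals, key=rivals.get)
    if correct_score - rivals[best_rival] >= margin:
        return False
    # Raise the score of the correct candidate, lower that of its strongest rival.
    for f in candidates[correct]:
        weights[f] += step
    for f in candidates[best_rival]:
        weights[f] -= step
    return True


# Toy example (hypothetical data): tagging "book" after "the".
weights = defaultdict(float, {"tag=V": 0.5})  # the wrong tag starts ahead
candidates = {
    "noun": ["word=book", "prev=the", "tag=N"],
    "verb": ["word=book", "prev=the", "tag=V"],
}
for _ in range(100):
    if not robust_update(weights, candidates, correct="noun"):
        break
```

Requiring a margin rather than a bare win is one way to reflect the abstract's point about training and test sets differing: a candidate that wins only narrowly on training data may lose on slightly different test data.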

Studies on ER-model-based Restrictive-Chinese Query Language of Database

CUI Zong-jun,TANG Shi-wei,YANG Dong-qing

2001, 15(4): 8-14.

A computational model of an ER-model-based restrictive-Chinese query language for relational databases is put forward. It simulates the human language processing mechanism and divides the process of communicating in natural language into four mutually dependent and interlaced steps: word segmentation, parsing, semantic processing, and SQL transformation. A new grammar, GWERSC (Grammar with ER Semantic Characteristics), is introduced, which aids syntactic parsing and simplifies semantic understanding with the help of its embedded ER model.

The Optimization of Full Text Retrieval System Based on Indexing of Single Chinese Character

YU Hai-yan,ZHANG Zhong-yi

2001, 15(4): 15-20,28.

This paper discusses the optimization of a full-text retrieval system based on "indexing of single Chinese characters" from three aspects: compression of the inverted index file using Golomb coding, a bidirectional binary-search intersection algorithm, and parallel computing with a double-buffer cache. Experiments show that these optimizations reduce storage cost and improve system performance.
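
As an illustration of the first optimization, the sketch below applies standard Golomb coding to the gaps between sorted document IDs in a posting list. The function names, the parameter `m`, and the sample posting list are assumptions made for this example. The abstract does not say how the paper chooses the Golomb parameter or lays out its index file.

```python
def _to_bits(value, width):
    """Return `value` as a binary string of exactly `width` bits ('' if width is 0)."""
    return format(value, "b").zfill(width) if width > 0 else ""


def golomb_encode(n, m):
    """Golomb-encode a non-negative integer n with parameter m >= 1."""
    q, r = divmod(n, m)
    b = (m - 1).bit_length()      # ceil(log2(m))
    cutoff = (1 << b) - m         # remainders below this use b - 1 bits
    code = "1" * q + "0"          # quotient in unary
    if r < cutoff:
        code += _to_bits(r, b - 1)
    else:
        code += _to_bits(r + cutoff, b)
    return code


def golomb_decode(bits, m):
    """Decode a concatenation of Golomb codewords into a list of integers."""
    b = (m - 1).bit_length()
    cutoff = (1 << b) - m
    values, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":
            q += 1
            i += 1
        i += 1                    # skip the terminating "0"
        r = 0
        if b > 0:
            r = int(bits[i:i + b - 1], 2) if b > 1 else 0
            i += b - 1
            if r >= cutoff:       # long codeword: read one more bit
                r = ((r << 1) | int(bits[i])) - cutoff
                i += 1
        values.append(q * m + r)
    return values


def compress_postings(doc_ids, m):
    """Encode strictly increasing positive document IDs as Golomb-coded gaps."""
    codes, previous = [], 0
    for doc_id in doc_ids:
        codes.append(golomb_encode(doc_id - previous - 1, m))  # gaps are >= 1
        previous = doc_id
    return "".join(codes)


def decompress_postings(bits, m):
    """Inverse of compress_postings."""
    doc_ids, previous = [], 0
    for gap_minus_one in golomb_decode(bits, m):
        previous += gap_minus_one + 1
        doc_ids.append(previous)
    return doc_ids


postings = [3, 7, 8, 15, 40]          # hypothetical posting list for one character
encoded = compress_postings(postings, m=4)
```

Coding gaps rather than absolute IDs keeps the numbers small, which is what makes Golomb codes short for frequently occurring characters.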

Extract Subject from Chinese Text with Three Different Levels

HAN Ke-song,WANG Yong-cheng,SHEN Zhou,WU Fang-fang

2001, 15(4): 21-28.

To meet the requirements of the Internet and large-scale text processing, this paper describes how to automatically extract the subject from Chinese texts. The subject is extracted at three different levels: subject word, subject concept, and subject sentence. We focus on how to build the weighting system and obtain the empirical coefficient values. Based on experimental results on news articles, we briefly analyze the performance.

User Modeling and Text Planning in a Question-answering Generation System

WU Hua,HUANG Tai-yi

2001, 15(4): 29-35.

In a question-answering system, if the system has a view of the domain knowledge that the user has mastered, it can generate answers that are both informative and understandable to the user, which improves human-computer interaction. Based on a flower knowledge retrieval system, this paper discusses the effect of the user model on the generated content and the relationship between the user model and the text planner. Experiments show that the user model affects not only the generated content but also its style. The generation system uses two generation strategies: schema and process. The combination of these two methods is also discussed.

The Design and Implementation of Campus Navigation System: EasyNav

HUANG Yin-fei,ZHENG Fang,YAN Peng-ju,XU Ming-xing,WU Wen-hu

2001, 15(4): 36-41.

This paper presents the design and implementation of EasyNav, a Chinese spoken language dialogue system for Tsinghua University campus navigation. By analyzing the features and requirements of spoken language dialogue systems, we design a rule-based language understanding procedure suited to them. The syntactic parser applies the GLR algorithm to a context-free grammar (CFG) in order to extract syntactic structure features for use by the semantic parser. The syntactic grammar is designed as a trade-off between coverage and accuracy. The semantic parser matches sentence templates under syntactic constraints to find the speaker's intention, and it resolves the ambiguity introduced by the syntactic parser. The advantage of this design is that the system is easy to construct and extend.

A Stroke Segment Extraction Algorithm for Tibetan Character Recognition

WANG Hao-jun,ZHAO Nan-yuan,DENG Gang-yi

2001, 15(4): 42-47,53.

This paper presents a stroke segment extraction algorithm for Tibetan characters. Based on the geometric features and topological structure of Tibetan characters, the method uses contour information to extract their stroke segments. First, contour points are extracted by chain-code following; then feature points are detected and used to separate strokes; finally, contour lines rather than skeleton lines are used to represent the strokes. Experimental results show that the proposed algorithm extracts the strokes of printed Tibetan characters in a way that is consistent with human perception. In addition, compared with methods based on thinning algorithms, the proposed algorithm is more robust and faster.

A HMM Based On-line Chinese Character Recognition System and Improved Training Algorithm

LIU Jia-feng,HUANG Jian-hua,TANG Xiang-long

2001, 15(4): 48-53.

This paper describes the design and implementation of an on-line Chinese character recognition system based on Hidden Markov Models (HMMs). The strokes of an on-line Chinese character are regarded as the input observation sequence, and a multi-cross left-right model structure is employed to eliminate the influence of redundant or missing strokes. Training the HMMs is also an important problem for this system; to keep the training process from falling into a local minimum, an improved training approach is proposed. After sufficient training, the system achieves satisfactory results for both regular handwriting and free-style handwriting.
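
To illustrate how a left-right model can tolerate redundant or missing strokes, the sketch below scores a stroke sequence with the standard forward algorithm. It assumes one plausible reading of the topology: each state may repeat (absorbing a redundant stroke), advance to the next state, or skip one state (a missing stroke). The abstract does not define the paper's "multi-cross" structure or its training method, so the stroke labels, probabilities, and function names here are hypothetical.

```python
def left_right_transitions(n_states, p_stay=0.1, p_next=0.8, p_skip=0.1):
    """Build a left-right transition matrix with self-loop, next, and skip arcs."""
    trans = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        arcs = {i: p_stay, i + 1: p_next, i + 2: p_skip}
        allowed = {j: p for j, p in arcs.items() if j < n_states}
        total = sum(allowed.values())
        for j, p in allowed.items():
            trans[i][j] = p / total   # renormalize near the last state
    return trans


def forward_probability(obs, start, trans, emit):
    """Return P(obs | model) with the forward algorithm.

    Simplification: the sequence may end in any state.
    """
    n = len(start)
    alpha = [start[s] * emit[s].get(obs[0], 0.0) for s in range(n)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j].get(symbol, 0.0)
            for j in range(n)
        ]
    return sum(alpha)


def stroke_emission(expected, alphabet=("h", "v", "d"), p_expected=0.9):
    """Emission distribution that favors one expected stroke type."""
    p_other = (1.0 - p_expected) / (len(alphabet) - 1)
    return {s: (p_expected if s == expected else p_other) for s in alphabet}


# Hypothetical three-stroke character: horizontal, vertical, horizontal.
trans = left_right_transitions(3)
start = [0.9, 0.1, 0.0]               # small chance that the first stroke is missing
emit = [stroke_emission("h"), stroke_emission("v"), stroke_emission("h")]

p_full = forward_probability(["h", "v", "h"], start, trans, emit)
p_missing = forward_probability(["h", "h"], start, trans, emit)      # middle stroke dropped
p_wrong = forward_probability(["d", "d", "d"], start, trans, emit)
```

With these assumed parameters the complete stroke sequence scores highest, while a sequence with one stroke missing still receives a much higher probability than an unrelated one.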

The Architecture Design for YanHuang Chinese Platform

WU Jian,SUN Yu-fang,LI Guo-hua,LI Xiang-kai

2001, 15(4): 54-59.

As computer applications deepen and the Internet becomes more popular, the 6763 Chinese characters defined in GB 2312-80 can no longer meet users' needs. The ISO 10646 standard provides a large code space for developing Chinese platforms that support a large Chinese character set. We have studied techniques for implementing Chinese platforms. Our Chinese platform supports the large CJK character set of the ISO 10646 standard and multiple internal codes. It is compatible with existing Chinese platforms, is independent of the English version, and follows international and Chinese national standards. This paper describes the design goals and module structure of this Chinese platform.

Study on Word-wholly-forming Theory of Mongolian Language on Computer

S·Soyoitu

2001, 15(4): 60-66.

The author simulates the various construction mechanisms of traditional Mongolian words and proposes several mathematical models for whole-word construction on computer. Based on these models, the author investigates accuracy, time complexity, and space complexity, the three key elements of an optimal word construction theory for the traditional Mongolian written language on computer. The paper also studies the computational structure, the parallel knowledge processing method, and the unified computation of whole-word complex characteristics, which should be carefully considered in the optimal word construction process. Finally, the author proves that the mathematical model "B - J - T = W" is the optimal construction pattern for Mongolian words on computer.

2001 Volume 15 Issue 4 Published: 15 August 2001