Journal of Chinese Information Processing

Data-Oriented Syntactic Parsing

Zhu Jungbo; Zhang Yuejie; Yao Tianshun

1998, 12(1): 2-9.

This paper presents a data-oriented syntactic parsing (DOP) technique. The data-oriented parsing (DOP) method was proposed by Scha in 1990. Data-oriented models of language processing embody the assumption that human language perception and production work with representations of concrete past language experiences rather than with abstract grammar rules. The data-oriented syntactic parsing framework can be described by three components: (1) an annotated corpus consisting of representations of past language experiences; (2) fragment units extracted from the annotated corpus, which are combined to construct parses of a new utterance; (3) a definition of how the probability of an analysis of a new utterance is computed. The data-oriented syntactic parsing model maintains a large corpus of linguistic representations of previously occurring utterances and uses this annotated corpus as its grammar. When a new input utterance is processed, analyses of the utterance are constructed by combining fragment units from the corpus; the occurrence frequencies of the fragment units are used to estimate which analysis is the most probable one. This paper discusses the annotated corpus, fragment units, combination parsing, and the probability model in detail.
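
As a rough illustration of component (3), the relative-frequency probability model commonly used in data-oriented parsing can be written as follows. This is a sketch in the style of the DOP1 formulation; the abstract does not state the paper's exact formulas, so the notation (|t| for the corpus count of fragment t, root(t) for its root label) is an assumption.

```latex
% Relative-frequency estimate for a fragment t,
% normalised over all fragments with the same root label
P(t) = \frac{|t|}{\sum_{t' : \mathrm{root}(t') = \mathrm{root}(t)} |t'|}

% A derivation d combines fragments t_1, ..., t_k
P(d) = \prod_{i=1}^{k} P(t_i)

% An analysis T may be produced by several derivations
P(T) = \sum_{d \text{ derives } T} P(d)
```

Under this reading, the most probable analysis is the T that maximises P(T) over all analyses of the input utterance.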

The Theoretical Research and Probe into Restricted Language

Zong Chengquing; Song Jin; Chen Zhaoxiong; Huang Heyan

1998, 12(1): 10-17.

After summarizing some important research results on restricted language, this paper proposes a formalization model for restricted language and discusses its linguistic and mathematical characteristics. The paper also gives a method for determining the vocabulary and sentence patterns of a restricted-language subset. The authors hope that the model and method presented here will stimulate discussion and advance research on restricted language.

A Chinese Word Automatic Segmentation System Based on String Frequency Statistics Combined with Word Matching

Liu Ting; Wu Yan; Wang Kaizhu

1998, 12(1): 18-26.

This paper presents a software system for automatic Chinese word segmentation. The original text is scanned three times: first, the text is cut into short Chinese character strings at cut marks; second, every short string is weighted by its frequency in context, and the heavily weighted short strings are taken as candidate words; third, the short strings are segmented using the candidate word set together with a set of everyday words. Experimental results show that the segmentation precision of this word segmentation system is about 1.5%, and that a large proportion of new words can be recognized correctly. The system is well suited to document retrieval and other areas.
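
The three scans can be sketched roughly as below. This is a minimal illustration, not the authors' system: the cut-mark set, the substring length limits, the frequency threshold, and the use of forward maximum matching in the third scan are all assumptions, and every function name is hypothetical.

```python
from collections import Counter

# Hypothetical cut-mark set: punctuation and whitespace that end a short string.
CUT_MARKS = "，。！？；：、 \n"


def split_on_cut_marks(text, cut_marks=CUT_MARKS):
    """Scan 1: cut the text into short character strings at cut marks."""
    pieces, current = [], []
    for ch in text:
        if ch in cut_marks:
            if current:
                pieces.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        pieces.append("".join(current))
    return pieces


def candidate_words(short_strings, min_len=2, max_len=4, min_freq=2):
    """Scan 2: count every substring; frequent ones become candidate words.

    Raw frequency stands in for the paper's weighting, which is not specified.
    """
    counts = Counter()
    for s in short_strings:
        for n in range(min_len, max_len + 1):
            for i in range(len(s) - n + 1):
                counts[s[i:i + n]] += 1
    return {w for w, c in counts.items() if c >= min_freq}


def segment(short_string, lexicon, max_len=4):
    """Scan 3: forward maximum matching (an assumed matching strategy)."""
    words, i = [], 0
    while i < len(short_string):
        for n in range(min(max_len, len(short_string) - i), 0, -1):
            piece = short_string[i:i + n]
            # A single character is always accepted as a fallback.
            if n == 1 or piece in lexicon:
                words.append(piece)
                i += n
                break
    return words


def segment_text(text, everyday_words):
    """Run the three scans and return one word list per short string."""
    shorts = split_on_cut_marks(text)
    lexicon = set(everyday_words) | candidate_words(shorts)
    return [segment(s, lexicon) for s in shorts]
```

In this sketch a string such as 北京 that occurs twice in the input becomes a candidate word even if it is missing from the everyday word list, which is the mechanism by which new words could be picked up.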

A Study of the Value of Parameter N in the n-gram Statistical Model in Chinese Language

Zhang Shuwu; Huang Taiyi

1998, 12(1): 36-42.

As a major statistical model, the n-gram has been applied extensively in language processing (such as POS tagging, language modeling for speech recognition, character recognition, etc.). However, there has so far been no definitive conclusion about which value of N is optimal for Chinese language processing. This paper introduces a way of estimating the choice of the parameter N in the n-gram model for Chinese. Three factors are analyzed to compare different values of N: the approximate expression of Chinese grammatical structure, the reconstruction of new words, and the performance in transcribing Chinese Pinyin sequences to text. Finally, the paper concludes that 4 is a better choice of N for a word-based n-gram model of Chinese. This result should be helpful for the development of Chinese statistical language models and language processing.
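
To make the role of N concrete, here is a minimal sketch of a word-based n-gram model with maximum-likelihood estimates, where N is a free parameter. It is an illustration only; the padding symbols, the function names, and the absence of smoothing are assumptions and do not come from the paper.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Count n-grams and their (n-1)-word histories over tokenised sentences."""
    grams, histories = Counter(), Counter()
    for words in sentences:
        # Pad so that the first real word also has a full history.
        padded = ["<s>"] * (n - 1) + list(words) + ["</s>"]
        for i in range(n - 1, len(padded)):
            history = tuple(padded[i - n + 1:i])
            grams[history + (padded[i],)] += 1
            histories[history] += 1
    return grams, histories


def probability(word, history, grams, histories):
    """Maximum-likelihood estimate of P(word | history); 0.0 if history unseen."""
    history = tuple(history)
    if histories[history] == 0:
        return 0.0
    return grams[history + (word,)] / histories[history]
```

Raising N lengthens the history each word is conditioned on, which captures more structure but makes the counts sparser; that trade-off is what a comparison across values of N has to weigh.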

A New Method for Frequency Statistics of Chinese Characters

You Rongyan

1998, 12(1): 43-50.

In this paper, a lower limit on the sample size for frequency statistics of Chinese characters is derived by an error-estimation method, given the error bounds and the confidence probability. A new statistical method for Chinese character frequencies is then presented, in which the defined statistical frequency of a Chinese character is unbiased and has a smaller variance than earlier estimates, and is therefore a more precise estimate of the usage frequency of the character.
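
For orientation, the textbook normal-approximation bound on sample size for estimating a proportion can be sketched as follows. This is the standard formula, not necessarily the error-estimation method of the paper; the function name and parameters are hypothetical.

```python
import math
from statistics import NormalDist


def min_sample_size(error_bound, confidence, p=0.5):
    """Smallest n with P(|p_hat - p| <= error_bound) >= confidence.

    Uses the normal approximation n >= z^2 * p * (1 - p) / error_bound^2.
    p = 0.5 gives the worst case when the true frequency is unknown.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil(z * z * p * (1.0 - p) / (error_bound ** 2))
```

For example, min_sample_size(0.01, 0.95) asks how many characters must be sampled so that an estimated frequency lies within 0.01 of the true value with 95% confidence in the worst case.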

THE LZSSCH DATA COMPRESSION ALGORITHM FOR CHINESE TEXT FILES

Hua Qiang

1998, 12(1): 51-57.

Based on the characteristics of Chinese, LZSS has been modified in its modeling and coding, in adaptive extension of the index bits, and in the maximum number of index bits, yielding LZSSCH. The average compression ratio of LZSSCH on Chinese text is about 8% higher than that of LZSS. The compression and decompression speeds and the sizes of the executable programs are similar. In addition, no preprocessing is required, and the method can also be used to compress text files in other non-alphabetic writing systems.
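
For readers unfamiliar with the baseline, a bare-bones LZSS coder is sketched below. It shows plain LZSS only and none of the LZSSCH modifications, whose details the abstract does not give. Working on characters rather than bytes, the window size, the match-length limits, and the token representation (no bit packing) are all assumptions of this sketch.

```python
def lzss_compress(text, window=4096, min_match=2, max_match=18):
    """Encode text as a list of literals and (offset, length) back-references."""
    tokens, i = [], 0
    while i < len(text):
        best_len, best_off = 0, 0
        # Search the sliding window for the longest match starting at i.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_match and i + length < len(text)
                   and text[j + length] == text[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def lzss_decompress(tokens):
    """Rebuild the text; copies one character at a time so overlaps work."""
    out = []
    for tok in tokens:
        if isinstance(tok, tuple):
            offset, length = tok
            for _ in range(length):
                out.append(out[-offset])
        else:
            out.append(tok)
    return "".join(out)
```

A real coder would pack the flag, offset, and length fields into bits; the number of bits given to the offset (index) field is the kind of parameter the abstract says LZSSCH adapts.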

A Local Elastic Matching Method for On-line Chinese Signature Verification

Ke Jing; Qiao Yi Zheng

1998, 12(1): 58-64.

A local elastic matching method for on-line Chinese signature verification is presented in this paper. The signature is segmented into strings, and the strings together with their features are regarded as primitives. After primitive extraction, dynamic programming is applied to some relatively simple static features to find the optimal correspondence between the primitives of the test signature and those of the reference signature. Based on this optimal correspondence, the local elastic matching method is used to compare the primitives of the test signature with those of the reference signature, taking both static and dynamic features into account. For a test data set containing 680 signature samples, an average correct recognition rate of 92.6% is obtained. The average recognition time is 0.9 seconds on a 486/25 microcomputer. The experimental results demonstrate the feasibility of the proposed method.
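
The dynamic-programming step that pairs primitives of the test signature with primitives of the reference can be sketched as below. This is a generic time-warping style alignment offered as an assumption; the paper's actual cost function, features, and path constraints are not given in the abstract, and all names are hypothetical.

```python
def align_primitives(test, ref, cost):
    """Return (total_cost, pairs) for the cheapest monotonic alignment.

    test, ref: sequences of primitive feature values.
    cost: function giving the mismatch between one test and one ref primitive.
    pairs: list of (test_index, ref_index) correspondences.
    """
    inf = float("inf")
    n, m = len(test), len(ref)
    # table[i][j] = cheapest cost of aligning test[:i] with ref[:j].
    table = [[inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = cost(test[i - 1], ref[j - 1]) + min(
                table[i - 1][j - 1], table[i - 1][j], table[i][j - 1])
    # Backtrack from the end to recover which primitives were paired.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        pairs.append((i - 1, j - 1))
        _, i, j = min((table[i - 1][j - 1], i - 1, j - 1),
                      (table[i - 1][j], i - 1, j),
                      (table[i][j - 1], i, j - 1))
    pairs.reverse()
    return table[n][m], pairs
```

Once the pairs are known, a finer comparison of each paired primitive, such as the local elastic matching described above, can be run pair by pair.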

1998 Volume 12 Issue 1 Published: 16 February 1998