1998 Volume 12 Issue 1 Published: 16 February 1998
  

  • Review
    Zhu Jungbo; Zhang Yuejie; Yao Tianshun
    1998, 12(1): 2-9.
    This paper presents a data-oriented syntactic parsing (DOP) technique. The data-oriented parsing (DOP) method was suggested by Scha in 1990. Data-oriented models of language processing embody the assumption that human language perception and production work with representations of concrete past language experiences rather than with abstract grammar rules. The data-oriented syntactic parsing framework can be described by three components: (1) a marked corpus consisting of representations of past language experiences; (2) fragment units extracted from the marked corpus, which are combined to construct the parse of a new utterance; (3) a definition of the way in which the probability of an analysis of a new utterance is computed. A data-oriented syntactic parsing model maintains a large corpus of linguistic representations of previously occurring utterances and uses the marked corpus as its grammar. When processing a new input utterance, analyses of this utterance are constructed by combining fragment units from the corpus; the occurrence frequencies of the fragment units are used to estimate which analysis is the most probable one. This paper discusses the marked corpus, fragment units, combination parsing, and the probability model in detail.
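The frequency-based estimate described in this abstract can be illustrated with a minimal Python sketch. It assumes the commonly described DOP formulation, in which a fragment's probability is its relative frequency among fragments with the same root label, a derivation's probability is the product of its fragment probabilities, and an analysis sums over its derivations; the paper's exact model may differ. The fragment table and all names are hypothetical toy data.

```python
from collections import Counter
from math import prod

# Hypothetical toy fragment counts, keyed by (root label, fragment).
fragment_counts = Counter({
    ("S", "S -> NP VP"): 4,
    ("S", "S -> NP (VP sleeps)"): 1,
    ("NP", "NP -> she"): 3,
    ("NP", "NP -> he"): 2,
    ("VP", "VP -> sleeps"): 5,
})

def fragment_probability(root, fragment):
    """Relative frequency of a fragment among all fragments with the same root."""
    total = sum(c for (r, _), c in fragment_counts.items() if r == root)
    return fragment_counts[(root, fragment)] / total

def derivation_probability(derivation):
    """Product of the probabilities of the fragments combined in one derivation."""
    return prod(fragment_probability(r, f) for r, f in derivation)

# Two derivations that yield the same analysis of "she sleeps".
d1 = [("S", "S -> NP VP"), ("NP", "NP -> she"), ("VP", "VP -> sleeps")]
d2 = [("S", "S -> NP (VP sleeps)"), ("NP", "NP -> she")]

# Assumed here: the analysis probability sums over its derivations.
parse_probability = derivation_probability(d1) + derivation_probability(d2)
```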
  • Review
    Zong Chengquing; Song Jin; Chen Zhaoxiong; Huang Heyan
    1998, 12(1): 10-17.
    After summarizing some important research results on restricted language, this paper proposes a formalization model for restricted language and discusses its linguistic and mathematical characteristics. The paper gives a method to determine the vocabulary and sentence patterns of a restricted language subset. The authors hope that the model and method presented here will stimulate discussion and advance research on restricted language.
  • Review
    Liu Ting; Wu Yan; Wang Kaizhu
    1998, 12(1): 18-26.
    This paper presents a software system for automatic Chinese word segmentation. The original text is scanned three times: first, the text is cut into sequences of short Chinese character strings at cut marks; second, every short string is weighted by its frequency in context, and the heavily weighted short strings are regarded as candidate words; third, the short strings are segmented using the candidate word set and a set of everyday words. Experimental results show that the segmentation precision of this system is about 1.5% and that a large part of new words can be recognized correctly. The system is well suited to document retrieval and other areas.
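The three scans can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' system: punctuation stands in for the cut marks, a raw substring count stands in for the context weighting, and forward maximum matching is an assumed choice for the final pass, which the abstract does not specify. The function name and the `min_count` and `max_len` parameters are hypothetical.

```python
import re
from collections import Counter

def segment(text, dictionary, min_count=2, max_len=4):
    # Pass 1: cut the text into short strings at punctuation ("cut marks").
    chunks = [c for c in re.split(r"[，。！？、；：,.!?;:\s]+", text) if c]

    # Pass 2: weight every substring of length 2..max_len by its frequency;
    # substrings seen at least min_count times become candidate words.
    counts = Counter(
        chunk[i:i + n]
        for chunk in chunks
        for n in range(2, max_len + 1)
        for i in range(len(chunk) - n + 1)
    )
    candidates = {s for s, c in counts.items() if c >= min_count}

    # Pass 3 (assumed strategy): forward maximum matching against the
    # candidate words plus the everyday-word dictionary.
    vocab = candidates | set(dictionary)
    words = []
    for chunk in chunks:
        i = 0
        while i < len(chunk):
            for n in range(min(max_len, len(chunk) - i), 0, -1):
                piece = chunk[i:i + n]
                if n == 1 or piece in vocab:  # single characters always match
                    words.append(piece)
                    i += n
                    break
    return words
```

In this sketch a string such as "北京" becomes a candidate word simply because it recurs in the text, even when it is absent from the dictionary, which is how frequency weighting can surface new words.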
  • Review
    ZHANG Shuwu; HUANG Taiyi
    1998, 12(1): 36-42.
    As a major statistical model, the n-gram has been applied extensively in language processing (such as POS tagging, language modeling for speech recognition, character recognition, etc.). However, there has so far been no definitive conclusion as to which value of N is optimal for Chinese language processing. This paper introduces an estimation procedure for selecting the parameter N of the n-gram model for Chinese. Three factors are analyzed to compare different values of N: the approximate expression of Chinese grammatical structure, the reconstruction of new words, and performance in transcribing Chinese Pinyin sequences to text. The conclusion is that 4 is a better choice of N for a word-based n-gram model of Chinese. This result will be helpful for the development of Chinese statistical language models and language processing.
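For readers unfamiliar with the role of N, here is a minimal sketch of a word-based n-gram model with plain maximum-likelihood estimates and no smoothing. It only illustrates what the parameter controls (the length of the conditioning history); it is not the evaluation procedure of the paper, and the function names and padding symbols are assumptions.

```python
from collections import Counter

def train_ngram(sentences, n):
    """Count n-grams and their (n-1)-word histories over tokenized sentences."""
    ngrams, histories = Counter(), Counter()
    for words in sentences:
        # Pad so that the first real word also has a full history.
        padded = ["<s>"] * (n - 1) + list(words) + ["</s>"]
        for i in range(n - 1, len(padded)):
            history = tuple(padded[i - n + 1:i])
            ngrams[history + (padded[i],)] += 1
            histories[history] += 1
    return ngrams, histories

def probability(word, history, ngrams, histories):
    """Maximum-likelihood estimate of P(word | history); 0.0 if history unseen."""
    history = tuple(history)
    if histories[history] == 0:
        return 0.0
    return ngrams[history + (word,)] / histories[history]
```

With `n=4`, each word is conditioned on the three preceding words, so the count tables grow quickly with N while each individual count becomes sparser.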
  • Review
    You Rongyan
    1998, 12(1): 43-50.
    In this paper, the lower limit of the sample size for frequency statistics of Chinese characters is derived by an error-estimate method, given the error bounds and the confidence probability. A new statistical method for Chinese character frequencies is also presented, in which the defined statistical frequency of a Chinese character is unbiased and has a smaller variance than before, and is therefore a more precise estimate of the character's usage frequency.
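The abstract does not give the derivation, so the following is only a generic illustration of how such a lower limit can arise, using Chebyshev's inequality and assuming n independent draws. The notation is hypothetical: k is the number of occurrences of a character among n sampled characters, p is its true frequency, ε is the error bound, and 1 − α is the confidence probability.

```latex
\[
\hat{p} = \frac{k}{n}, \qquad
\mathbb{E}[\hat{p}] = p, \qquad
\operatorname{Var}(\hat{p}) = \frac{p(1-p)}{n} \le \frac{1}{4n}
\]
\[
P\bigl(|\hat{p} - p| \ge \varepsilon\bigr)
  \le \frac{\operatorname{Var}(\hat{p})}{\varepsilon^{2}}
  \le \frac{1}{4 n \varepsilon^{2}}
\]
\[
\frac{1}{4 n \varepsilon^{2}} \le \alpha
  \iff n \ge \frac{1}{4 \alpha \varepsilon^{2}}
\]
```

For example, ε = 0.01 and α = 0.05 give n ≥ 50,000 under this bound. The bound is distribution-free and therefore conservative; the paper's own method may yield a different limit.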
  • Review
    Hua Qiang
    1998, 12(1): 51-57.
    According to the characteristics of Chinese, LZSS has been modified in its modeling and coding, in adaptive index-bit extension, and in the maximum number of index bits, yielding LZSSCH. The average compression ratio of LZSSCH on Chinese text is about 8% higher than that of LZSS. The compression and expansion speeds and the sizes of the executable programs are similar. Moreover, no preprocessing is required, and the method can also be used to compress text files in other non-alphabetic writing systems.
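As background, the baseline LZSS scheme that LZSSCH modifies can be sketched as below. This is plain greedy LZSS over bytes with a token list instead of a packed bit stream; it does not include the LZSSCH changes, which the abstract names but does not specify. The window and match-length parameters are illustrative assumptions.

```python
def lzss_encode(data, window=4096, min_match=3, max_match=18):
    """Greedy LZSS: emit (offset, length) for long matches, else a literal byte."""
    tokens, i = [], 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Brute-force search of the sliding window for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens

def lzss_decode(tokens):
    """Rebuild the byte string from literals and (offset, length) references."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            off, length = tok
            for _ in range(length):
                out.append(out[-off])  # byte by byte, so overlapping matches work
        else:
            out.append(tok)
    return bytes(out)
```

Because Chinese characters occupy multiple bytes in common encodings, repeated words appear as repeated byte runs that the match search can reference; for instance, `lzss_decode(lzss_encode("中文压缩中文压缩".encode("utf-8")))` returns the original bytes.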
  • Review
    Ke Jing; Qiao Yi Zheng
    1998, 12(1): 58-64.
    A local elastic matching method for on-line Chinese signature verification is presented in this paper. The signature is segmented into strings. The strings, together with their features, are regarded as primitives. After primitive extraction, the dynamic programming method is applied to some relatively simple static features to find the optimal correspondence between the primitives of the test signature and those of the reference signature. Based on this optimal correspondence, the local elastic matching method is used to compare the primitives of the test signature with those of the reference signature, taking both static and dynamic features into account. For a test data set containing 680 signature samples, an average correct recognition rate of 92.6% is obtained. The average recognition time is 0.9 seconds on a 486/25 microcomputer. The experimental results demonstrate the feasibility of the proposed method.
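The dynamic programming step, finding an optimal correspondence between test and reference primitives, might look like the following sketch. It assumes each primitive is summarized by a single static feature value (for example a length) and that unmatched primitives incur a fixed `gap` penalty; both are hypothetical simplifications, and the subsequent local elastic matching stage is not shown.

```python
def align(test, ref, gap=1.0):
    """Minimum-cost monotonic alignment between two primitive sequences.

    Returns (total cost, list of matched index pairs (test_idx, ref_idx)).
    """
    n, m = len(test), len(ref)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i and j:  # match test[i-1] with ref[j-1]
                d = abs(test[i - 1] - ref[j - 1])
                cost[i][j] = min(cost[i][j], cost[i - 1][j - 1] + d)
            if i:        # leave test[i-1] unmatched
                cost[i][j] = min(cost[i][j], cost[i - 1][j] + gap)
            if j:        # leave ref[j-1] unmatched
                cost[i][j] = min(cost[i][j], cost[i][j - 1] + gap)

    # Backtrack to recover which primitives correspond.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        d = abs(test[i - 1] - ref[j - 1])
        if cost[i][j] == cost[i - 1][j - 1] + d:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return cost[n][m], pairs[::-1]
```

The matched pairs returned here would be the input to a finer comparison of each pair, which is where static and dynamic features could both be taken into account.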