2003 Volume 17 Issue 1 Published: 14 February 2003
  

  • XIE Guo-dong,ZONG Cheng-qing,XU Bo
    2003, 17(1): 1-6.
    Spoken language analysis is a crucial component of human-machine dialogue systems and spoken language translation systems. In this paper, we present a Chinese spoken language analysis method that combines statistical and rule-based approaches; the result of the analysis is an intermediate semantic representation. The method has two stages: first, a statistical method analyzes the semantic information; then, a rule-based method maps that semantic information onto the intermediate semantic representation. This approach avoids the shortcomings of purely rule-based methods and is highly robust, while at the same time achieving a lower error rate.
  • ZHANG Jie,CHEN Qun-xiu
    2003, 17(1): 7-12.
    Improving the quality of translation output is a difficult problem in the research and development of machine translation systems. In this paper we discuss how to add a multi-agent architecture to a machine translation system that uses multiple translation methods in order to improve translation quality, and we present a project-like multi-level architecture with a blackboard-style coordination strategy. This architecture is used in our multi-method Japanese-to-Chinese machine translation system, and it yields much better translation results.
  • CHEN Bo-xing,DU Li-min
    2003, 17(1): 13-19.
    Multi-word units include stable collocations, multi-word phrases, and multi-word terms. In this paper we provide an algorithm for the automatic alignment of single source words with target multi-word units from a sentence-aligned parallel spoken language corpus. Mutual information has been used by many other researchers to extract multi-word units, but the retrieval results depend mainly on identifying suitable bigrams to initiate the iterative process. Our algorithm uses the normalized mutual information difference and the normalized t-score difference between target words corresponding to the same single source word to extract multi-word units, and then uses the average mutual information and average t-score to align single source words with target multi-word units. The algorithm also applies a local-bests strategy, stop-word filtering, a preference for longer units, and related methods. Grading the lexicon effectively reduces the number of incorrect entries in the high-level lexicon, which makes the translation lexicon more practical.
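The association statistics the abstract relies on can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy corpus, the per-sentence co-occurrence counting, and the exact t-score normalization are all assumptions.

```python
import math
from collections import Counter

def association_scores(pairs):
    """Compute pointwise mutual information and a t-score for each
    (source_word, target_word) pair co-occurring in aligned sentence
    pairs. `pairs` is a list of (source_tokens, target_tokens) tuples."""
    n = len(pairs)
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        for s in set(src):
            src_count[s] += 1
            for t in set(tgt):
                joint_count[s, t] += 1
        for t in set(tgt):
            tgt_count[t] += 1
    scores = {}
    for (s, t), c in joint_count.items():
        p_st = c / n
        p_s, p_t = src_count[s] / n, tgt_count[t] / n
        mi = math.log2(p_st / (p_s * p_t))          # pointwise MI
        # t-score: observed vs. expected co-occurrence probability
        tsc = (p_st - p_s * p_t) / math.sqrt(p_st / n)
        scores[s, t] = (mi, tsc)
    return scores
```

Pairs with high values on both measures are candidate translation correspondences; the paper's normalized *differences* between competing target words would then be computed on top of these raw scores.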
  • ZHANG Min,MA Shao-ping,GAO Jian-feng
    2003, 17(1): 20-24,31.
    This paper studies the effects of using link information for Web IR in TREC experiments, including link anchor text, link structure, and the combination of link-based retrieval with traditional content-based retrieval. Several conclusions are drawn. First, anchor text can represent the topic of a Web page precisely, but it is insufficient for describing the page's content. Second, compared with traditional content-based IR techniques, a link-based approach yields more than 96% improvement on the homepage finding task, while it is not helpful on the ad hoc task. Finally, combining link-based and content-based techniques gives a consistent 48% to 124.8% improvement on the homepage finding task and some progress on the ad hoc task.
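The combination of link-based and content-based evidence can be sketched as a simple linear score mixture. This is an illustrative stand-in for whatever fusion the paper actually used; the weight value and the score ranges are assumptions.

```python
def combine_scores(content_score, link_score, alpha=0.7):
    """Linearly combine a content-based retrieval score with a
    link-based score (e.g., an anchor-text match score or a
    normalized in-link count). `alpha` weights the content evidence;
    its value here is purely illustrative."""
    return alpha * content_score + (1 - alpha) * link_score

def rank(docs):
    """`docs` maps doc id -> (content_score, link_score); return doc
    ids sorted by combined score, best first."""
    return sorted(docs, key=lambda d: combine_scores(*docs[d]), reverse=True)
```

For a homepage finding task, the abstract's results suggest shifting `alpha` toward the link evidence; for ad hoc retrieval, toward content.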
  • XU Huan-qing,WANG Yong-cheng,SUN Qiang
    2003, 17(1): 25-31.
    The exponential growth of information available on the WWW makes it increasingly difficult for general-purpose crawlers to crawl and index the entire Internet. Rather than collecting and indexing all accessible Web documents to answer all possible ad hoc queries, a focused crawler analyzes its crawl boundary to find the links most likely to be relevant to the crawl, and avoids irrelevant regions of the Web. In this paper, a new focused crawling approach based on a Genetic Algorithm is proposed. The method selectively seeks out pages relevant to a predefined set of topics, increases the crawling chance of Web pages that follow pages with low content relevance, and broadens the relevance-search scope of the crawler. Meanwhile, hyperlink metadata is used to predict the topic relevance of the pages pointed to, which speeds up crawling. Experimental results indicate that our approach performs better.
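A genetic-algorithm-driven crawl frontier can be sketched roughly as below. The representation, fitness function, selection scheme, and mutation rate are all illustrative assumptions, not the paper's actual design.

```python
import random

def evolve_frontier(links, fitness, population=10, generations=5,
                    mutation=0.1, rng=None):
    """Maintain a population of candidate links, repeatedly apply
    fitness-proportional (roulette-wheel) selection, and occasionally
    mutate by swapping in a random link from the full frontier -- this
    is what lets low-relevance pages still be explored. `fitness` maps
    a link to a topic-relevance estimate in [0, 1]."""
    rng = rng or random.Random(0)
    pop = rng.sample(links, min(population, len(links)))
    for _ in range(generations):
        weights = [fitness(link) + 1e-9 for link in pop]
        pop = rng.choices(pop, weights=weights, k=len(pop))   # selection
        pop = [rng.choice(links) if rng.random() < mutation else link
               for link in pop]                               # mutation
    return max(pop, key=fitness)   # most promising link to crawl next
```

The mutation step mirrors the abstract's point about keeping crawl chances alive for pages reached through low-relevance pages.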
  • WU Xin-da
    2003, 17(1): 32-37.
    This study aims to distinguish three near-synonymous verbs of hanging in Mandarin, namely ‘xuan’, ‘gua’, and ‘diao’. Using the Sinica Corpus, this study analyzes their grammatical distribution and tries to isolate the key semantic component that sets the three verbs apart. It is found that the concept of ‘event focus’ plays a significant role in distinguishing the three verbs. The analysis reveals that all three verbs can be represented by a causative construction; the distinction lies in the fact that the three verbs place their focus on different components of that construction: ‘xuan’ tends to focus on the subevent of the causative construction, ‘gua’ on either part, and ‘diao’ on the superevent. This study shows that the concept of ‘event focus’ has its place in the distinction of near-synonyms.
  • WANG Nan
    2003, 17(1): 38-45.
    In light of the basic types of “cai (才)” sentences, this article investigates the syntactic and syntagmatic functions of the adverb “cai (才)”, and analyzes in detail its four basic grammatical meanings, pointing out that these four meanings can be merged into two: “the quantity of things” and “to limit/to exclude”. On this basis, the article induces the deep grammatical meaning of the adverb “cai (才)”: the tendentious judgement that the speaker forms after comparing objective reality with his subjective standard.
  • YOU Fang,LI Juan-zi,WANG Zuo-ying
    2003, 17(1): 46-53.
    Corpora are important resources for knowledge acquisition in natural language processing. For the purpose of sentence understanding, we are constructing a large-scale Chinese corpus based on semantic dependency relations. This paper introduces the tagging formalism we adopt, the tag set we choose, the tagging tool we developed, and the method we use to guarantee good tagging consistency. The corpus under discussion contains 1 million words. Each sentence in the corpus, which already carries word-sense annotations, is further tagged with its semantic structure using 70 semantic dependency relations. The highlight of this corpus is its ability to describe various relations between Chinese words effectively, which benefits from using HowNet for reference in combination with actual language use. The construction of this corpus can provide more knowledge support for sentence understanding, content-based information retrieval, and so on.
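A semantic dependency annotation of the kind described can be represented with a small data structure like the one below. The relation labels are illustrative placeholders, not the corpus's actual 70-relation tag set.

```python
from dataclasses import dataclass

@dataclass
class Dependency:
    """One semantic dependency edge: `dependent` depends on `head`
    under a semantic relation label (the corpus described here uses
    70 such relations; the labels below are assumed for illustration)."""
    head: str
    dependent: str
    relation: str

# A toy annotation of "我 读 书" ("I read books"); relation names assumed.
sentence = [
    Dependency(head="读", dependent="我", relation="agent"),
    Dependency(head="读", dependent="书", relation="content"),
]
```

Each sentence thus becomes a set of labeled head-dependent edges over already sense-annotated words, which is what makes relation-level queries over the corpus possible.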
  • SU Tao,WANG Jun-jie,SUN Jia-song,WANG Zuo-ying
    2003, 17(1): 54-59.
    In this paper we study the problem of adapting a language model automatically according to the topic of the recognition task, that is, language model topic adaptation. A method is proposed to implement linear interpolation of several topic language models under the maximum likelihood criterion, using the Gradient Projection (GP) algorithm. In experiments, the method shows effective improvements in word accuracy and robustness, especially for recognition tasks with a definite topic. At the same time, a multi-pass strategy is adopted in pinyin-to-character conversion to address the slow speed of the new system. Compared with the baseline system, word accuracy increases noticeably while efficiency remains on par with the baseline.
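The linear interpolation of topic language models can be sketched as follows. For brevity, a standard EM-style mixture-weight update stands in for the paper's gradient projection optimizer; both maximize held-out likelihood over the probability simplex, but the update rule here is an assumption.

```python
def interpolate(weights, topic_probs):
    """P(w|h) = sum_k lambda_k * P_k(w|h): mix per-topic LM
    probabilities with interpolation weights."""
    return sum(l * p for l, p in zip(weights, topic_probs))

def reestimate_weights(weights, held_out, iterations=20):
    """Maximum-likelihood interpolation weights on held-out data.
    `held_out` is a list of per-topic probability tuples, one per
    observed event. Uses the standard EM update for mixture weights
    (a stand-in for the paper's gradient projection method)."""
    k = len(weights)
    for _ in range(iterations):
        expected = [0.0] * k
        for probs in held_out:
            total = interpolate(weights, probs)
            for i in range(k):
                expected[i] += weights[i] * probs[i] / total
        weights = [e / len(held_out) for e in expected]
    return weights
```

When the held-out data clearly matches one topic, the weights concentrate on that topic's model, which is the adaptation effect the abstract describes.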
  • LI Hong-lian,HE Wei,YUAN Bao-zong
    2003, 17(1): 60-64.
    It is becoming more difficult to improve the accuracy of a general-purpose speech recognition engine, but in some cases ideal accuracy can be obtained by using context knowledge. If the speech input is known to be one element of a finite set, accuracy can be improved greatly by exploiting the similarity of Chinese text strings. In this paper we present a precise definition of the similarity of Chinese text strings and investigate its application in speech recognition.
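Mapping a possibly misrecognized hypothesis onto the nearest element of a finite candidate set can be sketched with a simple character-level similarity. The paper defines its own similarity measure for Chinese strings; the Levenshtein-based normalization below is only an assumed stand-in.

```python
def edit_distance(a, b):
    """Character-level Levenshtein distance (works on Chinese strings,
    since Python strings iterate character by character)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[-1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def best_match(hypothesis, candidates):
    """Return the candidate most similar to the recognizer's
    hypothesis, using a length-normalized edit-distance similarity."""
    def similarity(c):
        return 1 - edit_distance(hypothesis, c) / max(len(hypothesis), len(c))
    return max(candidates, key=similarity)
```

When the vocabulary really is a closed set (e.g., a command list or a name directory), snapping the hypothesis to its nearest candidate corrects many recognition errors outright.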