Journal of Chinese Information Processing

Survey and Prospect

SUN Maosong, LIU Ting, JI Donghong, SUI Zhifang, ZHAO Jun, ZHANG Bo, WUSHOUER Silamu,
YU Shiwen, ZHU Jun, LI Jianmin, LIU Yang, WANG Houfeng, TURGUN Ibrahim, LIU Qun, LIU Zhiyuan

2014, 28(1): 1-8.

This paper surveys the research frontiers of language computing in the context of Web-scale text information processing, covering fundamental computational models, language analysis algorithms, linguistic resource construction, machine translation, content understanding, and question answering. Several related key issues are discussed, along with their significance to Chinese information processing in the near future.

Survey and Prospect

Research and Progress in the Language Network

HAN Pu, WANG Dongbo, LU Gaofei, SU Xinning

2014, 28(1): 9-18.

As a new research field, the study of language networks is developing rapidly and has attracted great attention from researchers in different areas. First, a brief introduction illustrates the characteristics of language networks, their statistical properties and the related network models. Second, based on the composition of language and current hot topics in network research, language networks are divided into phonetic networks, co-occurrence networks, syntactic dependency networks, and semantic and concept networks. The main research content of language networks is then described in detail. Finally, the advantages and drawbacks of language network studies are summarized.

Survey and Prospect

Supervised WSD Model Optimization Based on Language Model

YANG Zhizhuo, HUANG Heyan

2014, 28(1): 19-25.

Word Sense Disambiguation (WSD) is one of the key issues in natural language processing. Currently, supervised WSD methods are an effective way to solve the problem. However, because large-scale training data is lacking, supervised methods cannot achieve satisfactory results. This paper presents a WSD optimization model based on a statistical language model, which exploits the language model to optimize a traditional supervised WSD model. The new model derives the meaning of ambiguous words by taking advantage of the knowledge contained in both the training data and the language model, and it can significantly improve WSD performance when the training data is insufficient. Experimental results show that the optimized model outperforms the best participating system in the SemEval-2007 task #5 evaluation.

Survey and Prospect

Multiple Semantic Features Based Coreference Resolution in Emergency News

PANG Ning, YANG Erhong

2014, 28(1): 26-32.

The key to improving emergency response ability lies in effectively collecting and extracting useful information from the relevant news reports. Coreference resolution is an important subtask for this purpose. In this paper, we present an approach to coreference resolution based on the Maximum Entropy Model for Chinese news reports about sudden events. We exploit semantic class features and semantic role features, as well as semantic relatedness features, redirection features and context features extracted from Wikipedia. The experimental results confirm that adding the selected semantic knowledge benefits coreference resolution, with the exception of the pure semantic role features, which lower the system's F value by 1.31%.

Information Retrieval and Social Computing

Weighted-Tau Rank: a Ranking-Oriented Algorithm for Collaborative Filtering

SUN Jiankai, WANG Shuaiqiang, MA Jun

2014, 28(1): 33-40.

Most ranking-oriented collaborative filtering (CF) algorithms have two limitations. First, when computing user similarities, they consider only the accordance of user preferences and ignore the degrees and popularities of those preferences. Second, an intermediate step is needed to formulate a value function for preference prediction and aggregation before greedy algorithms derive the recommendation lists. To address these problems, we propose a Degree-Popularity weighting scheme that integrates TF-IDF to weight the degrees and popularities of users' pairwise relative preferences, and we compute similarities between users based on the weighted Kendall Tau rank correlation coefficient. Preference aggregation and prediction are formulated directly, and the recommendation lists are then derived by applying the Schulze method. Extensive experiments on two movie datasets under NDCG evaluation show advantageous results in comparison with state-of-the-art CF algorithms.
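
For readers unfamiliar with the similarity measure named in this abstract, the following is a minimal sketch of a Kendall Tau rank correlation between two users, computed over the item pairs both users have rated. It is not the authors' implementation: the function name, the rating dictionaries and the optional `weight` callback are assumptions made for illustration, and the abstract does not specify the Degree-Popularity weights, so the default weight is simply 1.

```python
from itertools import combinations


def weighted_kendall_tau(ratings_u, ratings_v, weight=None):
    """Kendall Tau over co-rated item pairs, optionally weighted per pair.

    ratings_u, ratings_v: dicts mapping item -> rating (hypothetical input format).
    weight: optional function (item_i, item_j) -> float; a stand-in for a
            pair-weighting scheme, which the abstract does not spell out.
    """
    common = sorted(set(ratings_u) & set(ratings_v))
    agreement = 0.0
    total = 0.0
    for i, j in combinations(common, 2):
        du = ratings_u[i] - ratings_u[j]
        dv = ratings_v[i] - ratings_v[j]
        if du == 0 or dv == 0:
            continue  # a tie expresses no relative preference, so skip the pair
        w = 1.0 if weight is None else weight(i, j)
        # Concordant pairs add their weight, discordant pairs subtract it.
        agreement += w if du * dv > 0 else -w
        total += w
    return agreement / total if total else 0.0


if __name__ == "__main__":
    alice = {"a": 5, "b": 3, "c": 1}
    bob = {"a": 5, "b": 1, "c": 3}
    print(weighted_kendall_tau(alice, bob))
```

With unit weights this reduces to the ordinary Kendall Tau on untied pairs; passing a `weight` function lets some pairwise preferences count more than others.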

Information Retrieval and Social Computing

An Algorithm for Subtopic Detecting Based on Absorbing Markov Chain

WEI Mingchuan, ZHU Junjie, ZHANG Jin, ZHANG Kai, CHENG Xueqi, REN Yan

2014, 28(1): 41-46.

Due to characteristics such as content diversity and dynamic evolution, it is difficult to obtain high-quality subtopics for web texts and topics with traditional topic detection and tracking models. An algorithm for subtopic partition based on the absorbing Markov chain is proposed to address this issue. The algorithm first gathers the topic keywords clustered from the web pages to generate subtopics, and then derives subtopics based on the absorbing Markov chain. The experimental results show that the algorithm performs well in terms of both significance and diversity.
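
As background on the model named in this abstract, the sketch below computes a standard textbook quantity of an absorbing Markov chain: the expected number of steps each transient state takes before absorption, t = 1 + Q t, where Q holds the transition probabilities among transient states. It is an illustration only and does not reproduce the paper's subtopic algorithm; the function name, the fixed-point iteration and the example matrix are assumptions.

```python
def expected_steps_to_absorption(Q, iterations=500):
    """Solve t = 1 + Q t by fixed-point iteration.

    Q: square list of lists with transition probabilities between transient
       states only; each row sums to less than 1 when that state can move
       directly to an absorbing state.
    Returns the expected number of steps before absorption for each
    transient state.
    """
    n = len(Q)
    t = [0.0] * n
    for _ in range(iterations):
        t = [1.0 + sum(Q[i][j] * t[j] for j in range(n)) for i in range(n)]
    return t


if __name__ == "__main__":
    # Two transient states; from each, half of the probability mass is absorbed.
    Q = [[0.0, 0.5],
         [0.5, 0.0]]
    print(expected_steps_to_absorption(Q))
```

The iteration converges whenever every transient state can eventually reach an absorbing state; a direct linear solve of (I - Q) t = 1 gives the same result.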

Information Retrieval and Social Computing

Comparing Topics from Microblog and News Media about Specific Events

ZHOU Zhenyu, LI Fang

2014, 28(1): 47-55.

This work conducts a contrastive study of the topics of specific events in microblogs and news media. First, we use LDA to extract topics from the two media, and then define three indexes (attention factor, diversity factor and evolution factor) for an improved topic discrepancy calculation. Finally, we choose four events of different types for experiments and analysis. The results show: 1) More comment topics appear on microblogs, with close attention factors, in contrast to a high proportion of factual topics with varied attention factors in the news media. 2) In both microblogs and news media, the diversity factor of the words used in comment topics is larger than that in factual topics. 3) In microblogs, comment topics last longer with consistent content, while in the news media factual topics do so.

Information Retrieval and Social Computing

Community Mining in Social Network

FAN Chao, WANG Houfeng

2014, 28(1): 56-63.

The social network is a new medium for exchanging information online. On Renren.com, for example, a myriad of young people, especially students, talk about interesting topics. People are connected for many reasons, such as studying in the same college, working in the same company, or having interests in common. The network nodes in Renren.com are therefore likely to be joined together in groups according to the users' department or school. In this article, real-world network data is first collected from Renren.com, and then the CNM algorithm is used to validate the assumptions mentioned above. Based on the structure of the social network, an improved method for discovering community structure is proposed, which outperforms CNM in terms of accuracy. The community structure detected in the social network shows the different characteristics of each department or school in a college.

Information Retrieval and Social Computing

Query Recommendation Based on User Intent Recognition

LUO Cheng, LIU Yiqun, ZHANG Min, MA Shaoping, RU Liyun, ZHANG Kuo

2014, 28(1): 64-72.

The effectiveness of information retrieval from the web largely depends on whether users can properly describe their information needs in the queries issued to search engines. However, many search queries are short, ambiguous or even noisy. Query recommendation technology helps users refine their queries and describe their information needs clearly. To obtain high-quality query recommendations, query candidates are first generated with a random walk strategy on the Query-URL bipartite graph. Snippet click behavior information is then used to re-rank the candidate lists in favor of the queries that represent user intents. Finally, learning-based algorithms are used to reduce possible noise in the recommendations. Experiments on practical search user behavior data show the effectiveness of the proposed method.
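
The candidate-generation step mentioned in this abstract can be pictured with a minimal sketch of a single two-step random walk (query to URL to query) on a click graph. This is an illustration under stated assumptions, not the authors' method: the function name, the click-count dictionary format and the example data are hypothetical, and the abstract does not say how many steps the walk takes or how transitions are weighted.

```python
from collections import defaultdict


def two_step_walk(clicks, start):
    """Rank queries reachable from `start` via query -> URL -> query.

    clicks: dict mapping query -> {url: click_count} (hypothetical format).
    Transition probabilities are proportional to click counts (an assumption).
    Returns a list of (query, probability) pairs, best first, excluding `start`.
    """
    # Invert the graph so each URL knows which queries led to it.
    url_to_queries = defaultdict(dict)
    for query, urls in clicks.items():
        for url, count in urls.items():
            url_to_queries[url][query] = count

    scores = defaultdict(float)
    total_from_start = sum(clicks[start].values())
    for url, count in clicks[start].items():
        p_url = count / total_from_start  # step 1: start query -> URL
        total_into_url = sum(url_to_queries[url].values())
        for other, other_count in url_to_queries[url].items():
            # step 2: URL -> query
            scores[other] += p_url * other_count / total_into_url

    scores.pop(start, None)  # the start query is not a recommendation
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    clicks = {
        "jaguar": {"u1": 2, "u2": 2},
        "jaguar car": {"u1": 2},
        "jaguar animal": {"u2": 6},
    }
    print(two_step_walk(clicks, "jaguar"))
```

Queries that share heavily clicked URLs with the start query receive the highest scores; a re-ranking or filtering stage, as described in the abstract, would then operate on this candidate list.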

Information Retrieval and Social Computing

Analysis into the Relationship Between Search Engine User Behavior and User Satisfaction Evaluation

LIU Jian, LIU Yiqun, MA Shaoping, ZHANG Min, RU Liyun, ZHANG Kuo

2014, 28(1): 73-79.

As an important category of traditional work in search engine evaluation, user satisfaction evaluation differs in many ways from traditional relevance measurement evaluation. User satisfaction evaluation is more user-centered and provides a global and systematic assessment of search engine performance. This paper describes the relationship between search engine user behavior and user satisfaction evaluation. We design an experiment that avoids affecting the users' search experience, through which we collect query-level explicit judgments of user satisfaction together with user behavior logs, and then analyze the collected data to draw valuable conclusions. The findings provide insights into improving search engine performance and the user search experience.

Information Retrieval and Social Computing

An Effective On-line Data Process and Service Framework Based on Memory

LIN Xianghui, ZHANG Jin, HUANG Kangping, XU Lei, XU Hongbo, CHENG Xueqi, CHENG Gong

2014, 28(1): 80-86.

Under the environment of big data, traditional database-centered data processing architecture cannot meet the requirement of high concurrency of read/write requests. At the same time, serial usages of data limit the effectiveness of data processing. This paper describes an effective on-line data process and service framework based on memory. This framework uses multi-index data access method and pub/sub data control mechanism to improve the effectiveness of data processing while reducing the interaction with the database. Experimental results show that the memory based on-line data process and service framework can significantly improve the response speed of database and shorten the latency of data processing.

Machine Translation

An Ensemble Learning Method for Maximum Entropy Based Phrase Reordering Model

HE Zhonghao, SU Jinsong, SHI Xiaodong, CHEN Yidong, HUANG Yanzhou

2014, 28(1): 87-93.

The Maximum Entropy Based BTG model has become a hot topic in statistical machine translation in recent years because of its strong translation ability and ease of training. However, the distribution of reordering examples in this model is imbalanced. To solve this problem, we introduce an ensemble learning method for training the phrase reordering model. Experimental results show that the reordering model is trained more effectively with our method and that the performance of the translation system improves significantly on a large-scale dataset.

Machine Translation

A Contrastive Study on the Interpretations to Common Chinese Verbs: Taking the Dictionary of Contemporary Chinese and the Dictionary of Mandarin (Revised Edition) as Examples

LIU Jun, XU Dekuan, MA Mengjia, CHEN Shumei

2014, 28(1): 94-99.

Verbs occupy a very significant position in language, and Chinese is no exception. Comprehending the interpretations of verbs is the major approach to studying them. This paper focuses on the verbs in the HSK vocabulary and compares their interpretations in the Dictionary of Contemporary Chinese and the Dictionary of Mandarin (Revised Edition). It presents the differences between these interpretations, aiming to reduce semantic misunderstanding and promote cross-Strait communication.

Minority Language Information Processing

Acoustic Feature Analysis of the Uyghur Phonemes

WANG Hui, NURMEMET Yolwas, WUSHOUER Silamu

2014, 28(1): 100-106.

Using manually labeled continuous speech sentences, this paper analyzes the formant frequency, duration and intensity of each Uyghur phoneme, classified by speech rate. To study Uyghur plosives and affricates, the paper analyzes their acoustic features within the consonant-vowel structure. Feature fusion and changes in the number of model states are applied to validate the influence of different acoustic features on Uyghur phoneme recognition. The paper also describes an improved acoustic model with a higher recognition rate. In addition, the analysis of confusable phonemes provides a reference for the further improvement of Uyghur acoustic models.

Minority Language Information Processing

A Hybrid Approach to Tibetan Person Name Identification by Maximum Entropy Model and Conditional Random Fields

JIA Yangji, LI Yachao, ZONG Chengqing, YU Hongzhi

2014, 28(1): 107-112.

Tibetan person name recognition is one of the most difficult tasks in Tibetan information processing and has a direct impact on the precision of Tibetan word segmentation. Based on an analysis of the wording rules and features of Tibetan names, this paper proposes a method that combines maximum entropy and conditional random fields to identify Tibetan person names. The experiment shows that this approach works well, reaching an F1-measure of 93.08%.

Minority Language Information Processing

Research on Automatic Identification of Tibetan Function Word

GAO Dingguo, Tashigyal, ZHAO Dongcai

2014, 28(1): 113-117.

Research on Tibetan function words is essential to research on words, sentences and semantics in Tibetan information processing. The automatic identification of Tibetan function words paves the way for further research on them. This paper discusses the role and use of Tibetan function words, reveals the difficulties in identifying them automatically, and finally proposes a method for their automatic identification. In an experiment on 2525 sentences, the method achieves an accuracy of 97.0768%.

Minority Language Information Processing

Research on Uyghur Webpage and Implementation of the Uyghur Browser on Android Platform

DENG Jun, WUSHOUER Silamu, ANIWANR Tohti, YUAN Tinglei, ZHAO Zhicheng

2014, 28(1): 118-124.

Customizing browser functionality through secondary modification of the WebKit core is a popular solution in the current development of embedded applications. Focusing on the WebKit engine on Android, this paper makes a comprehensive analysis of the problems several browsers show when accessing Uyghur webpages and reveals the causes of the abnormal display of Uyghur webpages. According to the characteristics of Uyghur, this paper further designs the architecture of a Uyghur browser and implements it on the Android platform through a technique for rendering Uyghur in the application layer.

2014 Volume 28 Issue 1 Published: 07 January 2014