2014 Volume 28 Issue 1 Published: 07 January 2014
  

    Survey and Prospect
  • Survey and Prospect
    SUN Maosong, LIU Ting, JI Donghong, SUI Zhifang, ZHAO Jun, ZHANG Bo, WUSHOUER Silamu,
    YU Shiwen, ZHU Jun, LI Jianmin, LIU Yang, WANG Houfeng, TURGUN Ibrahim, LIU Qun, LIU Zhiyuan
    2014, 28(1): 1-8.
    This paper surveys the research frontiers of language computing in the context of Web-scale text information processing, covering fundamental computational models, language analysis algorithms, linguistic resource construction, machine translation, content understanding, and question answering. Several related key issues are discussed, and their significance to Chinese information processing in the near future is also addressed.
  • Survey and Prospect
    HAN Pu, WANG Dongbo, LU Gaofei, SU Xinning
    2014, 28(1): 9-18.
    As a new research field, the study of language networks is developing rapidly and has attracted great attention from researchers in different areas. First, a brief introduction illustrates the characteristics of language networks, their statistical properties and the related network models. Second, based on the composition of language and on current topics in network research, language networks are divided into phonetic networks, co-occurrence networks, syntactic dependency networks, and semantic and concept networks. The main research content of language network studies is then described in detail. Finally, the drawbacks and advantages of language network studies are summarized.
  • Survey and Prospect
    YANG Zhizhuo, HUANG Heyan
    2014, 28(1): 19-25.
    Word Sense Disambiguation (WSD) is one of the key issues in natural language processing. Supervised WSD is currently an effective way to solve the problem, but without large-scale training data supervised methods cannot achieve satisfactory results. This paper presents a WSD optimization model based on a statistical language model, which uses the language model to optimize a traditional supervised WSD model. The new model derives the meaning of ambiguous words from the knowledge contained in both the training data and the language model, and it can significantly improve WSD performance when the training data is insufficient. Experimental results show that the optimized model outperforms the best participating system in the SemEval-2007 Task 5 evaluation.
  • Survey and Prospect
    PANG Ning, YANG Erhong
    2014, 28(1): 26-32.
    The key to improving emergency response ability lies in effectively collecting and extracting useful information from the relevant news reports, and coreference resolution is an important subtask for this purpose. In this paper, we present an approach to coreference resolution based on the Maximum Entropy Model for Chinese news reports about sudden events. We exploit semantic class features and semantic role features, as well as semantic relatedness features, redirection features and context features extracted from Wikipedia. The experimental results confirm that adding the selected semantic knowledge benefits coreference resolution, with the exception of the pure semantic role features, which lower the system's F-value by 1.31%.
  • Information Retrieval and Social Computing
  • Information Retrieval and Social Computing
    SUN Jiankai, WANG Shuaiqiang, MA Jun
    2014, 28(1): 33-40.
    Most ranking-oriented collaborative filtering (CF) algorithms have two limitations. First, when computing user similarities they consider only the agreement of user preferences and ignore the degree and popularity of those preferences. Second, they need an intermediate step that formulates a value function for preference prediction and aggregation before greedy algorithms can derive the recommendation lists. To address these problems, we propose a Degree-Popularity weighting scheme that integrates TF-IDF to weight the degree and popularity of users' pairwise relative preferences, and we compute similarities between users with a weighted Kendall Tau rank correlation coefficient. Preference aggregation and prediction are formulated directly, and the recommendation lists are then derived by applying the Schulze method. Extensive experiments on two movie datasets under NDCG evaluation show advantages over state-of-the-art CF algorithms.
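As a rough illustration of the Schulze aggregation step named in the abstract above, the following sketch ranks items from a matrix of pairwise preference counts. The matrix layout, the function name `schulze_ranking` and the tie handling are assumptions made for this example; the abstract does not say how the authors build or weight the pairwise preferences.

```python
def schulze_ranking(d):
    """Rank items with the Schulze method.

    d[i][j] is the (possibly weighted) number of users who prefer
    item i to item j. This input format is an assumption.
    """
    n = len(d)
    # p[i][j] holds the strength of the strongest path from i to j.
    p = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and d[i][j] > d[j][i]:
                p[i][j] = d[i][j]
    # Floyd-Warshall style widest-path update.
    for k in range(n):
        for i in range(n):
            if i == k:
                continue
            for j in range(n):
                if j == i or j == k:
                    continue
                p[i][j] = max(p[i][j], min(p[i][k], p[k][j]))
    # An item "wins" against another if its strongest path is stronger.
    wins = [
        sum(1 for j in range(n) if j != i and p[i][j] > p[j][i])
        for i in range(n)
    ]
    # Ties keep their original index order because sorted() is stable.
    return sorted(range(n), key=lambda i: -wins[i])


# Three items where item 0 is preferred to 1 and 2, and 1 to 2.
prefs = [[0, 3, 3],
         [1, 0, 3],
         [1, 1, 0]]
print(schulze_ranking(prefs))  # [0, 1, 2]
```

The returned list contains item indices from most to least preferred, which could serve directly as a recommendation order.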
  • Information Retrieval and Social Computing
    WEI Mingchuan, ZHU Junjie, ZHANG Jin, ZHANG Kai, CHENG Xueqi, REN Yan
    2014, 28(1): 41-46.
    Because web texts and topics are diverse in content and evolve dynamically, it is difficult to obtain high-quality subtopics for them with traditional topic detection and tracking models. An algorithm for subtopic partition based on an absorbing Markov chain is proposed to address this issue. The algorithm first gathers the topic keywords, clustered by web page, to generate candidate subtopics, and then derives the subtopics with the absorbing Markov chain. The experimental results show that the algorithm performs well in terms of both significance and diversity.
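The abstract above does not describe how the chain is built from keywords or pages, so the following is only a generic sketch of the core computation in any absorbing Markov chain: the probability that a walk starting in each transient state ends in each absorbing state. The function name, the `Q`/`R` matrix split and the fixed iteration count are assumptions for illustration.

```python
def absorption_probabilities(Q, R, iterations=1000):
    """Approximate B = (I - Q)^-1 R by fixed-point iteration.

    Q[i][k]: transition probability between transient states i and k.
    R[i][j]: transition probability from transient state i to
             absorbing state j.
    Returns B, where B[i][j] is the probability that a walk started
    in transient state i is eventually absorbed in state j.
    """
    t = len(Q)
    a = len(R[0])
    B = [row[:] for row in R]
    for _ in range(iterations):
        # One step of B <- R + Q B.
        B = [
            [R[i][j] + sum(Q[i][k] * B[k][j] for k in range(t))
             for j in range(a)]
            for i in range(t)
        ]
    return B


# Symmetric random walk on positions 0..4 with absorbing ends 0 and 4.
Q = [[0.0, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
R = [[0.5, 0.0],
     [0.0, 0.0],
     [0.0, 0.5]]
for row in absorption_probabilities(Q, R):
    print([round(x, 3) for x in row])
```

In a subtopic setting, one plausible reading is that already selected items act as absorbing states, so that items close to them are penalized and diverse items are favored; that reading is an assumption rather than something the abstract states.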
  • Information Retrieval and Social Computing
    ZHOU Zhenyu, LI Fang
    2014, 28(1): 47-55.
    This work conducts a contrastive study of the topics of specific events in microblogs and in news media. First, we use LDA to extract topics from the two media, and then define three indexes (attention factor, diversity factor and evolution factor) for an improved topic discrepancy calculation. Finally, we choose four events of different types for experiments and analysis. The results show that: 1) more comment topics appear on microblogs, with close attention factors, in contrast to the high proportion of factual topics with varied attention factors in the news media; 2) in both microblogs and news media, the diversity factor of the words used in comment topics is larger than that in factual topics; 3) in microblogs, comment topics last longer with consistent content, while in the news media factual topics do so.
  • Information Retrieval and Social Computing
    FAN Chao, WANG Houfeng
    2014, 28(1): 56-63.
    Social networks are a new medium for exchanging information online. Taking Renren.com as an example, a myriad of young people, especially students, talk about interesting topics on this platform. People are connected for many reasons, such as studying in the same college, working in the same company, or sharing common interests, and the network nodes in Renren.com are probably joined together in groups according to the users' department or school. In this article, real-world network data is first collected from Renren.com, and the CNM algorithm is then used to validate the assumptions mentioned above. Based on the structure of the social network, an improved method for discovering community structure is proposed, which outperforms CNM in terms of accuracy. The community structure detected in the social network shows the different characteristics of each department or school in a college.
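The CNM (Clauset-Newman-Moore) algorithm mentioned above greedily merges communities so as to increase a modularity score. The sketch below only computes that score for a given partition of an undirected, unweighted graph; the function name, the edge-list input and the toy graph are assumptions for illustration, and the authors' improved method is not reproduced here.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping each node to its community label.
    """
    m = len(edges)
    intra = defaultdict(int)   # edges with both ends in the community
    degree = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(
        intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree
    )


# Two triangles joined by a single edge (hypothetical toy data).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "all" for node in range(6)}
print(modularity(edges, split) > modularity(edges, merged))  # True
```

A greedy merger in the style of CNM would repeatedly join the pair of communities whose merge raises this value the most and stop when no merge improves it.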
  • Information Retrieval and Social Computing
    LUO Cheng, LIU Yiqun, ZHANG Min, MA Shaoping, RU Liyun, ZHANG Kuo
    2014, 28(1): 64-72.
    The effectiveness of information retrieval from the web largely depends on whether users can properly describe their information needs in the queries issued to search engines. However, many search queries are short, ambiguous or even noisy. Query recommendation technology helps users refine their queries and describe their information needs clearly. To obtain high-quality query recommendations, query candidates are first generated with a random walk strategy on the Query-URL bipartite graph. Snippet click behavior information is then used to re-rank the candidate lists in favor of the queries that represent user intents. Learning-based algorithms are finally applied to reduce possible noise in the recommendations. Experiments on practical search user behavior data show the effectiveness of the proposed method.
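The candidate generation step described above can be pictured with a minimal two-step random walk (query to URL, then URL back to query) over click counts. The data layout, the function name `two_step_walk` and the example queries are hypothetical; the abstract does not give the walk length or the weighting the authors use.

```python
from collections import defaultdict


def two_step_walk(clicks, start):
    """Score other queries by a query -> URL -> query random walk.

    clicks: dict mapping query -> {url: click count}.
    start: the query for which candidates are wanted.
    Returns (query, probability) pairs sorted by decreasing probability.
    """
    # Total clicks received by each URL across all queries.
    url_totals = defaultdict(float)
    for urls in clicks.values():
        for url, count in urls.items():
            url_totals[url] += count

    start_total = sum(clicks[start].values())
    scores = defaultdict(float)
    for url, count in clicks[start].items():
        p_query_to_url = count / start_total
        for other, urls in clicks.items():
            if url in urls:
                p_url_to_query = urls[url] / url_totals[url]
                scores[other] += p_query_to_url * p_url_to_query
    scores.pop(start, None)  # do not recommend the query itself
    return sorted(scores.items(), key=lambda kv: -kv[1])


clicks = {
    "jaguar": {"u1": 2, "u2": 2},
    "jaguar car": {"u1": 2},
    "jaguar cat": {"u2": 6},
}
for query, score in two_step_walk(clicks, "jaguar"):
    print(query, round(score, 3))
```

The resulting list is only a candidate set; re-ranking with snippet click behavior and noise filtering, as the abstract describes, would follow as separate steps.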
  • Information Retrieval and Social Computing
    LIU Jian, LIU Yiqun, MA Shaoping, ZHANG Min, RU Liyun, ZHANG Kuo
    2014, 28(1): 73-79.
    User satisfaction evaluation is an important category of work in search engine evaluation, and it differs in many ways from traditional relevance-based evaluation. It is a more user-centered evaluation that provides a global and systematic assessment of search engine performance. This paper describes the relationship between search engine user behavior and user satisfaction evaluation. We design an experiment that avoids affecting the users' search experience, through which we collect query-level explicit judgments of user satisfaction together with user behavior logs, and then analyze the collected data to draw conclusions. The findings provide insights into improving search engine performance and the user search experience.
  • Information Retrieval and Social Computing
    LIN Xianghui, ZHANG Jin, HUANG Kangping, XU Lei, XU Hongbo, CHENG Xueqi, CHENG Gong
    2014, 28(1): 80-86.
    In a big data environment, the traditional database-centered data processing architecture cannot meet the requirement of highly concurrent read/write requests. At the same time, serial use of data limits the effectiveness of data processing. This paper describes an effective memory-based online data processing and service framework. The framework uses a multi-index data access method and a pub/sub data control mechanism to improve the effectiveness of data processing while reducing interaction with the database. Experimental results show that the memory-based online data processing and service framework can significantly improve the response speed of the database and shorten the latency of data processing.
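To make the two mechanisms named above more concrete, here is a small single-process sketch of an in-memory store that keeps several secondary indexes and notifies subscribers on writes. The class `MemoryStore`, its methods and the field names are all hypothetical; the abstract gives no interface details, and a real framework would also need concurrency control and persistence.

```python
from collections import defaultdict


class MemoryStore:
    """Toy in-memory store with multiple indexes and pub/sub hooks."""

    def __init__(self, index_fields):
        self.records = {}
        # One value -> set-of-keys index per indexed field.
        self.indexes = {field: defaultdict(set) for field in index_fields}
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register callback(key, record) for writes on a topic."""
        self.subscribers[topic].append(callback)

    def put(self, topic, key, record):
        """Store a record, refresh the indexes, notify subscribers."""
        old = self.records.get(key)
        if old is not None:
            # Drop stale index entries of the overwritten record.
            for field, index in self.indexes.items():
                if field in old:
                    index[old[field]].discard(key)
        self.records[key] = record
        for field, index in self.indexes.items():
            if field in record:
                index[record[field]].add(key)
        for callback in self.subscribers.get(topic, []):
            callback(key, record)

    def find(self, field, value):
        """Look records up through an index, without scanning."""
        keys = self.indexes[field].get(value, set())
        return [self.records[k] for k in sorted(keys)]


store = MemoryStore(["source", "lang"])
store.subscribe("news", lambda key, record: print("updated", key))
store.put("news", 1, {"source": "a", "lang": "zh"})
store.put("news", 2, {"source": "b", "lang": "zh"})
print(len(store.find("lang", "zh")))  # 2
```

Consumers that subscribe to a topic react to new data as it arrives instead of polling a database, which is the kind of interaction the pub/sub mechanism is meant to reduce.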
  • Machine Translation
  • Machine Translation
    HE Zhonghao, SU Jinsong, SHI Xiaodong, CHEN Yidong, HUANG Yanzhou
    2014, 28(1): 87-93.
    The Maximum Entropy Based BTG model has become a hot topic in statistical machine translation in recent years because it translates well and is easy to train. However, the distribution of reordering examples in this model is imbalanced. To solve this problem, we introduce an ensemble learning method for training the phrase reordering model. Experimental results show that the reordering model is trained more effectively with our method and that the performance of the translation system improves significantly on a large-scale dataset.
  • Machine Translation
    LIU Jun, XU Dekuan, MA Mengjia, CHEN Shumei
    2014, 28(1): 94-99.
    Verbs occupy a very significant position in language, and in Chinese in particular. Understanding how verbs are interpreted is a major approach to studying them. This paper focuses on the verbs in the HSK vocabulary and compares their interpretations in the Dictionary of Contemporary Chinese and the Chinese Dictionary. It presents the differences between these interpretations, aiming to reduce semantic misunderstanding and promote cross-Strait communication.
  • Minority Language Information Processing
  • Minority Language Information Processing
    WANG Hui, NURMEMET Yolwas, WUSHOUER Silamu
    2014, 28(1): 100-106.
    Using manually labeled continuous speech sentences, this paper analyzes the formant frequency, duration and intensity of each Uyghur phoneme, classified by speech rate. To study Uyghur plosives and affricates, the paper analyzes their acoustic features in consonant-vowel structures. Feature fusion and changes in the number of model states are applied to validate the influence of different acoustic features on Uyghur phoneme recognition. The paper also describes an improved acoustic model with a higher recognition rate. In addition, the analysis of confusable phonemes provides a reference for further improvement of Uyghur acoustic models.
  • Minority Language Information Processing
    JIA Yangji, LI Yachao, ZONG Chengqing, YU Hongzhi
    2014, 28(1): 107-112.
    Tibetan person name recognition is one of the most difficult tasks in Tibetan information processing and has a direct impact on the precision of Tibetan word segmentation. Based on an analysis of the wording rules and features of Tibetan names, this paper proposes a method that combines maximum entropy and conditional random fields to identify Tibetan person names. The experiment shows that this approach works well, reaching an F1-measure of 93.08%.
  • Minority Language Information Processing
    GAO Dingguo, Tashigyal, ZHAO Dongcai
    2014, 28(1): 113-117.
    Research on Tibetan function words is essential to research on words, sentences and semantics in Tibetan information processing, and the automatic identification of Tibetan function words paves the way for further research on them. This paper discusses the role and use of Tibetan function words, reveals the difficulties in identifying them automatically, and finally proposes a method for their automatic identification. In an experiment on 2525 sentences, the method achieves an accuracy of 97.0768%.
  • Minority Language Information Processing
    DENG Jun, WUSHOUER Silamu, ANIWANR Tohti, YUAN Tinglei, ZHAO Zhicheng
    2014, 28(1): 118-124.
    Customizing browser functionality by modifying the WebKit core is a popular approach in current embedded application development. Focusing on the WebKit engine on Android, this paper comprehensively analyzes the problems that several browsers have when accessing Uyghur webpages and reveals the causes of the abnormal display of those pages. Based on the characteristics of Uyghur, the paper then designs the architecture of a Uyghur browser and implements it on the Android platform by rendering Uyghur text in the application layer.