Journal of Chinese Information Processing

Information Retrieval and Social Computing

A Language Model Based on Category Prior for Question Retrieval

JI Zongcheng, WANG Bin

2014, 28(4): 98-103.

Community Question Answering (CQA) services have been building up large archives of question-answer pairs, which are organized into a hierarchy of categories. To reuse these invaluable historical question-answer pairs, it is essential to develop effective Question Retrieval (QR) models. In this paper, we propose a novel approach based on the category prior of questions within the language modeling framework to improve QR performance. Specifically, we propose a new language model based on the category prior, which views the Leaf Category Language Model as the Dirichlet hyper-parameter that weights the parameters of the unigram language model. The approach has a solid mathematical foundation. Experiments conducted on a large-scale real-world CQA dataset from Yahoo! Answers show that the proposed method significantly outperforms previous work, which simply combines the category information with the unigram language model linearly.
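
The idea of using a leaf-category model as a Dirichlet prior can be illustrated with a small sketch. This is not the authors' implementation: it assumes the standard Dirichlet-smoothed estimate P(w|q) = (c(w, q) + mu * P(w|category)) / (|q| + mu), and the function names, the `mu` value and the toy data are hypothetical.

```python
import math
from collections import Counter


def category_prior_lm(question_tokens, category_lm, mu=100.0):
    """Estimate P(w | question) with the leaf-category model as Dirichlet prior.

    category_lm maps each word to P(w | leaf category); mu is the prior strength
    (an assumed hyper-parameter, not a value taken from the paper).
    """
    counts = Counter(question_tokens)
    length = len(question_tokens)
    vocab = set(counts) | set(category_lm)
    return {
        w: (counts[w] + mu * category_lm.get(w, 0.0)) / (length + mu)
        for w in vocab
    }


def query_log_likelihood(query_tokens, question_lm, floor=1e-12):
    """Score an archived question by the log-likelihood of the query under its model."""
    return sum(math.log(max(question_lm.get(w, 0.0), floor)) for w in query_tokens)


# Toy example: a leaf category model and one archived question.
category_lm = {"python": 0.5, "error": 0.3, "install": 0.2}
question = ["python", "install", "error", "python"]
lm = category_prior_lm(question, category_lm, mu=4.0)
score = query_log_likelihood(["python"], lm)
```

With a larger `mu`, the estimate moves toward the category model; with a smaller `mu`, it moves toward the raw word counts of the question.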

Reducing Long Queries Based on Words Association

CHEN Weipeng, FU Ruiji, HU Yi, QIN Bing, LIU Ting

2014, 28(4): 104-110.

Long queries are complex queries submitted by users. Current search engines, which rely on keyword matching, return very limited results when every word in a long query is matched as a keyword. In this paper, we attempt to improve the retrieval results by using the association between words to delete the words that offer little information. Our experiments use two kinds of evaluation: "machine-oriented" and "user-oriented". In the machine-oriented evaluation, the highlight ratio and the number of related documents returned are considered. In the user-oriented evaluation, the retrieval results are assessed by a human judge. The experimental results show that our method can significantly improve the quantity and quality of search results.

Detecting Anaphoricity for Coreference Resolution in Interactive Question Answering

ZHANG Chao, KONG Fang, ZHOU Guodong

2014, 28(4): 111-116.

Interactive Question Answering (IQA), a hot research topic in the area of QA, interacts with users to process a series of their questions in a conversational manner. This paper systematically explores anaphoricity determination for coreference resolution in IQA. Corpus statistics show the distribution of anaphoricity, and experiments on the TREC QA question set with rule-based and flat feature-based methods show the performance of anaphoricity determination for coreference resolution in IQA. Based on the characteristics of IQA, two flat features concerning proper nouns are proposed. Experimental results show that the method and the proposed features are effective.

Music Recommendation Study Based on Tags Multi-Space

YAN Jun, LIU Wenfei, LIN Hongfei

2014, 28(4): 117-122.

Currently, music recommendation is a hot topic for many music sites, radio stations and other music media. To address this issue, we take social tags as the main resource for recommendation and map the tags into three semantic spaces: genre, emotion and context. Then we calculate the similarity of users and tracks in each space. Finally, we merge the similarities in the three spaces with different methods to recommend the right tracks to users. The experiments show that the recommendation method that merges similarities from different spaces achieves good results.
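
As a rough illustration of computing per-space similarities and merging them, the sketch below uses cosine similarity in each space and a weighted linear combination. Both choices are assumptions made for illustration; the abstract does not specify the similarity measure or the merging methods, and all names and data here are hypothetical.

```python
import math

SPACES = ("genre", "emotion", "context")


def cosine(u, v):
    """Cosine similarity between two sparse tag-weight vectors stored as dicts."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def merged_similarity(user, track, weights):
    """Weighted linear merge of the user-track similarities in the three spaces."""
    return sum(
        weights[s] * cosine(user.get(s, {}), track.get(s, {}))
        for s in SPACES
    )


# Toy profiles: tag weights per semantic space.
user = {"genre": {"rock": 1.0}, "emotion": {"happy": 1.0}, "context": {"party": 1.0}}
track = {"genre": {"rock": 2.0}, "emotion": {"sad": 1.0}, "context": {}}
weights = {"genre": 1 / 3, "emotion": 1 / 3, "context": 1 / 3}
similarity = merged_similarity(user, track, weights)
```

Tracks would then be ranked for a user by `merged_similarity`; other merging strategies, such as taking the maximum across spaces, fit the same interface.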

Multi-feature Based Sentiment Orientation Identification Algorithm for Micro-blog Topics

LIU Quanchao, HUANG Heyan, FENG Chong

2014, 28(4): 123-131.

Public opinion analysis for micro-blog posts is a new trend, in which sentiment orientation identification on micro-blog topics is a hot issue. According to the content features and the various relations of Chinese micro-blog posts, we construct dictionaries of sentiment words, internet slang and emotions, respectively. Then we implement sentiment analysis algorithms based on phrase paths and the multiple features of the sentiment orientation of micro-blog topics. Using micro-blog forwards and comments, we take a further step to optimize the algorithm based on the multiple features. According to the experimental results, the Precision and F-measure reach 85.3% and 79.4%, respectively.

Research on Microblog Information Diffusion Network Structural Properties

WANG Xiaoming, WANG Li, YANG Jingzong

2014, 28(3): 55-61.

Microblogs are widely used nowadays, but their user interaction structure is complex. This paper proposes a novel method to analyze the properties of microblog information diffusion networks. We first give the definition of the information source. Then information diffusion networks for six different topic events are visualized and analyzed. The information diffusion network is modeled as a directed acyclic graph, and three motif structures are defined to represent information scattering, information gathering and information transmitting, respectively. According to the Spearman rank correlation coefficient, the distributions of the three motif structures are quite different from each other. As for the evolution of the information diffusion network, it is found that the information scattering structure has the largest number at each snapshot.

Research on Detecting Spammer in Micro-blogs

LI Heyuan, YU Xiaoming, LIU Yue, CHENG Xueqi, CHENG Gong

2014, 28(3): 62-67.

Micro-blogs have changed the way people obtain information. However, micro-blogs have been infiltrated by a large amount of spam, which poses a challenge to normal users. In this paper, we study spam in Chinese micro-blogs. We analyze the behavior of spam users and propose 7 new features for detecting them. Then, we describe how to apply these features to spammer detection via an SVM classifier. The experimental results indicate that the accuracy and recall of the proposed method are satisfactory.

Tweet Popularity Prediction Based on Propagation Simulation

WAN Shengxian, GUO Jiafeng, LAN Yanyan, CHENG Xueqi

2014, 28(3): 68-74.

Tweet popularity prediction in social networks is very important for applications such as information recommendation and viral marketing. This paper proposes a new approach to tweet popularity prediction based on propagation simulation. The maximum entropy model is first used to learn the probabilities of users' retweeting behaviors, and then the independent cascade model is used to simulate the diffusion processes of tweets in a real social network. This approach benefits from using more information about the social network structure and its users. Experiments on a Twitter dataset show that our approach is better in both precision and stability than the baselines.
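
The simulation step can be sketched with a small Monte Carlo version of the independent cascade model. This is a minimal sketch, not the authors' code: the retweet probabilities are supplied by hand here (in the paper they are learned with a maximum entropy model), and the function names and toy graph are hypothetical.

```python
import random


def simulate_cascade(followers, retweet_prob, author, rng):
    """Run one independent-cascade simulation.

    followers maps a user to the users who see that user's posts;
    retweet_prob maps (poster, follower) to the probability that the
    follower retweets. Returns the set of users who posted or retweeted.
    """
    active = {author}
    frontier = [author]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in followers.get(u, ()):
                # Each newly active user gets one chance to activate each follower.
                if v not in active and rng.random() < retweet_prob.get((u, v), 0.0):
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def predict_popularity(followers, retweet_prob, author, runs=1000, seed=0):
    """Estimate the expected number of retweets by averaging over simulations."""
    rng = random.Random(seed)
    total = sum(
        len(simulate_cascade(followers, retweet_prob, author, rng)) - 1
        for _ in range(runs)
    )
    return total / runs


# Toy network with deterministic probabilities so the outcome is easy to check.
followers = {"a": ["b", "c"], "b": ["d"]}
retweet_prob = {("a", "b"): 1.0, ("a", "c"): 0.0, ("b", "d"): 1.0}
expected_retweets = predict_popularity(followers, retweet_prob, "a", runs=50)
```

Averaging over many runs gives a popularity estimate, and the spread of the per-run counts gives a rough sense of how stable that estimate is.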

Research on Long-tail Query Search Performance Evaluation

HUO Shuai, ZHANG Min, LIU Yiqun, MA Shaoping, JIN Yijiang, RU Liyun

2014, 28(3): 75-80.

Search engines are committed to helping people find target information accurately and quickly, hence the evaluation of search performance becomes more vital. This paper deals with performance evaluation for rare queries, which has received little attention. First, three types of features are extracted after analyzing the characteristics of rare queries. Second, the correlation of the features is analyzed and different combinations of features are tested. Then, two data balancing approaches are proposed to alleviate the serious imbalance of the data set. Finally, the evaluation method for rare queries is put forward and then improved. The experimental results show that the proposed evaluation approach is effective and that it identifies non-relevant results with encouraging precision.

CICF: A Context Information Based Collaborative Filtering Algorithm

2014, 28(2): 122-128.

Collaborative Filtering (CF) can satisfy users' preferences and provide personalized guidance. Although it is a key technique in current Internet recommendation engines, it suffers from severe sparsity in users' ratings. Considering the rich context information in users' rating histories, this paper utilizes two kinds of context information to address the sparsity issue: the effect of hierarchical structure on users' potential preferences and the dynamic effect of users' short-term ratings. An integrated model, CICF, is then proposed based on these two features. Experimental results on Yahoo! Music ratings show that CICF significantly improves prediction performance compared to the baseline method. Furthermore, it is also demonstrated that CICF effectively mitigates the rating sparsity issue.

LDA-CF: A Mixture Model for Collaborative Filtering

LIAN Tao, MA Jun, WANG Shuaiqiang, CUI Chaoran

2014, 28(2): 129-135.

Recommender systems are an important tool for overcoming information overload, and collaborative filtering is the most popular approach. This paper presents a mixture model for collaborative filtering named LDA-CF, which combines latent factor models and neighborhood methods. First, we convert the ratings matrix into a collection of pseudo-documents and use the LDA topic model to identify user and item latent factor vectors. Then we compute user-item similarities in the low-dimensional latent factor space. Finally, we employ neighborhood methods to predict unobserved ratings. Experiments on the MovieLens 100k dataset demonstrate that LDA-CF outperforms neighborhood methods on the task of rating prediction in terms of MAE.
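
The last two steps, similarity in the latent space followed by neighborhood prediction, can be sketched as below. This is only an illustration under stated assumptions: the latent vectors are given by hand rather than produced by LDA, the neighborhood is user-based with cosine similarity, and the function names, `k` and the toy data are hypothetical rather than taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense latent factor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_rating(user, item, ratings, user_factors, k=2):
    """Predict a rating as the similarity-weighted mean over the k nearest raters."""
    neighbors = sorted(
        (
            (cosine(user_factors[user], user_factors[other]), rated[item])
            for other, rated in ratings.items()
            if other != user and item in rated
        ),
        reverse=True,
    )[:k]
    norm = sum(sim for sim, _ in neighbors)
    if norm <= 0.0:
        return None  # no usable neighbor rated this item
    return sum(sim * rating for sim, rating in neighbors) / norm


def mean_absolute_error(predicted, actual):
    """MAE over paired lists of predicted and true ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Toy data: topic-style latent vectors (assumed to come from a topic model).
user_factors = {"u1": [1.0, 0.0], "u2": [1.0, 0.0], "u3": [0.0, 1.0]}
ratings = {"u1": {}, "u2": {"m": 4.0}, "u3": {"m": 2.0}}
prediction = predict_rating("u1", "m", ratings, user_factors, k=2)
```

Because the similarities are computed in the low-dimensional latent space rather than on the sparse rating vectors, two users can be neighbors even when they share no rated items.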

Microblog Retrieval via Author Based Microblog Expansion

LI Rui, WANG Bin

2014, 28(2): 136-143.

In recent years, the development of microblogging has been impressive, and microblog retrieval has become an important research topic. Microblog texts are short, quickly updated, and circulated over the social network, which makes microblog search different from traditional web search. In this paper, we first compare the traditional vector space model, the probabilistic model and the basic language model in microblog search. Then we propose to expand the microblog text via author information to improve retrieval. To address the issue that short documents cause in topic model training, we use the author's topic model to further extend the microblog content. Tested on a Twitter data set, the results show that the proposed author model can improve retrieval effectiveness in the microblog search task.

Search Behavior Study Based on the Mobile Search Log

WAN Fei, ZHAO Xi, LIANG Xun, PAN Deng, NI Zhihao

2014, 28(2): 144-150.

With the rapid development of the mobile web, the number of mobile search engine users has been growing sharply. It is of great significance to analyze users' behavior in order to improve mobile search engines and user satisfaction. This paper selects the log of a mobile search engine from the first week of June 2011 and analyzes mobile search engine user behavior. From the perspectives of query words, sessions and clicks, we examine the length and frequency of query words, the ratio of question queries and URL queries, the number of queries in a session, the modification of query words and the click distribution. Furthermore, we compare the results for the mobile search engine with those for internet search engines. These findings are of substantial significance for the improvement and optimization of the mobile search engine.

A Survey of Conversion-based Internet Advertising Model

GU Zhiyu, QIN Tao, WANG Bin

2014, 28(2): 151-158.

Conversion-based advertising, which evaluates the effectiveness of an advertisement and charges according to the conversions that occur after a user views the advertisement, leverages the unique power of Internet advertising and is becoming the trend for its future development. This paper introduces the scheme of conversion-based advertising, analyzes its industrial application, and summarizes the research in this field, including auction mechanisms for CPA advertising, conversion rate estimation, conversion-based ad ranking, etc. Finally, we analyze the existing problems and present directions for further study.
