Abstract
The goal of blog opinion retrieval is to retrieve blog units that are not only relevant to a given query topic but also contain opinions on that topic, and to rank them by opinion strength. Most previous work ranks blog units only by the opinion strength expressed in each individual unit. However, a blog is a medium through which bloggers express their own opinions and feelings, and a blogger's personal style strongly affects how strongly those opinions are expressed. Scoring opinions from a single blog unit while ignoring the blogger therefore biases the opinion score. To address this problem, this paper first analyzes how blogger background factors affect opinion scores and builds a blogger profile model. It then proposes a blogger-profile-based normalization strategy for blog opinion retrieval. Finally, it applies this strategy to normalize a blog opinion retrieval algorithm based on a probabilistic inference model. Experimental results show that the proposed normalization strategy ranks blog units more reasonably and improves retrieval performance.
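The abstract does not give the normalization formula. The Python below is a purely illustrative sketch of the underlying idea and is not the authors' strategy. It assumes a hypothetical blogger profile in which each blogger's bias is their mean raw opinion score minus the collection-wide mean. It then subtracts that bias from every post's raw score before ranking. The tuple format `(post_id, blogger_id, raw_score)` and the functions `blogger_bias` and `rank_normalized` are hypothetical names introduced only for this example.

```python
from collections import defaultdict
from statistics import mean


def blogger_bias(posts):
    """Hypothetical blogger profile: each blogger's mean raw opinion score
    minus the collection-wide mean (positive = habitually emphatic writer)."""
    by_blogger = defaultdict(list)
    for _, blogger, score in posts:
        by_blogger[blogger].append(score)
    global_mean = mean(score for _, _, score in posts)
    return {b: mean(scores) - global_mean for b, scores in by_blogger.items()}


def rank_normalized(posts):
    """Rank posts by raw opinion score with the author's personal bias removed.
    This mean-offset correction is an assumption, not the paper's formula."""
    bias = blogger_bias(posts)
    scored = [(post_id, score - bias[blogger]) for post_id, blogger, score in posts]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical raw opinion scores from a single-unit scorer.
posts = [
    ("p1", "alice", 0.9),  # alice tends to write emphatically
    ("p2", "alice", 0.8),
    ("p3", "bob", 0.5),    # bob is usually restrained
    ("p4", "bob", 0.1),
]

for post_id, score in rank_normalized(posts):
    print(post_id, round(score, 3))
```

Ranked by raw score, alice's posts p1 and p2 come first. Once alice's habitually emphatic style is discounted, bob's p3 moves to the top. A practical profile would probably also need smoothing for bloggers with few posts. In this sketch, a blogger with a single post always receives exactly the global mean.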
Key words: computer application; Chinese information processing; blog opinion retrieval; blogger profile; normalization strategy
Funding
Fujian Provincial Science and Technology Innovation Platform Program (2009J1007); Fuzhou University Talent Introduction Fund (022224)