Mining Named Entities from Query Logs

ZHAI Haijun¹, GUO Jiafeng², WANG Xiaolei², XU Hongbo²

Journal of Chinese Information Processing, 2010, Vol. 24, Issue 1: 71-77.
Review


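Abstract

Mining the abundant named entities contained in large-scale query logs is an important research topic in data mining. Previous work proposed a seed-based extraction framework that mines entities by the distributional similarity between them. However, that framework works well only when each seed entity belongs to a single semantic class, whereas in practice a named entity often belongs to several classes. This paper introduces a weakly supervised topic model that uses a small amount of human guidance to resolve the class ambiguity of entities, which makes mining more effective. Experiments show that the proposed method significantly outperforms the existing method in entity mining performance.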
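The abstract describes the method only at a high level, so the following is a toy sketch of the general idea rather than the authors' model. It assumes (hypothetically) that each query contains one known entity string, that the rest of the query is a "context" with `#` marking the entity slot, that each entity is a document made of its contexts, and that each topic stands for one semantic class. Weak supervision is modeled here as a hard restriction of each seed entity's topic assignments to the class(es) a human labeled for it, a Labeled-LDA-style stand-in fitted with collapsed Gibbs sampling; the paper's actual constraint and inference procedure are not given in this abstract. The queries, entities, seed labels, and the functions `contexts_by_entity`, `constrained_lda`, and `rank_for_class` are all illustrative.

```python
"""Toy sketch: a seed-constrained topic model for class-ambiguous entities.

Hypothetical assumptions (not the paper's exact model or data):
- each query contains one known entity string; the rest of the query,
  with "#" marking the entity slot, is a "context";
- each entity is treated as a document whose words are its contexts;
- each topic stands for one semantic class;
- weak supervision = a human lists the class(es) of each seed entity, and
  that seed's topic assignments are restricted to those classes
  (a Labeled-LDA-style hard constraint used here as a stand-in).
"""
import random
from collections import defaultdict

GAME, MOVIE, BOOK = 0, 1, 2
QUERIES = [
    "harry potter walkthrough", "harry potter trailer", "harry potter ebook",
    "halo 3 walkthrough", "halo 3 cheats", "portal cheats", "portal walkthrough",
    "titanic trailer", "titanic cast", "avatar trailer", "avatar cast",
    "twilight ebook", "twilight author", "the hobbit trailer", "the hobbit ebook",
]
ENTITIES = ["harry potter", "halo 3", "portal", "titanic", "avatar",
            "twilight", "the hobbit"]
# Hypothetical seed labels; "the hobbit" is a seed that belongs to two classes.
SEEDS = {"halo 3": [GAME], "titanic": [MOVIE], "twilight": [BOOK],
         "the hobbit": [MOVIE, BOOK]}


def contexts_by_entity(queries, entities):
    """Map each entity to the contexts it occurs with, e.g. '# walkthrough'."""
    docs = defaultdict(list)
    for q in queries:
        for e in entities:
            if e in q:
                docs[e].append(q.replace(e, "#", 1).strip())
    return dict(docs)


def constrained_lda(docs, n_topics, seeds, iters=200, alpha=0.1, beta=0.01, rng_seed=0):
    """Collapsed Gibbs sampling for LDA; seed entities may only use their labeled topics.

    Returns theta[e][k], the estimated probability that entity e belongs to class k.
    """
    rng = random.Random(rng_seed)
    V = len({w for ws in docs.values() for w in ws})  # number of distinct contexts
    allowed = {e: seeds.get(e, list(range(n_topics))) for e in docs}
    n_ek = {e: [0] * n_topics for e in docs}            # entity-topic counts
    n_kw = [defaultdict(int) for _ in range(n_topics)]  # topic-context counts
    n_k = [0] * n_topics                                # contexts per topic
    z = {}                                              # topic of each context occurrence

    def update(e, w, k, delta):
        n_ek[e][k] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    # Random initialization, respecting each entity's allowed topics.
    for e, ws in docs.items():
        z[e] = [rng.choice(allowed[e]) for _ in ws]
        for w, k in zip(ws, z[e]):
            update(e, w, k, +1)
    # Resample every assignment from p(k) ~ (n_ek + alpha)(n_kw + beta)/(n_k + V*beta).
    for _ in range(iters):
        for e, ws in docs.items():
            for i, w in enumerate(ws):
                update(e, w, z[e][i], -1)
                weights = [(n_ek[e][k] + alpha) * (n_kw[k][w] + beta) / (n_k[k] + V * beta)
                           for k in allowed[e]]
                z[e][i] = rng.choices(allowed[e], weights=weights)[0]
                update(e, w, z[e][i], +1)
    # Per-entity class distribution; disallowed classes get exactly zero.
    theta = {}
    for e, ws in docs.items():
        total = len(ws) + alpha * len(allowed[e])
        theta[e] = [(n_ek[e][k] + alpha) / total if k in allowed[e] else 0.0
                    for k in range(n_topics)]
    return theta


def rank_for_class(theta, k, exclude=()):
    """Rank non-seed entities by their estimated probability of class k."""
    return sorted((e for e in theta if e not in exclude),
                  key=lambda e: theta[e][k], reverse=True)


if __name__ == "__main__":
    docs = contexts_by_entity(QUERIES, ENTITIES)
    theta = constrained_lda(docs, 3, SEEDS)
    for name, k in (("GAME", GAME), ("MOVIE", MOVIE), ("BOOK", BOOK)):
        ranked = rank_for_class(theta, k, exclude=SEEDS)
        print(name, [(e, round(theta[e][k], 2)) for e in ranked])
```

In this toy setup, the unlabeled candidate "harry potter", whose contexts span games, movies, and books, keeps a nonzero probability for every class and can therefore be mined for several classes rather than forced into one, while the two-class seed "the hobbit" contributes evidence only to its two labeled classes. The exact numbers depend on the random seed and the iteration count.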


Keywords

computer application / Chinese information processing / named entity / user query log / topic model


Cite this article

ZHAI Haijun, GUO Jiafeng, WANG Xiaolei, XU Hongbo. Mining Named Entities from Query Logs. Journal of Chinese Information Processing, 2010, 24(1): 71-77.

