Language Analysis and Generation
XIAO Yonglei, LIU Shenghua, LIU Yue, CHENG Xueqi, ZHAO Wenjing, REN Yan, WANG Yuping
2014, 28(4): 21-28.
With the emergence of social media services, a large amount of short text, such as tweets and reviews, is generated every day. Mining these data attracts growing interest from both industry and academia, and such data have already become an important source of information for applications such as marketing and stock prediction. However, mining short text is non-trivial because the text is extremely sparse and lacks context. We therefore propose to enrich short text by automatically identifying semantically related concepts in open knowledge bases such as Wikipedia. First, through linkable pruning, concept linking and disambiguation, important n-grams in each tweet are linked to their related Wikipedia concepts. Second, NMF (non-negative matrix factorization) is applied to the concept-document matrix to obtain each concept's semantic neighbors, and these related concepts are then added to the tweets. Experiments on the TREC 2011 tweet collection and Wikipedia 2011 show that our approach is effective.
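The NMF-based expansion step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Several choices here are our own hypotheses:

- the Lee-Seung multiplicative-update rule for the factorization;
- cosine similarity between rows of W as the "semantic neighbor" measure;
- the number of latent factors k;
- the helper names `nmf`, `semantic_neighbors` and `expand_tweet`.

The toy concept-document matrix and concept names are invented for illustration. They are not TREC or Wikipedia data. Concept linking is assumed to have already produced the list of linked concepts for a tweet.

```python
import numpy as np


def nmf(V, k, iters=1000, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (concepts x documents) as W @ H
    using Lee-Seung multiplicative updates for the squared Frobenius loss.
    (Assumed solver; the paper does not specify one here.)"""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps  # concept-to-latent-factor weights
    H = rng.random((k, m)) + eps  # latent-factor-to-document weights
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def semantic_neighbors(W, idx, top_n=2):
    """Return indices of the top_n concepts whose latent rows in W are most
    cosine-similar to concept idx (the concept itself is excluded).
    Hypothetical neighbor measure."""
    norms = np.linalg.norm(W, axis=1) + 1e-12
    sims = (W @ W[idx]) / (norms * norms[idx])
    sims[idx] = -np.inf  # never return the query concept
    return [int(j) for j in np.argsort(-sims)[:top_n]]


def expand_tweet(linked, concepts, W, top_n=2):
    """Append the semantic neighbors of each linked concept to a tweet's
    concept list, skipping concepts that are already present."""
    index = {c: i for i, c in enumerate(concepts)}
    expanded = list(linked)
    for c in linked:
        for j in semantic_neighbors(W, index[c], top_n):
            if concepts[j] not in expanded:
                expanded.append(concepts[j])
    return expanded


# Hypothetical toy data: two topical blocks of concepts and documents.
concepts = ["Apple_Inc.", "IPhone", "Steve_Jobs",
            "Association_football", "FIFA", "Lionel_Messi"]
V = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
], dtype=float)

W, H = nmf(V, k=2)
print(expand_tweet(["IPhone"], concepts, W))
```

Comparing concepts by cosine similarity over their rows in W ignores how often each concept occurs overall. Two concepts then count as neighbors when they load on the same latent topics, even if one is much rarer than the other. A real system would also cap or weight the added concepts so that expansion does not drown out the tweet's original content.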