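Based on a limited number of documents, this paper constructs an undirected network, UBEN, whose nodes are the heads and modifiers of basic elements, investigates the connectivity of UBENs built from topic-related documents, and identifies the properties that such UBENs exhibit. It discusses the effect of stopwords on UBEN connectivity and compares the connectivity of UBENs built from a related document set with that of UBENs built from a random document set, showing that connectivity is, to some extent, the result of fusion caused by content relatedness among documents. The conclusions are of some significance for tasks such as multi-document automatic summarization and information retrieval.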
Abstract
Based on a relatively small number of documents, undirected basic element networks (UBENs) are constructed, in which the nodes are the heads and modifiers of basic elements. The connectivity of UBENs built from topic-related documents is investigated, and the influence of stopwords on connectivity is discussed. Furthermore, the connectivity of UBENs built from topic-related documents is compared with that of UBENs built from randomly selected documents. The comparison indicates that the connectivity of a UBEN built from topic-related documents results, to some extent, from the fusion of information across those documents rather than from the properties of language alone. This conclusion is of some significance for natural language processing tasks such as automatic summarization and information retrieval.
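The construction described in the abstract can be illustrated with a minimal sketch. It assumes that basic elements are already available as (head, modifier) pairs and measures connectivity as the share of nodes in the largest connected component; the function names, the rule of dropping every pair that touches a stopword, and the sample pairs are all hypothetical and are not taken from the paper.

```python
from collections import deque


def build_uben(pairs, stopwords=frozenset()):
    """Build an undirected graph (adjacency sets) from (head, modifier) pairs.

    Assumption: a pair is dropped entirely if either word is a stopword.
    """
    graph = {}
    for head, modifier in pairs:
        if head in stopwords or modifier in stopwords:
            continue
        graph.setdefault(head, set()).add(modifier)
        graph.setdefault(modifier, set()).add(head)
    return graph


def largest_component_fraction(graph):
    """Return the fraction of nodes that lie in the largest connected component."""
    if not graph:
        return 0.0
    seen = set()
    best = 0
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search over one component, counting its nodes.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best / len(graph)


# Hypothetical (head, modifier) pairs pooled from several documents.
pairs = [
    ("attack", "bomb"),
    ("attack", "the"),
    ("election", "the"),
    ("election", "vote"),
    ("storm", "coast"),
]
stopwords = frozenset({"the"})

with_stopwords = largest_component_fraction(build_uben(pairs))
without_stopwords = largest_component_fraction(build_uben(pairs, stopwords))
print(with_stopwords, without_stopwords)
```

In this toy input the stopword is the only link between two otherwise separate groups of words, so removing it shrinks the largest component. Running the same measurement on a topic-related set and on a randomly selected set of documents would give the kind of comparison the abstract describes.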
Keywords
topic-related document set /
automatic summarization /
complex network /
connectivity /
information fusion
Key words
topic-related document set /
complex network /
automated summarization /
information fusion /
information retrieval
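Funding
National Natural Science Foundation of China (61070243); Major Program of the National Social Science Foundation of China (11&ZD189); Guizhou Province High-Level Talent Research Project (TZJF-2010 No. 048); Guizhou Province Science and Education Young Talent Training Project (Qian Sheng Zhuan He Zi (2012) No. 155); Guizhou Normal University Doctoral Research Start-up Fund (11904-05032110011); China Postdoctoral Science Foundation (2013M531730)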