Abstract
Natural language processing and complex network techniques can be used to extract and analyze the social networks embedded in Chinese literary works. Taking “Romance of the Three Kingdoms” as an example, this paper extracts the novel's social network: nodes are the characters, edges are the connections between characters, and each edge is weighted by the number of times the two characters co-occur within the chapters. A character database, built from background knowledge and Internet resources, supports the network modeling. The resulting network is then analyzed in terms of node degree distribution, centrality, and clustering characteristics. The results show that the distribution of characters in Chinese literary works exhibits clear small-world properties, a limited power-law distribution, and community structure, as well as multifaceted and diverse characteristics.
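The network construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the roster format (canonical name mapped to aliases), the plain substring matching that stands in for the NLP steps, the reading of edge weight as "number of chapters in which both characters appear", and all function names are assumptions made for illustration. The toy chapter texts are invented, not taken from the novel.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(chapters, roster):
    """Return {(a, b): weight} with a < b alphabetically.

    chapters: list of chapter texts.
    roster:   dict mapping a canonical character name to a list of aliases
              (hypothetical stand-in for the paper's character database).
    weight:   number of chapters mentioning both characters (an assumption).
    """
    weights = Counter()
    for text in chapters:
        # A character is "present" if its name or any alias occurs in the text.
        present = sorted(
            name
            for name, aliases in roster.items()
            if any(alias in text for alias in [name, *aliases])
        )
        for a, b in combinations(present, 2):
            weights[(a, b)] += 1
    return weights


def adjacency(weights):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for a, b in weights:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_distribution(adj):
    """Map each degree value to the number of nodes having that degree."""
    return Counter(len(neighbours) for neighbours in adj.values())


def strength(weights):
    """Weighted degree per node, a crude centrality proxy."""
    total = Counter()
    for (a, b), w in weights.items():
        total[a] += w
        total[b] += w
    return total


def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = sorted(adj[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


# Toy input: invented sentences standing in for chapter texts.
toy_chapters = [
    "Liu Bei, Guan Yu and Zhang Fei swear an oath.",
    "Mengde invites Xuande to drink wine.",
    "Guan Yu leaves Cao Cao to rejoin Liu Bei.",
]
toy_roster = {
    "Liu Bei": ["Xuande"],
    "Guan Yu": ["Yunchang"],
    "Zhang Fei": ["Yide"],
    "Cao Cao": ["Mengde"],
}

toy_weights = build_cooccurrence_network(toy_chapters, toy_roster)
toy_adj = adjacency(toy_weights)
print(dict(toy_weights))
print(dict(degree_distribution(toy_adj)))
print(strength(toy_weights).most_common())
print({name: local_clustering(toy_adj, name) for name in toy_adj})
```

The alias lists show why a character database matters: the second toy chapter never uses the canonical names, yet it still contributes to the edge between the two characters. On a real text, substring matching would need to be replaced by proper name recognition and reference resolution.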
Key words
literary works /
social networks /
natural language processing
Funding
National Natural Science Foundation of China (61272260, 61273320)