Research on Corpus Annotation Method Based on Collective Intelligence

KE Yonghong, YU Shiwen, SUI Zhifang, SONG Jihua

Journal of Chinese Information Processing, 2017, Vol. 31, Issue 4: 108-113.
Section: Language Resource Construction


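Abstract

The performance and robustness of natural language processing systems depend to a large extent on whether sufficient deeply annotated corpora are available during modeling. Traditional manual annotation can hardly meet the demand for large-scale, high-quality deep corpus annotation. This paper proposes a corpus annotation method based on collective intelligence, designs an annotation model, analyzes the individual stages of the process (user capability assessment, corpus selection, task management, collaborative annotation, behavior analysis, quality control, decision aggregation, and assessment and incentives), and proposes a solution for each. Project practice shows that the collective-intelligence-based annotation method has clear advantages for highly innovative natural language processing research projects.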

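The abstract names user capability assessment, quality control, and decision aggregation as separate stages but does not say how they are computed. The following is a minimal sketch under our own assumptions, not the authors' stated method: capability is estimated as smoothed accuracy on gold-standard items seeded into the annotation tasks, annotators below a weight threshold are filtered out as a simple quality-control step, and the remaining labels are aggregated by capability-weighted majority voting. The function names, the `prior` smoothing, and the `min_weight` threshold are hypothetical.

```python
from collections import defaultdict


def annotator_accuracy(gold, answers, prior=1.0):
    # Hypothetical capability score: smoothed accuracy on gold-standard items.
    # gold: {item_id: label}; answers: {annotator: {item_id: label}}.
    scores = {}
    for annotator, labels in answers.items():
        graded = [labels[i] == gold[i] for i in labels if i in gold]
        # Add `prior` pseudo-correct and `prior` pseudo-wrong answers so that
        # annotators with few gold items start near 0.5 instead of 0 or 1.
        scores[annotator] = (sum(graded) + prior) / (len(graded) + 2 * prior)
    return scores


def weighted_vote(answers, weights, min_weight=0.5):
    # Hypothetical aggregation: capability-weighted majority vote per item.
    tallies = defaultdict(lambda: defaultdict(float))
    for annotator, labels in answers.items():
        w = weights.get(annotator, 0.0)
        if w < min_weight:
            continue  # simple quality filter: skip low-capability annotators
        for item, label in labels.items():
            tallies[item][label] += w
    # For each item, pick the label with the largest total weight.
    return {item: max(votes, key=votes.get) for item, votes in tallies.items()}


gold = {"g1": "N", "g2": "V"}
answers = {
    "a": {"g1": "N", "g2": "V", "s1": "N"},
    "b": {"g1": "N", "g2": "N", "s1": "V"},
    "c": {"g1": "V", "g2": "N", "s1": "V"},
}
weights = annotator_accuracy(gold, answers)  # {'a': 0.75, 'b': 0.5, 'c': 0.25}
print(weighted_vote(answers, weights))       # {'g1': 'N', 'g2': 'V', 's1': 'N'}
```

Here annotator `c` misses both gold items, so its weight (0.25) falls below the threshold and it is ignored; on item `s1` the result follows the more reliable annotator `a` rather than the unweighted 2-to-1 majority for `V`. More elaborate schemes (for example, iteratively re-estimating annotator reliability from the aggregated labels) would replace the fixed gold-based weights.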

Key words

collective intelligence / corpus annotation / natural language processing

Cite this article

KE Yonghong, YU Shiwen, SUI Zhifang, SONG Jihua. Research on Corpus Annotation Method Based on Collective Intelligence. Journal of Chinese Information Processing. 2017, 31(4): 108-113


Funding

China Postdoctoral Science Foundation (2015M570877); National Basic Research Program of China (2014CB340504)