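Chinese compound noun phrases have long been an important object of study in linguistic analysis and Chinese information processing because of their wide use, distinctive structure, and complex internal semantics. Language resources for compound noun phrases in China are extremely scarce: existing knowledge bases cover only noun-noun compounds, no knowledge base has yet been built for compound noun phrases that contain verbs, and most existing compound noun phrase knowledge bases are detached from context and carry no sentence-level information. To address this situation, this paper collects corpus data from multiple domains, establishes a new system of semantic relations, and annotates a sizable knowledge base of semantic relations for basic compound nouns that includes sentence information. The annotation focuses on the boundaries of basic compound noun phrases within sentences and on the semantic relations among the phrases' internal components; the knowledge base contains 27,007 sentences in total. The paper presents a detailed quantitative statistical analysis of the annotated knowledge base. Finally, using the annotated knowledge base, baseline models are applied in experiments on automatic boundary identification and semantic classification of basic compound noun phrases, and the experimental results and possible directions for future improvement are summarized and analyzed.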
Abstract
Chinese compound noun phrases are characterized by their wide range of use, unique syntactic structure, and complex internal semantics, and have long been an important research object in linguistic analysis and Chinese information processing. We extend existing work on noun-only Chinese compound noun phrases to compound noun phrases containing verbs, and construct a corpus of Chinese compound nouns annotated with semantic relations. A total of 27,007 sentences are collected from various fields, and the boundaries of compound noun phrases in the sentences and their internal semantic relations are annotated. The corpus is distinguished by two features: it is the first to provide context information for Chinese compound nouns, and it introduces a new semantic relation system for describing them. In addition to a detailed analysis of the corpus, we investigate the automatic identification of Chinese compound nouns and their relations with a BERT+BiLSTM+CRF framework. The experimental results reveal the challenges of this task, and possible solutions are discussed.
Keywords
Chinese basic compound noun phrases /
semantic relation system /
boundary identification
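Funding
National Natural Science Foundation of China (61872402); Humanities and Social Sciences Planning Fund of the Ministry of Education (17YJAZH068); Beijing Language and Culture University university-level project (Fundamental Research Funds for the Central Universities) (18ZDJ03)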