Relation classification for Chinese complex sentences identifies the semantic relation between the clauses of a complex sentence. Automatic classification of these relation categories is valuable for both linguistic research and Chinese information processing. Causal complex sentences are the most frequently used type of complex sentence, and this paper takes marked generalized causal complex sentences with two clauses as its object of study. The Language Technology Platform (LTP) is used for dependency parsing to obtain features such as the part of speech (POS) of each word, the position of its dependency parent node, and its dependency relation to that parent. Different combinations of these features are concatenated with pre-trained word vectors, and the resulting vectors are fed into a DPCNN model for relation classification. Experimental results show that, compared with the model without any additional features, fusing sentence features into the DPCNN model improves all evaluation metrics, indicating that feature fusion yields better classification. Among the feature combinations, the one that fuses POS features achieves the highest accuracy and F1 score, 98.41% and 98.28% respectively.
Key words
causal complex sentence /
relation classification /
word vector /
DPCNN model /
dependency syntax
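The feature fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how each feature is encoded, so the one-hot encoding of POS tags and dependency relations, the scaling of the parent position by sentence length, the truncated tag inventories, the toy sentence annotations, and the names `fuse_features` and `one_hot` are all assumptions made for illustration. Random numbers stand in for pre-trained word vectors, and the DPCNN classifier itself is not shown.

```python
import random

# Hypothetical, truncated tag inventories (assumption); a real parser uses larger sets.
POS_TAGS = ["n", "v", "c", "d", "wp"]
DEP_RELS = ["HED", "SBV", "VOB", "ADV", "WP"]


def one_hot(label, inventory):
    """Return a one-hot list marking the position of `label` in `inventory`."""
    return [1.0 if tag == label else 0.0 for tag in inventory]


def fuse_features(word_vectors, pos, heads, rels, use=("pos",)):
    """Append the selected per-token features to each word vector.

    word_vectors: list of equal-length float lists, one per token
    pos, rels:    POS tags and dependency relation labels, one per token
    heads:        1-based position of each token's parent, 0 for the root
    use:          which features to append: "pos", "head", "rel"
    """
    n = len(word_vectors)
    fused = []
    for i in range(n):
        row = list(word_vectors[i])
        if "pos" in use:
            row += one_hot(pos[i], POS_TAGS)
        if "head" in use:
            # Assumed encoding: parent position scaled by sentence length.
            row.append(heads[i] / n)
        if "rel" in use:
            row += one_hot(rels[i], DEP_RELS)
        fused.append(row)
    return fused


# Toy four-token sentence with made-up annotations (assumption).
random.seed(0)
embed_dim = 8
pos = ["c", "n", "v", "wp"]
heads = [3, 3, 0, 3]
rels = ["ADV", "SBV", "HED", "WP"]
word_vectors = [[random.gauss(0.0, 1.0) for _ in range(embed_dim)] for _ in pos]

fused_pos = fuse_features(word_vectors, pos, heads, rels, use=("pos",))
fused_all = fuse_features(word_vectors, pos, heads, rels, use=("pos", "head", "rel"))
print(len(fused_pos[0]))  # 13 = 8 embedding dims + 5 POS dims
print(len(fused_all[0]))  # 19 = 8 + 5 POS + 1 head + 5 relation dims
```

Each feature combination changes only the width of the per-token vector, so under these assumptions the same convolutional classifier can be trained on any combination by adjusting its input dimension.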
