Combination of Deep Learning and Language Difficulty Feature for Sentence Readability Metric

TANG Yuling, ZHANG Yufei, YU Dong

Journal of Chinese Information Processing, 2022, 36(2): 29-39.
Language Analysis and Computation


Abstract

This paper proposes an improved method for constructing readability corpora and, based on this method, builds a larger-scale Chinese sentence readability corpus. On the absolute sentence difficulty assessment task, the corpus yields an accuracy of 78.69%, an improvement of more than 15% over previous work, demonstrating the effectiveness of the improved construction method. Deep learning methods are then applied to Chinese readability assessment to investigate how well different deep learning models automatically capture difficulty features, and to further examine how incorporating language difficulty features from different linguistic levels into the learned representations affects overall model performance. The experimental results show that deep learning models differ in their ability to capture difficulty features, and that language difficulty features improve the difficulty-representation ability of readability assessment models to varying degrees.
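
The abstract describes fusing language difficulty features with representations learned by deep models for sentence-level readability classification. The exact architectures, feature sets, and number of difficulty levels are not given here, so the following is only a minimal sketch under assumed settings: a BiLSTM sentence encoder, eight hypothetical surface-level difficulty features, and five difficulty levels, with fusion by simple concatenation before a linear classifier.

# Minimal sketch (not the authors' exact architecture): concatenate a learned
# sentence representation with hand-crafted language difficulty features and
# classify the fused vector into difficulty levels.
import torch
import torch.nn as nn

class FusionReadabilityClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=128,
                 n_difficulty_feats=8, n_levels=5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # BiLSTM encoder captures difficulty cues from the token sequence.
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # Classifier sees [learned representation ; hand-crafted features].
        self.classifier = nn.Linear(2 * hidden_dim + n_difficulty_feats,
                                    n_levels)

    def forward(self, token_ids, difficulty_feats):
        # token_ids: (batch, seq_len); difficulty_feats: (batch, n_feats)
        embedded = self.embedding(token_ids)
        _, (h_n, _) = self.encoder(embedded)
        # Concatenate the final forward and backward hidden states.
        sent_repr = torch.cat([h_n[0], h_n[1]], dim=-1)
        fused = torch.cat([sent_repr, difficulty_feats], dim=-1)
        return self.classifier(fused)  # logits over difficulty levels

# Toy usage: 4 sentences of 20 token ids each, plus 8 hypothetical
# surface features (e.g., sentence length, mean stroke count), pre-scaled.
model = FusionReadabilityClassifier(vocab_size=10000)
logits = model(torch.randint(1, 10000, (4, 20)), torch.randn(4, 8))
print(logits.shape)  # torch.Size([4, 5])

The same concatenation-based fusion can be applied to other encoders (e.g., a CNN or a pretrained Transformer) by replacing the BiLSTM representation; the paper compares several such deep models, which this sketch does not reproduce.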

Key words

deep learning / language difficulty features / sentence readability

Cite this article

TANG Yuling, ZHANG Yufei, YU Dong. Combination of Deep Learning and Language Difficulty Feature for Sentence Readability Metric. Journal of Chinese Information Processing, 2022, 36(2): 29-39.


Funding

National Social Science Fund of China (17ZDA305); Humanities and Social Sciences Youth Fund of the Ministry of Education (19YJCZH230); Support Program for Young and Middle-aged Academic Backbones of Beijing Language and Culture University