|
|
Automatic Lexical Simplification
QIANG Jipeng1, LI Yun1, WU Xindong2,3
1. Department of Computer Science and Technology, Yangzhou University, Yangzhou, Jiangsu 225127, China;
2. Key Laboratory for Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, Hefei, Anhui 230009, China;
3. Research Institute of Big Knowledge, Hefei University of Technology, Hefei, Anhui 230009, China
|
|
Abstract: Automatic Lexical Simplification (LS) is the task of replacing complex words in a given sentence with simpler alternatives of equivalent meaning, and it is an important research direction in text simplification. With the rapid development of natural language processing technology, LS methods are updated and replaced quickly. This paper surveys existing work on lexical simplification. After introducing the general framework of lexical simplification, we summarize LS methods based on querying linguistic databases, automatic rules, word embeddings, merged models, and BERT. Finally, we discuss the main difficulties in lexical simplification research, suggest directions for the future development of LS, and draw our conclusions.
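To make the BERT-based family of methods concrete, the following minimal Python sketch shows how a masked language model can propose in-context substitution candidates for a complex word. It is an illustrative sketch, not the exact pipeline of any surveyed system; the Hugging Face transformers library, the model name, and the simple filtering rule are assumptions chosen for demonstration.

```python
# Illustrative sketch of BERT-based substitute generation for lexical
# simplification (assumes the Hugging Face `transformers` library).
from transformers import pipeline

# Fill-mask pipeline: BERT predicts plausible words for a masked slot.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

def candidate_substitutes(sentence, complex_word, top_k=10):
    """Mask the complex word and let BERT propose in-context replacements."""
    masked = sentence.replace(complex_word, unmasker.tokenizer.mask_token, 1)
    proposals = unmasker(masked, top_k=top_k)
    # Keep BERT's suggestions except the original word itself; a full LS
    # system would further rank candidates by simplicity (e.g., word
    # frequency) and by how well they preserve the sentence's meaning.
    return [p["token_str"] for p in proposals
            if p["token_str"].lower() != complex_word.lower()]

# Example: propose simpler in-context alternatives for "reiterated".
print(candidate_substitutes("The teacher reiterated the main point.",
                            "reiterated"))
```

Because the target word is masked, the candidates are conditioned on the surrounding sentence, which is what lets BERT-style methods avoid the context-insensitive substitutes that pure database lookups can produce.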
|
Received: 14 October 2020
|
|
|
|
|