Word Complexity Estimation for Japanese Lexical Simplification

Daiki Nishihara, Tomoyuki Kajiwara


Abstract
We introduce three language resources for Japanese lexical simplification: 1) a large-scale word complexity lexicon, 2) the first synonym lexicon for converting complex words to simpler ones, and 3) the first toolkit for developing and benchmarking Japanese lexical simplification systems. Our word complexity lexicon is expanded to a broader vocabulary using a classifier trained on a small, high-quality word complexity lexicon created by Japanese language teachers. Using this word complexity estimator, we extracted simplified word pairs from a large-scale synonym lexicon and constructed a simplified synonym lexicon useful for lexical simplification. In addition, we developed a Python library that implements automatic evaluation and key methods for each subtask to ease the construction of a lexical simplification pipeline. Experimental results show that the proposed method based on our lexicon achieves the highest performance in Japanese lexical simplification. Lexical simplification is currently studied mainly in English, which is rich in language resources such as lexicons and toolkits. The language resources constructed in this study will help advance lexical simplification for Japanese.
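The pair-extraction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or the API of their Python library: the function name `extract_simplification_pairs`, the integer complexity levels, and the toy data are all assumptions made for illustration. It assumes each word already has an estimated complexity level (higher means harder) and keeps only synonym pairs whose members differ in level, oriented from the more complex word to the simpler one.

```python
from typing import Dict, Iterable, List, Tuple


def extract_simplification_pairs(
    synonym_pairs: Iterable[Tuple[str, str]],
    complexity: Dict[str, int],
) -> List[Tuple[str, str]]:
    """Return (complex_word, simple_word) pairs from unordered synonym pairs.

    Hypothetical helper: `complexity` maps a word to an estimated level,
    where a larger number means a more complex word.
    """
    pairs = []
    for a, b in synonym_pairs:
        # Skip pairs where either word has no complexity estimate.
        if a not in complexity or b not in complexity:
            continue
        if complexity[a] > complexity[b]:
            pairs.append((a, b))
        elif complexity[b] > complexity[a]:
            pairs.append((b, a))
        # Equal levels give no simplification direction, so drop the pair.
    return pairs


# Toy data, invented for this example only.
toy_complexity = {"購入": 3, "買う": 1, "開始": 2, "始める": 1}
toy_synonyms = [("購入", "買う"), ("始める", "開始"), ("購入", "未知語")]

print(extract_simplification_pairs(toy_synonyms, toy_complexity))
```

In this sketch, words without a complexity estimate are skipped; the abstract's classifier-based expansion of the lexicon is what would reduce such gaps in practice.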
Anthology ID:
2020.lrec-1.381
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
3114–3120
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.381
Cite (ACL):
Daiki Nishihara and Tomoyuki Kajiwara. 2020. Word Complexity Estimation for Japanese Lexical Simplification. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 3114–3120, Marseille, France. European Language Resources Association.
Cite (Informal):
Word Complexity Estimation for Japanese Lexical Simplification (Nishihara & Kajiwara, LREC 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/2020.lrec-1.381.pdf