@inproceedings{attia-etal-2023-statistical,
    title = "Statistical Measures for Readability Assessment",
    author = "Attia, Mohammed  and
      Samih, Younes  and
      Ehara, Yo",
    editor = {H{\"a}m{\"a}l{\"a}inen, Mika  and
      {\"O}hman, Emily  and
      Pirinen, Flammie  and
      Alnajjar, Khalid  and
      Miyagawa, So  and
      Bizzoni, Yuri  and
      Partanen, Niko  and
      Rueter, Jack},
    booktitle = "Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages",
    month = dec,
    year = "2023",
    address = "Tokyo, Japan",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2023.nlp4dh-1.19/",
    pages = "153--161",
    abstract = "Neural models and deep learning techniques have predominantly been used in many tasks of natural language processing (NLP), including automatic readability assessment (ARA). They apply deep transfer learning and enjoy high accuracy. However, most of the models still cannot leverage long dependence such as inter-sentential topic-level or document-level information because of their structure and computational cost. Moreover, neural models usually have low interpretability. In this paper, we propose a generalization of passage-level, corpus-level, document-level and topic-level features. In our experiments, we show the effectiveness of ``Statistical Lexical Spread (SLS)'' features when combined with IDF (inverse document frequency) and TF-IDF (term frequency{--}inverse document frequency), which adds a topological perspective (inter-document) to readability to complement the typological approaches (intra-document) used in traditional readability formulas. Interestingly, simply adding these features in BERT models outperformed state-of-the-art systems trained on a large number of hand-crafted features derived from heavy linguistic processing. In analysis, we show that SLS is also easy-to-interpret because SLS computes lexical features, which appear explicitly in texts, compared to parameters in neural models."
}