Yo Ehara

2026

Recovering Registers from Leveled Wordlists
Yo Ehara
Proceedings of the Fifteenth Language Resources and Evaluation Conference

For vocabulary learning in language acquisition, it is desirable for learners to acquire words that they are likely to need in the language environments they will encounter. Such language environments are referred to as “registers” in general corpora, which are typically designed to include diverse registers. However, the proportion of registers included, that is, which registers are included and to what extent, is determined by the circumstances under which each general corpus was compiled and is not necessarily optimized for language learning. To bridge this gap, various leveled wordlists have been created in language education using linguistic resources other than word frequency, such as expert judgment and learner responses. However, it has not been quantitatively clear what gap in register proportions in general corpora these leveled wordlists were designed to fill. This study proposes a method that, given a leveled wordlist and a general corpus, estimates the register ratio that best aligns the frequency ordering of words across registers with the leveled wordlist. This makes it easier for learners and educators to interpret which wordlists are appropriate for particular learning goals. Our method is formulated as a linear programming problem and yields a globally optimal solution. Unlike neural networks, it is less susceptible to variation due to initial values or approximation and is therefore easier to interpret. We evaluated the proposed method on two languages, English and Japanese, through a range of experiments. We further show that it can also be used to evaluate vocabulary lists created for specific contexts, such as those generated by Large Language Models like ChatGPT.

2025

Keeping LLMs from Being Distracted: Grade-Aware Kanji Reading Estimation Fully Executable in Web Browsers for Japanese Education
Yo Ehara
Proceedings of the 39th Pacific Asia Conference on Language, Information and Computation

2024

An Analytical Study of the Flesch-Kincaid Readability Formulae to Explain Their Robustness over Time
Yo Ehara
Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation

2023

Statistical Measures for Readability Assessment
Mohammed Attia | Younes Samih | Yo Ehara
Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages

Neural models and deep learning techniques have predominantly been used in many tasks of natural language processing (NLP), including automatic readability assessment (ARA). They apply deep transfer learning and enjoy high accuracy. However, most of the models still cannot leverage long-range dependencies such as inter-sentential topic-level or document-level information because of their structure and computational cost. Moreover, neural models usually have low interpretability. In this paper, we propose a generalization of passage-level, corpus-level, document-level and topic-level features. In our experiments, we show the effectiveness of “Statistical Lexical Spread (SLS)” features when combined with IDF (inverse document frequency) and TF-IDF (term frequency–inverse document frequency), which adds a topological perspective (inter-document) to readability to complement the typological approaches (intra-document) used in traditional readability formulas. Interestingly, simply adding these features to BERT models outperformed state-of-the-art systems trained on a large number of hand-crafted features derived from heavy linguistic processing. In our analysis, we show that SLS is also easy to interpret because, unlike the parameters of neural models, SLS computes lexical features that appear explicitly in texts.

2021

Evaluation of Unsupervised Automatic Readability Assessors Using Rank Correlations
Yo Ehara
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems

Automatic readability assessment (ARA) is the task of automatically assessing readability with little or no human supervision. ARA is essential for many second language acquisition applications to reduce the workload of annotators, who are usually language teachers. Previous unsupervised approaches manually searched for textual features that correlated well with readability labels, such as perplexity scores of large language models. This paper argues that, to evaluate an assessor’s performance, rank-correlation coefficients should be used instead of Pearson’s correlation coefficient (𝜌). In the experiments, we show that an assessor’s performance can be easily underestimated using Pearson’s 𝜌, which is significantly affected by the linearity of the output readability scores. We also propose a lightweight unsupervised readability assessor that achieved the best performance in both the rank correlations and Pearson’s 𝜌 among all unsupervised assessors compared.
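
The point about linearity in the abstract above can be illustrated with a minimal sketch. It is not the paper's code: the function names, the six readability labels, and the exponential scoring function are all made up for illustration. The assessor here orders every text correctly but outputs scores on a nonlinear scale, so a rank correlation rewards it fully while Pearson's coefficient does not.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: gold readability levels and an assessor whose scores
# are a monotone but strongly nonlinear function of the level.
labels = [1, 2, 3, 4, 5, 6]
scores = [math.exp(2 * level) for level in labels]

print("Spearman:", spearman(labels, scores))  # perfect ordering
print("Pearson: ", pearson(labels, scores))   # penalized for nonlinearity
```

Because the ordering is perfect, the rank correlation is 1, whereas Pearson's coefficient falls well below 1 even though no text is misordered.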

To What Extent Can English-as-a-Second Language Learners Read Economic News Texts?
Yo Ehara
Proceedings of the Third Workshop on Economics and Natural Language Processing

In decision making in the economic field, an especially important requirement is to rapidly understand news to absorb ever-changing economic situations. Given that most economic news is written in English, the ability to read such information without waiting for a translation is particularly valuable in economics in contrast to other fields. In consideration of this issue, this research investigated the extent to which non-native English speakers are able to read economic news to make decisions accordingly – an issue that has been rarely addressed in previous studies. Using an existing standard dataset as training data, we created a classifier that automatically evaluates the readability of text with high accuracy for English learners. Our assessment of the readability of an economic news corpus revealed that most news texts can be read by intermediate English learners. We also found that in some cases, readability varies considerably depending on the knowledge of certain words specific to the economic field.

Readability and Linearity
Yo Ehara
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

To What Extent Does Lexical Normalization Help English-as-a-Second Language Learners to Read Noisy English Texts?
Yo Ehara
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

How difficult is it for English-as-a-second language (ESL) learners to read noisy English texts? Do ESL learners need lexical normalization to read noisy English texts? These questions may also affect community formation on social networking sites where differences can be attributed to ESL learners and native English speakers. However, few studies have addressed these questions. To this end, we built highly accurate readability assessors to evaluate the readability of texts for ESL learners. We then applied these assessors to noisy English texts to further assess the readability of the texts. The experimental results showed that although intermediate-level ESL learners can read most noisy English texts in the first place, lexical normalization significantly improves the readability of noisy English texts for ESL learners.

2020

Interpreting Neural CWI Classifiers’ Weights as Vocabulary Size
Yo Ehara
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

Complex Word Identification (CWI) is a task for the identification of words that are challenging for second-language learners to read. Even though the use of neural classifiers is now common in CWI, the interpretation of their parameters remains difficult. This paper analyzes neural CWI classifiers and shows that some of their parameters can be interpreted as vocabulary size. We present a novel formalization of vocabulary size measurement methods that are practiced in the applied linguistics field as a kind of neural classifier. We also contribute to building a novel dataset for validating vocabulary testing and readability via crowdsourcing.

2019

An Approach to Summarize Concordancers’ Lists Visually to Support Language Learners in Understanding Word Usages
Yo Ehara
Proceedings of the 1st Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence (NL4XAI 2019)

2018

Building an English Vocabulary Knowledge Dataset of Japanese English-as-a-Second-Language Learners Using Crowdsourcing
Yo Ehara
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Language-Independent Prediction of Psycholinguistic Properties of Words
Yo Ehara
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

The psycholinguistic properties of words, namely, word familiarity, age of acquisition, concreteness, and imagery, have been reported to be effective for educational natural language-processing tasks. Previous studies on predicting the values of these properties rely on language-dependent features. This paper is the first to propose a practical language-independent method for predicting such values by using only a large raw corpus in a language. Through experiments, our method successfully predicted the values of these properties in two languages. The results for English were competitive with the reported accuracy achieved using features specific to English.

2016

Automatic video description generation has recently been attracting attention following rapid advances in image caption generation. Automatically generating a description for a video is more challenging than for an image because of the temporal dynamics of its frames. Most previous work has relied on Recurrent Neural Networks (RNNs), and attention mechanisms have recently been applied so that the model learns to focus on particular frames of the video while generating each word of the describing sentence. In this paper, we focus on a sequence-to-sequence approach with a temporal attention mechanism. We analyze and compare the results of different attention model configurations. By applying the temporal attention mechanism to the system, we achieve a METEOR score of 0.310 on the Microsoft Video Description dataset, outperforming the previous state-of-the-art system.