DLU: Dictionary Look-Up Data and Prediction

David Strohmaier, Gladys Tyen, Hongyi Gu, Diane Nicholls, Zheng Yuan, Paula Buttery
Abstract
Knowing which words language learners struggle with is crucial for developing personalised education technologies. In this paper, we advocate for the novel task of “dictionary look-up prediction” as a means for evaluating the complexity of words in reading tasks. We release the Dictionary Look-Up development dataset (DLU-dev) and the Dialogue Dictionary Look-Up dataset (D-DLU), which is based on chatbot dialogues. We demonstrate that dictionary look-up is a challenging task for LLMs (results are presented for LLaMA, Gemma, and Longformer models). We explore finetuning with the ROC* loss function as a more appropriate loss for this task than the commonly used Binary Cross Entropy (BCE). We show that a feature-based model outperforms the LLMs. Finally, we investigate the transfer between DLU and the related tasks of Complex Word Identification (CWI) and Semantic Error Prediction (SEP), establishing new state-of-the-art results for SEP.
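The ROC* loss mentioned above is not reproduced here; purely as an illustration of the kind of objective the abstract contrasts with BCE, the sketch below shows a standard binary cross-entropy loss next to a generic pairwise AUC-style surrogate for per-token look-up prediction. It assumes PyTorch, binary look-up labels, and an illustrative margin parameter; the function names and the margin value are ours, not the authors'.

```python
import torch
import torch.nn.functional as F

def bce_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Standard binary cross-entropy over per-token look-up labels (0 = not looked up, 1 = looked up).
    return F.binary_cross_entropy_with_logits(logits, labels.float())

def pairwise_auc_surrogate(logits: torch.Tensor, labels: torch.Tensor, margin: float = 0.2) -> torch.Tensor:
    # AUC-style pairwise surrogate: penalise positive/negative score pairs whose
    # margin falls below `margin` (squared hinge), so minimising it pushes
    # looked-up tokens above non-looked-up ones, roughly approximating 1 - AUC.
    pos = logits[labels == 1]
    neg = logits[labels == 0]
    if pos.numel() == 0 or neg.numel() == 0:
        # A batch containing only one class gives no pairs to rank.
        return logits.new_zeros(())
    diff = pos.unsqueeze(1) - neg.unsqueeze(0)  # all positive-negative score differences
    return torch.clamp(margin - diff, min=0).pow(2).mean()
```

Unlike BCE, which scores each token independently, a pairwise surrogate of this kind optimises the ranking of looked-up over non-looked-up tokens, which is one plausible reason a ROC-based loss would suit a task with heavily imbalanced look-up labels.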
Anthology ID:
2025.conll-1.32
Volume:
Proceedings of the 29th Conference on Computational Natural Language Learning
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Gemma Boleda, Michael Roth
Venues:
CoNLL | WS
Publisher:
Association for Computational Linguistics
Pages:
481–501
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.conll-1.32/
Cite (ACL):
David Strohmaier, Gladys Tyen, Hongyi Gu, Diane Nicholls, Zheng Yuan, and Paula Buttery. 2025. DLU: Dictionary Look-Up Data and Prediction. In Proceedings of the 29th Conference on Computational Natural Language Learning, pages 481–501, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
DLU: Dictionary Look-Up Data and Prediction (Strohmaier et al., CoNLL 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.conll-1.32.pdf