In the LLM era, Word Sense Induction remains unsolved

Anna Mosolova, Marie Candito, Carlos Ramisch


Abstract
In the absence of sense-annotated data, word sense induction (WSI) is a compelling alternative to word sense disambiguation, particularly in low-resource or domain-specific settings. In this paper, we highlight methodological problems in current WSI evaluation. We propose an evaluation on a SemCor-derived dataset that respects the polysemy and frequency distributions of the original corpus. We assess pre-trained embeddings and clustering algorithms across parts of speech (POS), and propose and evaluate an LLM-based WSI method for English. We evaluate data augmentation sources (LLM-generated, corpus and lexicon) and semi-supervised scenarios that use Wiktionary for data augmentation, must-link constraints, and the number of clusters per lemma. We find that no unsupervised method (whether ours or previous) surpasses the strong “one cluster per lemma” heuristic (1cpl). We also show that (i) results and best systems may vary across POS, (ii) LLMs have trouble performing this task, (iii) data augmentation is beneficial and (iv) capitalizing on Wiktionary does help. The resulting system surpasses the previous SOTA system on our test set by 3.3%. WSI is not solved, and calls for a better articulation of lexicons and LLMs’ lexical semantics capabilities.
Anthology ID:
2025.findings-acl.882
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venues:
Findings | WS
Publisher:
Association for Computational Linguistics
Pages:
17161–17178
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.findings-acl.882/
Cite (ACL):
Anna Mosolova, Marie Candito, and Carlos Ramisch. 2025. In the LLM era, Word Sense Induction remains unsolved. In Findings of the Association for Computational Linguistics: ACL 2025, pages 17161–17178, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
In the LLM era, Word Sense Induction remains unsolved (Mosolova et al., Findings 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.findings-acl.882.pdf