Creating a Hybrid Rule and Neural Network Based Semantic Tagger Using Silver Standard Data: The PyMUSAS Framework for Multilingual Semantic Annotation

Andrew Moore, Paul Rayson, Dawn Archer, Tim Czerniak, Dawn Knight, Daisy Monika Lal, Gearóid Ó Donnchadha, Mícheál J. Ó Meachair, Scott Piao, Elaine Uí Dhonnchadha, Johanna Vuorinen, Yan Yabo, Xiaobin Yang


Abstract
Word Sense Disambiguation (WSD) has been widely evaluated using the semantic frameworks of WordNet, BabelNet, and the Oxford Dictionary of English. However, for the UCREL Semantic Analysis System (USAS) framework, no open extensive evaluation has been performed beyond lexical coverage or single language evaluation. In this work, we perform the largest semantic tagging evaluation of the rule based system that uses the lexical resources in the USAS framework covering five different languages using four existing datasets and one novel Chinese dataset. We create a new silver labelled English dataset, to overcome the lack of manually tagged training data, that we train and evaluate various mono and multilingual neural models in both mono and cross-lingual evaluation setups with comparisons to their rule based counterparts, and show how a rule based system can be enhanced with a neural network model. The resulting neural network models, including the data they were trained on, the Chinese evaluation dataset, and all of the code will be released as open resources.
Anthology ID:
2026.lrec-main.926
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
SIG:
Publisher:
ELRA Language Resource Association
Note:
Pages:
11822–11833
Language:
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.926/
DOI:
Bibkey:
Cite (ACL):
Andrew Moore, Paul Rayson, Dawn Archer, Tim Czerniak, Dawn Knight, Daisy Monika Lal, Gearóid Ó Donnchadha, Mícheál J. Ó Meachair, Scott Piao, Elaine Uí Dhonnchadha, Johanna Vuorinen, Yan Yabo, and Xiaobin Yang. 2026. Creating a Hybrid Rule and Neural Network Based Semantic Tagger Using Silver Standard Data: The PyMUSAS Framework for Multilingual Semantic Annotation. International Conference on Language Resources and Evaluation, main:11822–11833.
Cite (Informal):
Creating a Hybrid Rule and Neural Network Based Semantic Tagger Using Silver Standard Data: The PyMUSAS Framework for Multilingual Semantic Annotation (Moore et al., LREC 2026)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.926.pdf