Roksana Goworek


2025

SenWiCh: Sense-Annotation of Low-Resource Languages for WiC using Hybrid Methods
Roksana Goworek | Harpal Singh Karlcut | Hamza Shezad | Nijaguna Darshana | Abhishek Mane | Syam Bondada | Raghav Sikka | Ulvi Mammadov | Rauf Allahverdiyev | Sriram Satkirti Purighella | Paridhi Gupta | Muhinyia Ndegwa | Bao Khanh Tran | Haim Dubossarsky
Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP

This paper addresses the critical need for high-quality evaluation datasets in low-resource languages to advance cross-lingual transfer. While cross-lingual transfer offers a key strategy for leveraging multilingual pretraining to expand language technologies to understudied and typologically diverse languages, its effectiveness depends on the availability of high-quality, suitable benchmarks. We release new sense-annotated datasets of sentences containing polysemous words, spanning nine low-resource languages across diverse language families and scripts. To facilitate dataset creation, the paper presents a demonstrably beneficial semi-automatic annotation method. The utility of the datasets is demonstrated through experiments in the Word-in-Context (WiC) format that evaluate transfer on these low-resource languages. Results highlight the importance of targeted dataset creation and evaluation for effective polysemy disambiguation in low-resource settings and transfer studies. The released datasets and code aim to support further research into fair, robust, and truly multilingual NLP.
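For readers unfamiliar with the WiC format mentioned in the abstract: a WiC instance pairs two sentences that contain the same target word and asks whether the word carries the same sense in both. The sketch below shows one plausible way to derive such pairs from sense-annotated sentences. The function name `make_wic_pairs`, the tuple layout, the field names, and the toy English data are all assumptions made for illustration; they are not taken from the released datasets or code.

```python
from itertools import combinations

def make_wic_pairs(examples):
    """Turn (lemma, sentence, sense_id) tuples into WiC-style pairs.

    Hypothetical helper: the tuple layout and output field names are
    illustrative assumptions, not the format of the released data.
    """
    # Group annotated sentences by their target lemma.
    by_lemma = {}
    for lemma, sentence, sense_id in examples:
        by_lemma.setdefault(lemma, []).append((sentence, sense_id))

    pairs = []
    for lemma, items in by_lemma.items():
        # Every unordered pair of sentences sharing a lemma is one instance.
        for (s1, k1), (s2, k2) in combinations(items, 2):
            pairs.append({
                "lemma": lemma,
                "sentence1": s1,
                "sentence2": s2,
                "label": k1 == k2,  # True when both uses share a sense
            })
    return pairs

# Toy data, invented for this example only.
examples = [
    ("bank", "She sat on the bank of the river.", "bank.river"),
    ("bank", "The bank approved the loan.", "bank.finance"),
    ("bank", "Reeds grew along the bank.", "bank.river"),
]
pairs = make_wic_pairs(examples)
```

Pairing every combination grows quadratically with the number of sentences per lemma, so a real pipeline would likely sample pairs and balance positive and negative labels.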

2024

Toward Sentiment Aware Semantic Change Analysis
Roksana Goworek | Haim Dubossarsky
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

This student paper explores the potential of augmenting computational models of semantic change with sentiment information. It tests the efficacy of this approach on the English SemEval Lexical Semantic Change task and its associated historical corpora. We first establish the feasibility of our approach by demonstrating that existing models extract reliable sentiment information from historical corpora, and then validate that words that underwent semantic change also show greater sentiment change than historically stable words. We then integrate sentiment information into standard models of semantic change for individual words and test whether this improves their overall performance, with mixed results. This research contributes to our understanding of language change by providing the first attempt to enrich standard models of semantic change with additional information. It taps into the multifaceted nature of language change, which should not be reduced to a binary or scalar report of change but involves additional dimensions, sentiment being only one of them. As such, this student paper suggests novel directions for future work on integrating additional, more nuanced information about change and its interpretation for finer-grained semantic change analysis.