2023
PARSEME corpus release 1.3
Agata Savary, Cherifa Ben Khelil, Carlos Ramisch, Voula Giouli, Verginica Barbu Mititelu, Najet Hadj Mohamed, Cvetana Krstev, Chaya Liebeskind, Hongzhi Xu, Sara Stymne, Tunga Güngör, Thomas Pickard, Bruno Guillaume, Eduard Bejček, Archna Bhatia, Marie Candito, Polona Gantar, Uxoa Iñurrieta, Albert Gatt, Jolanta Kovalevskaite, Timm Lichte, Nikola Ljubešić, Johanna Monti, Carla Parra Escartín, Mehrnoush Shamsfard, Ivelina Stoyanova, Veronika Vincze, Abigail Walsh
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)
We present version 1.3 of the PARSEME multilingual corpus annotated with verbal multiword expressions. Since the previous version, new languages have joined the undertaking of creating such a resource, some of the already existing corpora have been enriched with new annotated texts, and others have been enhanced in various ways. The PARSEME multilingual corpus now covers 26 languages. All monolingual corpora therein use the Universal Dependencies v.2 tagset. They are (re-)split following the PARSEME v.1.2 standard, which places emphasis on unseen VMWEs. With the current iteration, the corpus release process has been decoupled from shared tasks; instead, a process for continuous improvement and systematic releases has been introduced.
shefnlp at SemEval-2023 Task 10: Compute-Efficient Category Adapters
Thomas Pickard, Tyler Loakman, Mugdha Pandya
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
As social media platforms grow, so too does the volume of hate speech and negative sentiment expressed towards particular social groups. In this paper, we describe our approach to SemEval-2023 Task 10, involving the detection and classification of online sexism (abuse directed towards women), with fine-grained categorisations intended to facilitate the development of a more nuanced understanding of the ideologies and processes through which online sexism is expressed. We experiment with several approaches involving language model finetuning, class-specific adapters, and pseudo-labelling. Our best-performing models involve the training of adapters specific to each subtask category (combined via fusion layers) using a weighted loss function, in addition to performing naive pseudo-labelling on a large quantity of unlabelled data. We successfully outperform the baseline models on all three subtasks, placing 56th (of 84) on Task A, 43rd (of 69) on Task B, and 37th (of 63) on Task C.
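The naive pseudo-labelling workflow mentioned in the abstract can be sketched as follows. This is a minimal illustration under assumptions: a nearest-centroid classifier over toy two-feature vectors stands in for the authors' fine-tuned adapter models, purely to make the train → pseudo-label → retrain loop concrete.

```python
# Naive pseudo-labelling sketch (assumed workflow, not the authors' exact code):
# 1. train on gold-labelled data,
# 2. label unlabelled examples with the model's own predictions,
# 3. retrain on gold + pseudo-labelled data.
# A nearest-centroid "model" stands in for a fine-tuned language model.

def centroid(points):
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def train(examples):
    """examples: list of (feature_vector, label). Returns label -> centroid."""
    by_label = {}
    for x, y in examples:
        by_label.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_label.items()}

def predict(model, x):
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: sq_dist(model[y]))

# Hypothetical feature vectors; a real system would embed the tweet text.
gold = [([0.0, 0.1], "not_sexist"), ([0.1, 0.0], "not_sexist"),
        ([0.9, 1.0], "sexist"), ([1.0, 0.9], "sexist")]
unlabelled = [[0.05, 0.05], [0.95, 0.95]]

model = train(gold)
pseudo = [(x, predict(model, x)) for x in unlabelled]  # step 2: self-label
model = train(gold + pseudo)                           # step 3: retrain
```

Pseudo-labelling is "naive" here in that every self-assigned label is kept; a common refinement is to keep only high-confidence predictions.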
2020
Comparing word2vec and GloVe for Automatic Measurement of MWE Compositionality
Thomas Pickard
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
This paper explores the use of word2vec and GloVe embeddings for unsupervised measurement of the semantic compositionality of MWE candidates. Through comparison with several human-annotated reference sets, we find word2vec to be substantively superior to GloVe for this task. We also find Simple English Wikipedia to be a poor-quality resource for compositionality assessment, but demonstrate that a sample of 10% of sentences in the English Wikipedia can provide a conveniently tractable corpus with only moderate reduction in the quality of outputs.
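As an illustration of the underlying idea (a sketch under assumptions: toy 3-dimensional vectors and invented word keys stand in for real word2vec/GloVe embeddings), compositionality can be scored as the cosine similarity between an MWE's own vector and the average of its components' vectors:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def compositionality(mwe_vec, component_vecs):
    """Cosine between the MWE's own embedding and the average of its
    components' embeddings: high = compositional, low = idiomatic."""
    dim = len(mwe_vec)
    avg = [sum(v[i] for v in component_vecs) / len(component_vecs)
           for i in range(dim)]
    return cosine(mwe_vec, avg)

# Hypothetical toy vectors; real usage would load pretrained embeddings.
vec = {
    "red":      [0.9, 0.1, 0.0],
    "tape":     [0.1, 0.9, 0.0],
    "car":      [0.2, 0.8, 0.1],
    "red_tape": [0.0, 0.1, 0.9],  # idiom: vector far from its parts
    "red_car":  [0.6, 0.4, 0.1],  # literal: vector close to its parts
}
print(compositionality(vec["red_tape"], [vec["red"], vec["tape"]]))  # low
print(compositionality(vec["red_car"], [vec["red"], vec["car"]]))    # high
```

In practice the MWE vector comes from treating the expression as a single token when training the embeddings, and the resulting scores are compared against human compositionality judgements.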