Diana McCarthy

Also published as: Diana F. McCarthy


2022

pdf
Measuring Context-Word Biases in Lexical Semantic Datasets
Qianchu Liu | Diana McCarthy | Anna Korhonen
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

State-of-the-art pretrained contextualized models (PCMs), e.g. BERT, use tasks such as WiC and WSD to evaluate their word-in-context representations. This inherently assumes that performance in these tasks reflects how well a model represents the coupled word and context semantics. We question this assumption by presenting the first quantitative analysis of the context-word interaction being tested in major contextual lexical semantic tasks. To achieve this, we run probing baselines on masked input, and propose measures to calculate and visualize the degree of context or word biases in existing datasets. The analysis was performed on both models and humans. Our findings demonstrate that models are usually not being tested for word-in-context semantics in the same way as humans are in these tasks, which helps us better understand the model-human gap. Specifically, for PCMs, most existing datasets fall at the extreme ends (the retrieval-based tasks exhibit strong target-word bias while WiC-style tasks and WSD show strong context bias); in comparison, humans are less biased and achieve much better performance when both word and context are available than with masked input. We recommend our framework for understanding and controlling these biases for model interpretation and future task design.
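The masked-input probing idea can be illustrated with a short sketch. The snippet below is an assumption-laden illustration, not the paper's code: it scores a WiC-style pair once with the full input and once with the target word masked, using an off-the-shelf BERT model; helper names such as target_embedding are invented for this example.

```python
# Illustrative sketch (not the paper's code) of masked-input probing:
# compare a WiC-style similarity score computed from the full sentences
# with one computed after masking the target word (context-only baseline).
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def target_embedding(sentence, target):
    """Mean of the subword vectors covering `target` inside `sentence`."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    target_ids = tok(target, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(target_ids) + 1):
        if ids[i:i + len(target_ids)] == target_ids:
            return hidden[i:i + len(target_ids)].mean(dim=0)
    raise ValueError("target word not found in sentence")

def wic_score(sent1, sent2, target, mask_target=False):
    """Cosine similarity between the two usages of `target`; with
    mask_target=True the model only sees the contexts."""
    if mask_target:
        sent1 = sent1.replace(target, tok.mask_token)
        sent2 = sent2.replace(target, tok.mask_token)
        target = tok.mask_token
    e1 = target_embedding(sent1, target)
    e2 = target_embedding(sent2, target)
    return torch.cosine_similarity(e1, e2, dim=0).item()
```

A high score on a WiC pair with mask_target=True suggests the pair can be solved from context alone, which is the kind of context bias the paper quantifies.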

2021

pdf
AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples
Qianchu Liu | Edoardo Maria Ponti | Diana McCarthy | Ivan Vulić | Anna Korhonen
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Capturing word meaning in context and distinguishing between correspondences and variations across languages is key to building successful multilingual and cross-lingual text representation models. However, existing multilingual evaluation datasets that evaluate lexical semantics “in-context” have various limitations. In particular, 1) their language coverage is restricted to high-resource languages and skewed in favor of only a few language families and areas, 2) their design makes the task solvable via superficial cues, which results in artificially inflated (and sometimes super-human) performance of pretrained encoders, and 3) they offer no support for cross-lingual evaluation. In order to address these gaps, we present AM2iCo (Adversarial and Multilingual Meaning in Context), a wide-coverage cross-lingual and multilingual evaluation set; it aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts for 14 language pairs. We conduct a series of experiments in a wide range of setups and demonstrate the challenging nature of AM2iCo. The results reveal that current SotA pretrained encoders substantially lag behind human performance, and the largest gaps are observed for low-resource languages and languages dissimilar to English.

pdf
Semantic Data Set Construction from Human Clustering and Spatial Arrangement
Olga Majewska | Diana McCarthy | Jasper J. F. van den Bosch | Nikolaus Kriegeskorte | Ivan Vulić | Anna Korhonen
Computational Linguistics, Volume 47, Issue 1 - March 2021

Research into representation learning models of lexical semantics usually utilizes some form of intrinsic evaluation to ensure that the learned representations reflect human semantic judgments. Lexical semantic similarity estimation is a widely used evaluation method, but efforts have typically focused on pairwise judgments of words in isolation, or are limited to specific contexts and lexical stimuli. These approaches are limited in that they either provide no context for judgments, thereby ignoring ambiguity, or provide very specific sentential contexts that cannot then be used to generate a larger lexical resource. Furthermore, similarity between more than two items is not considered. We provide a full description and analysis of our recently proposed methodology for large-scale data set construction that produces a semantic classification of a large sample of verbs in the first phase, as well as multi-way similarity judgments made within the resultant semantic classes in the second phase. The methodology uses a spatial multi-arrangement approach proposed in the field of cognitive neuroscience for capturing multi-way similarity judgments of visual stimuli. We have adapted this method to handle polysemous linguistic stimuli and much larger samples than previous work. We specifically target verbs, but the method can equally be applied to other parts of speech. We perform cluster analysis on the data from the first phase and demonstrate how this might be useful in the construction of a comprehensive verb resource. We also analyze the semantic information captured by the second phase and discuss the potential of the spatially induced similarity judgments to better reflect human notions of word similarity. We demonstrate how the resultant data set can be used for fine-grained analyses and evaluation of representation learning models on the intrinsic tasks of semantic clustering and semantic similarity. In particular, we find that stronger static word embedding methods still outperform lexical representations emerging from more recent pre-training methods, both on word-level similarity and clustering. Moreover, thanks to the data set’s vast coverage, we are able to compare the benefits of specializing vector representations for a particular type of external knowledge by evaluating FrameNet- and VerbNet-retrofitted models on specific semantic domains such as “Heat” or “Motion.”
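As a rough illustration of the intrinsic evaluations mentioned above, the sketch below (assumed data formats, not the paper's evaluation code) clusters verb vectors against the phase-one classes with k-means and correlates model cosine similarities with the phase-two judgments.

```python
# A minimal sketch of intrinsic evaluation against the two phases of the data
# set: adjusted Rand index for semantic clustering and Spearman correlation
# for semantic similarity. Input formats are hypothetical.
import numpy as np
from scipy.stats import spearmanr
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.metrics.pairwise import cosine_similarity

def evaluate(vectors, gold_classes, sim_pairs):
    """vectors: {verb: np.ndarray}; gold_classes: {verb: class_id};
    sim_pairs: [(verb1, verb2, human_score), ...] (assumed formats)."""
    verbs = [v for v in gold_classes if v in vectors]
    X = np.vstack([vectors[v] for v in verbs])
    pred = KMeans(n_clusters=len(set(gold_classes.values())),
                  n_init=10, random_state=0).fit_predict(X)
    ari = adjusted_rand_score([gold_classes[v] for v in verbs], pred)

    model_sims, human_sims = [], []
    for v1, v2, score in sim_pairs:
        if v1 in vectors and v2 in vectors:
            model_sims.append(
                cosine_similarity(vectors[v1][None], vectors[v2][None])[0, 0])
            human_sims.append(score)
    rho = spearmanr(model_sims, human_sims).correlation
    return ari, rho
```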

2020

pdf
Manual Clustering and Spatial Arrangement of Verbs for Multilingual Evaluation and Typology Analysis
Olga Majewska | Ivan Vulić | Diana McCarthy | Anna Korhonen
Proceedings of the 28th International Conference on Computational Linguistics

We present the first evaluation of the applicability of a spatial arrangement method (SpAM) to a typologically diverse language sample, and its potential to produce semantic evaluation resources to support multilingual NLP, with a focus on verb semantics. We demonstrate SpAM’s utility in allowing for quick bottom-up creation of large-scale evaluation datasets that balance cross-lingual alignment with language specificity. Starting from a shared sample of 825 English verbs, translated into Chinese, Japanese, Finnish, Polish, and Italian, we apply a two-phase annotation process which produces (i) semantic verb classes and (ii) fine-grained similarity scores for nearly 130 thousand verb pairs. We use the two types of verb data to (a) examine cross-lingual similarities and variation, and (b) evaluate the capacity of static and contextualised representation models to accurately reflect verb semantics, contrasting the performance of large language-specific pretrained models with their multilingual equivalents on semantic clustering and lexical similarity, across different domains of verb meaning. We release the data from both phases as a large-scale multilingual resource, comprising 85 verb classes and nearly 130k pairwise similarity scores, offering a wealth of possibilities for further evaluation and research on multilingual verb semantics.

pdf
Towards Better Context-aware Lexical Semantics: Adjusting Contextualized Representations through Static Anchors
Qianchu Liu | Diana McCarthy | Anna Korhonen
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

One of the most powerful features of contextualized models is their dynamic embeddings for words in context, leading to state-of-the-art representations for context-aware lexical semantics. In this paper, we present a post-processing technique that enhances these representations by learning a transformation through static anchors. Our method requires only another pre-trained model; no labeled data is needed. We show consistent improvement in a range of benchmark tasks that test contextual variations of meaning both across different usages of a word and across different words as they are used in context. We demonstrate that while the original contextual representations can be improved by another embedding space from both contextualized and static models, the static embeddings, which have lower computational requirements, provide the most gains.
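One common way to realize such a transformation is orthogonal Procrustes over word-type anchors. The sketch below illustrates that idea under the assumption that the contextual anchors (average contextualized vectors per word) and the static space share a dimensionality; it is not necessarily the paper's exact procedure.

```python
# A sketch of learning a transformation through static anchors via orthogonal
# Procrustes; anchor construction and mapping illustrate the general idea only.
import numpy as np

def learn_anchor_map(ctx_anchors, static_vecs, shared_words):
    """ctx_anchors[w]: average contextualized vector of w over a corpus;
    static_vecs[w]: its static (e.g. fastText) vector of the same dimension.
    Returns an orthogonal matrix W mapping the contextual space towards
    the static anchor space (Procrustes solution)."""
    X = np.vstack([ctx_anchors[w] for w in shared_words])   # contextual side
    Y = np.vstack([static_vecs[w] for w in shared_words])   # static side
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def adjust(contextual_vec, W):
    """Post-process a token-level contextual representation with the map."""
    return contextual_vec @ W
```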

pdf
Spatial Multi-Arrangement for Clustering and Multi-way Similarity Dataset Construction
Olga Majewska | Diana McCarthy | Jasper van den Bosch | Nikolaus Kriegeskorte | Ivan Vulić | Anna Korhonen
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present a novel methodology for fast bottom-up creation of large-scale semantic similarity resources to support development and evaluation of NLP systems. Our work targets verb similarity, but the methodology is equally applicable to other parts of speech. Our approach circumvents the bottleneck of slow and expensive manual development of lexical resources by leveraging semantic intuitions of native speakers and adapting a spatial multi-arrangement approach from cognitive neuroscience, used before only with visual stimuli, to lexical stimuli. Our approach critically obtains judgments of word similarity in the context of a set of related words, rather than of word pairs in isolation. We also handle lexical ambiguity as a natural consequence of a two-phase process where verbs are placed in broad semantic classes prior to the fine-grained spatial similarity judgments. Our proposed design produces a large-scale verb resource comprising 17 relatedness-based classes and a verb similarity dataset containing similarity scores for 29,721 unique verb pairs and 825 target verbs, which we release with this paper.

2019

pdf
Investigating Cross-Lingual Alignment Methods for Contextualized Embeddings with Token-Level Evaluation
Qianchu Liu | Diana McCarthy | Ivan Vulić | Anna Korhonen
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

In this paper, we present a thorough investigation of methods that align pre-trained contextualized embeddings into a shared cross-lingual context-aware embedding space, providing strong reference benchmarks for future context-aware cross-lingual models. We propose a novel and challenging task, Bilingual Token-level Sense Retrieval (BTSR). It specifically evaluates the accurate alignment of words with the same meaning in cross-lingual non-parallel contexts, which is currently not evaluated by existing tasks such as Bilingual Contextual Word Similarity and Sentence Retrieval. We show how the proposed BTSR task highlights the merits of different alignment methods. In particular, we find that using context-average type-level alignment is effective in transferring monolingual contextualized embeddings cross-lingually, especially in non-parallel contexts, and at the same time improves the monolingual space. Furthermore, aligning independently trained models yields better performance than aligning multilingual embeddings with shared vocabulary.
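The context-average type-level alignment can be sketched as follows, with illustrative data formats and function names and an orthogonal map standing in for whichever alignment method is chosen: average each word type's contextualized vectors into an anchor, fit the map on a seed dictionary, then retrieve target-language tokens-in-context for a source token, BTSR-style.

```python
# A sketch (not the paper's code) of context-average type-level alignment
# followed by token-level retrieval across languages.
import numpy as np

def type_anchors(token_vectors):
    """token_vectors: {word: [contextualized vectors from a corpus]}."""
    return {w: np.mean(vs, axis=0) for w, vs in token_vectors.items()}

def fit_map(src_anchors, tgt_anchors, seed_dictionary):
    """Orthogonal map from source to target space fit on dictionary pairs."""
    pairs = [(s, t) for s, t in seed_dictionary
             if s in src_anchors and t in tgt_anchors]
    X = np.vstack([src_anchors[s] for s, _ in pairs])
    Y = np.vstack([tgt_anchors[t] for _, t in pairs])
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def retrieve(src_token_vec, tgt_token_vecs, W, k=5):
    """Nearest target-language tokens-in-context for one source token."""
    q = src_token_vec @ W
    sims = {tid: q @ v / (np.linalg.norm(q) * np.linalg.norm(v))
            for tid, v in tgt_token_vecs.items()}
    return sorted(sims, key=sims.get, reverse=True)[:k]
```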

pdf
Second-order contexts from lexical substitutes for few-shot learning of word representations
Qianchu Liu | Diana McCarthy | Anna Korhonen
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

There is a growing awareness of the need to handle rare and unseen words in word representation modelling. In this paper, we focus on few-shot learning of emerging concepts that fully exploits only a few available contexts. We introduce a substitute-based context representation technique that can be applied to an existing word embedding space. Previous context-based approaches to modelling unseen words only consider bag-of-words first-order contexts, whereas our method aggregates contexts as second-order substitutes that are produced by a sequence-aware sentence completion model. We experimented with three tasks that aim to test the modelling of emerging concepts. We found that these tasks place different emphasis on first- and second-order contexts, and our substitute-based method achieves superior performance on naturally-occurring contexts from corpora.
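A minimal sketch of the substitute-based idea follows, assuming a BERT-style fill-mask model as the sentence-completion component and a pre-existing static embedding space; both are stand-ins, not the paper's exact pipeline.

```python
# Represent an unseen word from second-order substitutes: for each available
# context, ask a fill-mask model for substitute words, then take a
# score-weighted average of their static embeddings.
import numpy as np
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def embed_from_substitutes(contexts, target, static_vecs, top_k=10):
    """contexts: sentences containing `target`; static_vecs: {word: np.ndarray}."""
    weighted = []
    for sent in contexts:
        masked = sent.replace(target, fill_mask.tokenizer.mask_token, 1)
        for cand in fill_mask(masked, top_k=top_k):
            word, score = cand["token_str"].strip(), cand["score"]
            if word in static_vecs and word != target:
                weighted.append(score * static_vecs[word])
    return np.mean(weighted, axis=0) if weighted else None
```

The resulting vector lives in the static space, so the emerging concept can be compared directly with existing words there.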

2018

pdf
Acquiring Verb Classes Through Bottom-Up Semantic Verb Clustering
Olga Majewska | Diana McCarthy | Ivan Vulić | Anna Korhonen
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

pdf
LexSemTm: A Semantic Dataset Based on All-words Unsupervised Sense Distribution Learning
Andrew Bennett | Timothy Baldwin | Jey Han Lau | Diana McCarthy | Francis Bond
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Word Sense Clustering and Clusterability
Diana McCarthy | Marianna Apidianaki | Katrin Erk
Computational Linguistics, Volume 42, Issue 2 - June 2016

2014

pdf
SemLink+: FrameNet, VerbNet and Event Ontologies
Martha Palmer | Claire Bonial | Diana McCarthy
Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore (1929-2014)

pdf
Semantic Clustering of Pivot Paraphrases
Marianna Apidianaki | Emilia Verzeni | Diana McCarthy
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Paraphrases extracted from parallel corpora by the pivot method (Bannard and Callison-Burch, 2005) constitute a valuable resource for multilingual NLP applications. In this study, we analyse the semantics of unigram pivot paraphrases and use a graph-based sense induction approach to unveil hidden sense distinctions in the paraphrase sets. The comparison of the acquired senses to gold data from the Lexical Substitution shared task (McCarthy and Navigli, 2007) demonstrates that sense distinctions exist in the paraphrase sets and highlights the need for a disambiguation step in applications using this resource.
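A simplified stand-in for the graph-based sense induction step is sketched below: a thresholded similarity graph over the paraphrases of one target word, with connected components serving as induced senses. The actual clustering used in the paper may differ.

```python
# Graph-based sense induction over a pivot paraphrase set (illustrative only).
import networkx as nx

def induce_senses(paraphrases, similarity, threshold=0.4):
    """paraphrases: list of unigram paraphrases of one target word;
    similarity(a, b): any distributional similarity function in [0, 1]."""
    g = nx.Graph()
    g.add_nodes_from(paraphrases)
    for i, a in enumerate(paraphrases):
        for b in paraphrases[i + 1:]:
            if similarity(a, b) >= threshold:
                g.add_edge(a, b)
    return [set(c) for c in nx.connected_components(g)]
```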

pdf
Learning Word Sense Distributions, Detecting Unattested Senses and Identifying Novel Senses Using Topic Models
Jey Han Lau | Paul Cook | Diana McCarthy | Spandana Gella | Timothy Baldwin
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Novel Word-sense Identification
Paul Cook | Jey Han Lau | Diana McCarthy | Timothy Baldwin
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

pdf
Measuring Word Meaning in Context
Katrin Erk | Diana McCarthy | Nicholas Gaylord
Computational Linguistics, Volume 39, Issue 3 - September 2013

pdf
Diathesis alternation approximation for verb clustering
Lin Sun | Diana McCarthy | Anna Korhonen
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

pdf
Unsupervised Estimation of Word Usage Similarity
Marco Lui | Timothy Baldwin | Diana McCarthy
Proceedings of the Australasian Language Technology Association Workshop 2012

pdf
The Effects of Semantic Annotations on Precision Parse Ranking
Andrew MacKinlay | Rebecca Dridan | Diana McCarthy | Timothy Baldwin
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf
DSS: Text Similarity Using Lexical Alignments of Form, Distributional Semantics and Grammatical Relations
Diana McCarthy | Spandana Gella | Siva Reddy
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf
Word Sense Induction for Novel Sense Detection
Jey Han Lau | Paul Cook | Diana McCarthy | David Newman | Timothy Baldwin
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

pdf
An Empirical Study on Compositionality in Compound Nouns
Siva Reddy | Diana McCarthy | Suresh Manandhar
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf
Dynamic and Static Prototype Vectors for Semantic Composition
Siva Reddy | Ioannis Klapaftis | Diana McCarthy | Suresh Manandhar
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf
Exemplar-Based Word-Space Model for Compositionality Detection: Shared Task System Description
Siva Reddy | Diana McCarthy | Suresh Manandhar | Spandana Gella
Proceedings of the Workshop on Distributional Semantics and Compositionality

pdf
Predicting Thread Linking Structure by Lexical Chaining
Li Wang | Diana McCarthy | Timothy Baldwin
Proceedings of the Australasian Language Technology Association Workshop 2011

2010

pdf
Fast Syntactic Searching in Very Large Corpora for Many Languages
Miloš Jakubíček | Adam Kilgarriff | Diana McCarthy | Pavel Rychlý
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

pdf bib
SemEval-2010 Task 2: Cross-Lingual Lexical Substitution
Rada Mihalcea | Ravi Sinha | Diana McCarthy
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf
IIITH: Domain Specific Word Sense Disambiguation
Siva Reddy | Abhilash Inumella | Diana McCarthy | Mark Stevenson
Proceedings of the 5th International Workshop on Semantic Evaluation

2009

pdf
Estimating and Exploiting the Entropy of Sense Distributions
Peng Jin | Diana McCarthy | Rob Koeling | John Carroll
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

pdf bib
Investigations on Word Senses and Word Usages
Katrin Erk | Diana McCarthy | Nicholas Gaylord
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
Tutorial Abstracts of ACL-IJCNLP 2009
Diana McCarthy | Chengqing Zong
Tutorial Abstracts of ACL-IJCNLP 2009

pdf
Graded Word Sense Assignment
Katrin Erk | Diana McCarthy
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Invited Talk: Alternative Annotations of Word Usage
Diana McCarthy
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)

pdf
SemEval-2010 Task 2: Cross-Lingual Lexical Substitution
Ravi Sinha | Diana McCarthy | Rada Mihalcea
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)

2008

pdf
Gloss-Based Semantic Similarity Metrics for Predominant Sense Acquisition
Ryu Iida | Diana McCarthy | Rob Koeling
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

pdf
Lexical Substitution as a Framework for Multiword Evaluation
Diana McCarthy
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we analyse data from the SemEval lexical substitution task in those cases where the annotators indicated that the target word was part of a phrase before substituting the target with a synonym. We classify the types of phrases that were provided in this way by the annotators in order to evaluate the utility of the method as a means of producing a gold-standard for multiword evaluation. Multiword evaluation is a difficult area because lexical resources are not complete and people’s judgments on multiwords vary. Whilst we do not believe lexical substitution is necessarily a panacea for multiword evaluation, we do believe it is a useful methodology because the annotator is focused on the task of substitution. Following the analysis, we make some recommendations which would make the data easier to classify.

pdf
From Predicting Predominant Senses to Local Context for Word Sense Disambiguation
Rob Koeling | Diana McCarthy
Semantics in Text Processing. STEP 2008 Conference Proceedings

2007

pdf
SemEval-2007 Task 10: English Lexical Substitution Task
Diana McCarthy | Roberto Navigli
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

pdf
Sussx: WSD using Automatically Acquired Predominant Senses
Rob Koeling | Diana McCarthy
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

pdf
Book Reviews: Word Sense Disambiguation: Algorithms and Applications, edited by Eneko Agirre and Philip Edmonds
Diana McCarthy
Computational Linguistics, Volume 33, Number 2, June 2007

pdf
Unsupervised Acquisition of Predominant Word Senses
Diana McCarthy | Rob Koeling | Julie Weeds | John Carroll
Computational Linguistics, Volume 33, Number 4, December 2007

pdf bib
Invited talk: Evaluating Automatic Approaches for Word Meaning Discovery and Disambiguation using Lexical Substitution
Diana F. McCarthy
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)

pdf
Detecting Compositionality of Verb-Object Combinations using Selectional Preferences
Diana McCarthy | Sriram Venkatapathy | Aravind Joshi
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

pdf bib
Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties
Begoña Villada Moirón | Aline Villavicencio | Diana McCarthy | Stefan Evert | Suzanne Stevenson
Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties

pdf
Relating WordNet Senses for Word Sense Disambiguation
Diana McCarthy
Proceedings of the Workshop on Making Sense of Sense: Bringing Psycholinguistics and Computational Linguistics Together

pdf bib
11th Conference of the European Chapter of the Association for Computational Linguistics
Diana McCarthy | Shuly Wintner
11th Conference of the European Chapter of the Association for Computational Linguistics

2005

pdf
Domain-Specific Sense Distributions and Predominant Sense Acquisition
Rob Koeling | Diana McCarthy | John Carroll
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

pdf
Characterising Measures of Lexical Distributional Similarity
Julie Weeds | David Weir | Diana McCarthy
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf
Automatic Identification of Infrequent Word Senses
Diana McCarthy | Rob Koeling | Julie Weeds | John Carroll
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf
Using automatically acquired predominant senses for Word Sense Disambiguation
Diana McCarthy | Rob Koeling | Julie Weeds | John Carroll
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

pdf
The “Meaning” system on the English all-words task
Luís Villarejo | Lluis Màrquez | Eneko Agirre | David Martínez | Bernardo Magnini | Carlo Strapparava | Diana McCarthy | Andrés Montoyo | Armando Suárez
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

pdf
Finding Predominant Word Senses in Untagged Text
Diana McCarthy | Rob Koeling | Julie Weeds | John Carroll
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

pdf
Detecting a Continuum of Compositionality in Phrasal Verbs
Diana McCarthy | Bill Keller | John Carroll
Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment

pdf
Disambiguating Nouns, Verbs, and Adjectives Using Automatically Acquired Selectional Preferences
Diana McCarthy | John Carroll
Computational Linguistics, Volume 29, Number 4, December 2003

2002

pdf
Lexical Substitution as a Task for WSD Evaluation
Diana McCarthy
Proceedings of the ACL-02 Workshop on Word Sense Disambiguation: Recent Successes and Future Directions

2001

pdf
Disambiguating Noun and Verb Senses Using Automatically Acquired Selectional Preferences
Diana McCarthy | John Carroll | Judita Preiss
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems

2000

pdf
Using Semantic Preferences to Identify Verbal Participation in Role Switching Alternations
Diana McCarthy
1st Meeting of the North American Chapter of the Association for Computational Linguistics

pdf
Statistical Filtering and Subcategorization Frame Acquisition
Anna Korhonen | Genevieve Gorrell | Diana McCarthy
2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora

1998

pdf
Detecting Verbal Participation in Diathesis Alternations
Diana McCarthy | Anna Korhonen
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

pdf
Detecting Verbal Participation in Diathesis Alternations
Diana McCarthy | Anna Korhonen
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

1997

pdf
Word Sense Disambiguation for Acquisition of Selectional Preferences
Diana McCarthy
Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications