Paola Merlo


2024

Tracking linguistic information in transformer-based sentence embeddings through targeted sparsification
Vivi Nastase | Paola Merlo
Proceedings of the 9th Workshop on Representation Learning for NLP (RepL4NLP-2024)

Analyses of transformer-based models have shown that they encode a variety of linguistic information from their textual input. While these analyses have shed light on the relation between linguistic information on one side, and internal architecture and parameters on the other, a question remains unanswered: how is this linguistic information reflected in sentence embeddings? Using datasets consisting of sentences with known structure, we test to what degree information about chunks (in particular noun, verb or prepositional phrases), such as grammatical number or semantic role, can be localized in sentence embeddings. Our results show that such information is not distributed over the entire sentence embedding, but rather it is encoded in specific regions. Understanding how the information from an input text is compressed into sentence embeddings helps us understand current transformer models and build future explainable neural models.

2023

Blackbird language matrices (BLM), a new task for rule-like generalization in neural networks: Can Large Language Models pass the test?
Paola Merlo
Findings of the Association for Computational Linguistics: EMNLP 2023

How do we evaluate Large Language Models (LLMs) and determine the aspects and limits of their intelligent behaviour? It is currently conjectured that shortcomings of LLMs in multi-linguality and reasoning are due to a lack of ability to generalize. It has been argued that, instead, humans are better at generalization because they have a tendency to extract rules from complex data. We propose a method to evaluate LLMs' ability to perform rule-based generalization. When exposed to tests of analytic intelligence, for example the visual RAVEN IQ test, human problem-solvers identify the relevant objects in the picture and their relevant attributes and reason based on rules applied to them. Based on the induced rules, they are able to provide a generalisation and a solution to the test. An analogous language task has recently been proposed (called BLM) for LLMs. In this paper, we argue that we can use this task to investigate what linguistic reasoning LLMs develop, by asking them to solve some simple variants of the BLM task. We find that current state-of-the-art generative models, such as ChatGPT, can handle the task in the sense that they easily understand the instructions and can provide step-by-step reasoning that shows that they can solve two of the main cognitive hurdles: correspondence finding (object and attribute identification) and item novelty. However, overall they cannot find the correct answer, even with considerable help. In particular, they never identify the structure of the problem, exhibiting, we hypothesize, a lack of goal and subgoal management abilities, an ability that has been argued to measure differential abilities in humans. We argue that this finding supports the usefulness of the task as a method to test the limits and specific properties of generalisation ability in Large Language Models, providing an intrinsic evaluation method inspired by tests of human intelligence.

BLM-s/lE: A structured dataset of English spray-load verb alternations for testing generalization in LLMs
Giuseppe Samo | Vivi Nastase | Chunyang Jiang | Paola Merlo
Findings of the Association for Computational Linguistics: EMNLP 2023

Current NLP models appear to be achieving performance comparable to human capabilities on well-established benchmarks. New benchmarks are now necessary to test deeper layers of understanding of natural languages by these models. Blackbird’s Language Matrices are a recently developed framework that draws inspiration from tests of human analytic intelligence. The BLM task has revealed that successful performances in previously studied linguistic problems do not yet stem from a deep understanding of the generative factors that define these problems. In this study, we define a new BLM task for predicate-argument structure, and develop a structured dataset for its investigation, concentrating on the spray-load verb alternations in English, as a case study. The context sentences include one alternant from the spray-load alternation and the target sentence is the other alternant, to be chosen among a minimally contrastive and adversarial set of answers. We describe the generation process of the dataset and the reasoning behind the generating rules. The dataset aims to facilitate investigations into how verb information is encoded in sentence embeddings and how models generalize to the complex properties of argument structures. Benchmarking experiments conducted on the dataset and qualitative error analysis on the answer set reveal the inherent challenges associated with the problem even for current high-performing representations.

BLM-AgrF: A New French Benchmark to Investigate Generalization of Agreement in Neural Networks
Aixiu An | Chunyang Jiang | Maria A. Rodriguez | Vivi Nastase | Paola Merlo
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Successful machine learning systems currently rely on massive amounts of data, which are very effective in hiding some of the shallowness of the learned models. To help train models with more complex and compositional skills, we need challenging data, on which a system is successful only if it detects structure and regularities that will allow it to generalize. In this paper, we describe a French dataset (BLM-AgrF) for learning the underlying rules of subject-verb agreement in sentences, developed in the BLM framework, a new task inspired by visual IQ tests known as Raven’s Progressive Matrices. In this task, an instance consists of sequences of sentences with specific attributes. To predict the correct answer as the next element of the sequence, a model must correctly detect the generative model used to produce the dataset. We provide details and share a dataset built following this methodology. Two exploratory baselines based on commonly used architectures show that despite the simplicity of the phenomenon, it is a complex problem for deep learning systems.

Grammatical information in BERT sentence embeddings as two-dimensional arrays
Vivi Nastase | Paola Merlo
Proceedings of the 8th Workshop on Representation Learning for NLP (RepL4NLP 2023)

Sentence embeddings induced with various transformer architectures encode much semantic and syntactic information in a distributed manner in a one-dimensional array. We investigate whether specific grammatical information can be accessed in these distributed representations. Using data from a task developed to test rule-like generalizations, our experiments on detecting subject-verb agreement yield several promising results. First, we show that while the usual sentence representations encoded as one-dimensional arrays do not easily support extraction of rule-like regularities, a two-dimensional reshaping of these vectors allows various learning architectures to access such information. Next, we show that various architectures can detect patterns in these two-dimensional reshaped sentence embeddings and successfully learn a model based on smaller amounts of simpler training data, which performs well on more complex test data. This indicates that current sentence embeddings contain information that is regularly distributed, and which can be captured when the embeddings are reshaped into higher dimensional arrays. Our results cast light on representations produced by language models and help move towards developing few-shot learning approaches.

Blackbird Language Matrices Tasks for Generalization
Paola Merlo | Chunyang Jiang | Giuseppe Samo | Vivi Nastase
Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP

To develop a system with near-human language capabilities, we need to understand current systems’ generalisation and compositional abilities. We approach this by generating compositional, structured data, inspired by visual intelligence tests that depend on the problem-solvers being able to disentangle objects and their absolute and relative properties in a sequence of images. We design an analogous task and develop the corresponding datasets that capture specific linguistic phenomena and their properties. Solving each problem instance depends on detecting the relevant linguistic objects and generative rules of the problem. We propose two datasets modelling two linguistic phenomena – subject-verb agreement in French, and verb alternations in English. The datasets can be used to investigate how LLMs encode linguistic objects, such as phrases, their grammatical and semantic properties, such as number or semantic role, and how such information is combined to correctly solve each problem. Specifically generated error types help investigate the behaviour of the system, which important information it is able to detect, and which structures mislead it.

2021

Multi-Adversarial Learning for Cross-Lingual Word Embeddings
Haozhou Wang | James Henderson | Paola Merlo
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Generative adversarial networks (GANs) have succeeded in inducing cross-lingual word embeddings (maps of matching words across languages) without supervision. Despite these successes, GANs’ performance for the difficult case of distant languages is still not satisfactory. These limitations have been explained by GANs’ incorrect assumption that source and target embedding spaces are related by a single linear mapping and are approximately isomorphic. We assume instead that, especially across distant languages, the mapping is only piece-wise linear, and propose a multi-adversarial learning method. This novel method induces the seed cross-lingual dictionary through multiple mappings, each induced to fit the mapping for one subspace. Our experiments on unsupervised bilingual lexicon induction and cross-lingual document classification show that this method improves performance over previous single-mapping methods, especially for distant languages.

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Paola Merlo | Jorg Tiedemann | Reut Tsarfaty
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

2020

Word associations and the distance properties of context-aware word embeddings
Maria A. Rodriguez | Paola Merlo
Proceedings of the 24th Conference on Computational Natural Language Learning

What do people know when they know the meaning of words? Word associations have been widely used to tap into lexical representations and their structure, as a way of probing semantic knowledge in humans. We investigate whether current word embedding spaces (contextualized and uncontextualized) can be considered good models of human lexical knowledge by studying whether they have comparable characteristics to human association spaces. We study the three properties of association rank, asymmetry of similarity and triangle inequality. We find that word embeddings are good models of some word association properties. They replicate human associations between words well, and, like humans, their context-aware variants show violations of the triangle inequality. While they do show asymmetry of similarities, their asymmetries do not map those of human association norms.

Syntactic Parsing in Humans and Machines
Paola Merlo
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

To process the syntactic structures of a language in ways that are compatible with human expectations, we need computational representations of lexical and syntactic properties that form the basis of human knowledge of words and sentences. Recent neural-network-based and distributed semantics techniques have developed systems of considerable practical success and impressive performance. As has been advocated by many, however, such systems still lack human-like properties. In particular, linguistic, psycholinguistic and neuroscientific investigations have shown that human processing of sentences is sensitive to structure and unbounded relations. In the spirit of better understanding the structure building and long-distance properties of neural networks, I will present an overview of recent results on agreement and island effects in syntax in several languages. While certain sets of results in the literature indicate that neural language models exhibit long-distance agreement abilities, other finer-grained investigation of how these effects are calculated indicates that the similarity spaces they define do not correlate with human experimental results on intervention similarity in long-distance dependencies. This opens the way to reflections on how to better match the syntactic properties of natural languages in the representations of neural models.

2019

Cross-Lingual Word Embeddings and the Structure of the Human Bilingual Lexicon
Paola Merlo | Maria Andueza Rodriguez
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Research on the bilingual lexicon has uncovered fascinating interactions between the lexicons of the native language and of the second language in bilingual speakers. In particular, it has been found that the lexicon of the underlying native language affects the organisation of the second language. In the spirit of interpreting current distributed representations, this paper investigates two models of cross-lingual word embeddings, comparing them to the shared-translation effect and the cross-lingual coactivation effects of false and true friends (cognates) found in humans. We find that the similarity structure of the cross-lingual word embeddings space yields the same effects as the human bilingual lexicon.

Weakly-Supervised Concept-based Adversarial Learning for Cross-lingual Word Embeddings
Haozhou Wang | James Henderson | Paola Merlo
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Distributed representations of words which map each word to a continuous vector have proven useful in capturing important linguistic information not only in a single language but also across different languages. Current unsupervised adversarial approaches show that it is possible to build a mapping matrix that aligns two sets of monolingual word embeddings without high quality parallel data, such as a dictionary or a sentence-aligned corpus. However, without an additional step of refinement, the preliminary mapping learnt by these methods is unsatisfactory, leading to poor performance for typologically distant languages. In this paper, we propose a weakly-supervised adversarial training method to overcome this limitation, based on the intuition that mapping across languages is better done at the concept level than at the word level. We propose a concept-based adversarial training method which improves the performance of previous unsupervised adversarial methods for most languages, and especially for typologically distant language pairs.

Probing Word and Sentence Embeddings for Long-distance Dependencies Effects in French and English
Paola Merlo
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

The recent widespread and strong interest in RNNs has spurred detailed investigations of the distributed representations they generate, and specifically of whether they exhibit properties similar to those characterising human languages. Results are at present inconclusive. In this paper, we extend previous work on long-distance dependencies in three ways. We manipulate word embeddings to translate them into a space that is attuned to the linguistic properties under study. We extend the work to sentence embeddings and to new languages. We confirm previous negative results: word embeddings and sentence embeddings do not unequivocally encode fine-grained linguistic properties of long-distance dependencies.

SyntaxFest 2019 Invited talk - Quantitative Computational Syntax: dependencies, intervention effects and word embeddings
Paola Merlo
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)

Intervention effects in object relatives in English and Italian: a study in quantitative computational syntax
Giuseppe Samo | Paola Merlo
Proceedings of the First Workshop on Quantitative Syntax (Quasy, SyntaxFest 2019)

2018

Vectorial Semantic Spaces Do Not Encode Human Judgments of Intervention Similarity
Paola Merlo | Francesco Ackermann
Proceedings of the 22nd Conference on Computational Natural Language Learning

Despite their practical success and impressive performances, neural-network-based and distributed semantics techniques have often been criticized as they remain fundamentally opaque and difficult to interpret. In a vein similar to recent pieces of work investigating the linguistic abilities of these representations, we study another core, defining property of language: the property of long-distance dependencies. Human languages exhibit the ability to interpret discontinuous elements distant from each other in the string as if they were adjacent. This ability is blocked if a similar, but extraneous, element intervenes between the discontinuous components. We present results that show, under exhaustive and precise conditions, that one kind of word embeddings and the similarity spaces they define do not encode the properties of intervention similarity in long-distance dependencies, and that therefore they fail to represent this core linguistic notion.

Festina Lente: A Farewell from the Editor
Paola Merlo
Computational Linguistics, Volume 44, Issue 2 - June 2018

2017

CLCL (Geneva) DINN Parser: a Neural Network Dependency Parser Ten Years Later
Christophe Moor | Paola Merlo | James Henderson | Haozhou Wang
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

This paper describes the University of Geneva’s submission to the CoNLL 2017 shared task Multilingual Parsing from Raw Text to Universal Dependencies (listed as the CLCL (Geneva) entry). Our submitted parsing system is the grandchild of the first transition-based neural network dependency parser, which was the University of Geneva’s entry in the CoNLL 2007 multilingual dependency parsing shared task, with some improvements to speed and portability. These results provide a baseline for investigating how far we have come in the past ten years of work on neural network dependency parsing.

2016

Multi-lingual Dependency Parsing Evaluation: a Large-scale Analysis of Word Order Properties using Artificial Data
Kristina Gulordava | Paola Merlo
Transactions of the Association for Computational Linguistics, Volume 4

The growing work in multi-lingual parsing faces the challenge of fair comparative evaluation and performance analysis across languages and their treebanks. The difficulty lies in teasing apart the properties of treebanks, such as their size or average sentence length, from those of the annotation scheme, and from the linguistic properties of languages. We propose a method to evaluate the effects of word order of a language on dependency parsing performance, while controlling for confounding treebank properties. The method uses artificially-generated treebanks that are minimal permutations of actual treebanks with respect to two word order properties: word order variation and dependency lengths. Based on these artificial data on twelve languages, we show that longer dependencies and higher word order variability degrade parsing performance. Our method also extends to minimal pairs of individual sentences, leading to a finer-grained understanding of parsing errors.

Modifications of Machine Translation Evaluation Metrics by Using Word Embeddings
Haozhou Wang | Paola Merlo
Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6)

Traditional machine translation evaluation metrics such as BLEU and WER have been widely used, but these metrics correlate poorly with human judgements because they represent word similarity badly and impose strict identity matching. In this paper, we propose some modifications to these two traditional metrics based on word embeddings. The evaluation results show that our modifications significantly improve their correlation with human judgements.

Obituary: In Memoriam: Susan Armstrong
Pierrette Bouillon | Paola Merlo | Gertjan van Noord | Mike Rosner
Computational Linguistics, Volume 42, Issue 2 - June 2016

2015

Structural and lexical factors in adjective placement in complex noun phrases across Romance languages
Kristina Gulordava | Paola Merlo
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

Diachronic Trends in Word Order Freedom and Dependency Length in Dependency-Annotated Corpora of Latin and Ancient Greek
Kristina Gulordava | Paola Merlo
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

Evaluation of Two-level Dependency Representations of Argument Structure in Long-Distance Dependencies
Paola Merlo
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

Dependency length minimisation effects in short spans: a large-scale analysis of adjective placement in complex noun phrases
Kristina Gulordava | Paola Merlo | Benoit Crabbé
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

Likelihood of External Causation in the Structure of Events
Tanja Samardžić | Paola Merlo
Proceedings of the EACL 2014 Workshop on Computational Approaches to Causality in Language (CAtoCL)

2013

Multilingual Joint Parsing of Syntactic and Semantic Dependencies with a Latent Variable Model
James Henderson | Paola Merlo | Ivan Titov | Gabriele Musillo
Computational Linguistics, Volume 39, Issue 4 - December 2013

2011

Scaling up Automatic Cross-Lingual Semantic Role Annotation
Lonneke van der Plas | Paola Merlo | James Henderson
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Cross-Lingual Validity of PropBank in the Manual Annotation of French
Lonneke van der Plas | Tanja Samardžić | Paola Merlo
Proceedings of the Fourth Linguistic Annotation Workshop

Cross-Lingual Variation of Light Verb Constructions: Using Parallel Corpora and Automatic Alignment for Linguistic Research
Tanja Samardžić | Paola Merlo
Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground

2009

Domain Adaptation with Artificial Data for Semantic Parsing of Speech
Lonneke van der Plas | James Henderson | Paola Merlo
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

A Latent Variable Model of Synchronous Syntactic-Semantic Parsing for Multiple Languages
Andrea Gesmundo | James Henderson | Paola Merlo | Ivan Titov
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task

Abstraction and Generalisation in Semantic Role Labels: PropBank, VerbNet or both?
Paola Merlo | Lonneke Van Der Plas
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

Semantic Parsing for High-Precision Semantic Role Labelling
Paola Merlo | Gabriele Musillo
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

A Latent Variable Model of Synchronous Parsing for Syntactic and Semantic Dependencies
James Henderson | Paola Merlo | Gabriele Musillo | Ivan Titov
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

Unlexicalised Hidden Variable Models of Split Dependency Grammars
Gabriele Antonio Musillo | Paola Merlo
Proceedings of ACL-08: HLT, Short Papers

2007

Proceedings of the Tenth International Conference on Parsing Technologies
Harry Bunt | Paola Merlo
Proceedings of the Tenth International Conference on Parsing Technologies

2006

The Notion of Argument in Prepositional Phrase Attachment
Paola Merlo | Eva Esteve Ferrer
Computational Linguistics, Volume 32, Number 3, September 2006

Accurate Parsing of the Proposition Bank
Gabriele Musillo | Paola Merlo
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

Robust Parsing of the Proposition Bank
Gabriele Musillo | Paola Merlo
Proceedings of the Workshop on ROMAND 2006: Robust Methods in Analysis of Natural language Data

2005

Lexical and Structural Biases for Function Parsing
Gabriele Musillo | Paola Merlo
Proceedings of the Ninth International Workshop on Parsing Technology

Accurate Function Parsing
Paola Merlo | Gabriele Musillo
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2003

Generalised PP-attachment Disambiguation Using Corpus-based Linguistic Diagnostics
Paola Merlo
10th Conference of the European Chapter of the Association for Computational Linguistics

2002

Using Syntactic Analysis to Increase Efficiency in Visualizing Text Collections
James Henderson | Paola Merlo | Ivan Petroff | Gerold Schneider
COLING 2002: The 19th International Conference on Computational Linguistics

Crosslinguistic Transfer in Automatic Verb Classification
Vivian Tsang | Suzanne Stevenson | Paola Merlo
COLING 2002: The 19th International Conference on Computational Linguistics

A Multilingual Paradigm for Automatic Verb Classification
Paola Merlo | Suzanne Stevenson | Vivian Tsang | Gianluca Allaria
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

2001

Automatic distinction of arguments and modifiers: the case of prepositional phrases
Paola Merlo | Matthias Leybold
Proceedings of the ACL 2001 Workshop on Computational Natural Language Learning (CoNLL)

Automatic Verb Classification Based on Statistical Distributions of Argument Structure
Paola Merlo | Suzanne Stevenson
Computational Linguistics, Volume 27, Number 3, September 2001

2000

Establishing the Upper Bound and Inter-judge Agreement of a Verb Classification Task
Paola Merlo | Suzanne Stevenson
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Automatic Lexical Acquisition Based on Statistical Distributions
Suzanne Stevenson | Paola Merlo
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

1999

Supervised Learning of Lexical Semantic Verb Classes Using Frequency Distributions
Suzanne Stevenson | Paola Merlo | Natalia Kariaeva Rutgers
SIGLEX99: Standardizing Lexical Resources

Automatic Verb Classification Using Distributions of Grammatical Features
Suzanne Stevenson | Paola Merlo
Ninth Conference of the European Chapter of the Association for Computational Linguistics

1998

What grammars tell us about corpora: the case of reduced relative clauses
Paola Merlo | Suzanne Stevenson
Sixth Workshop on Very Large Corpora

1997

Attaching Multiple Prepositional Phrases: Backed-off Estimation Generalized
Paola Merlo
Second Conference on Empirical Methods in Natural Language Processing

1995

Modularity and Information Content Classes in Principle-Based Parsing
Paola Merlo
Computational Linguistics, Volume 21, Number 4, December 1995

1993

A Principle-based Parser for Foreign Language Training in German and Arabic
Joe Garman | Jeffery Martin | Paola Merlo | Amy Weinberg
Proceedings of the Third International Workshop on Parsing Technologies

In this paper we discuss the design and implementation of a parser for German and Arabic, which is currently being used in a tutoring system for foreign language training. Computer-aided language tutoring is a good application for testing the robustness and flexibility of a parsing system, since the input is usually ungrammatical in some way. Efficiency is also a concern, as tutoring applications typically run on personal computers, with the parser sharing memory with other components of the system. Our system is principle-based, which ensures a compact representation and improves portability, both needed in order to extend the initial design from German to Arabic and (eventually) Spanish. Currently, the parser diagnoses agreement errors, case errors, selection errors, and some word order errors. The parser can handle simple and complex declaratives and questions, topicalisations, verb movement, and relative clauses, a coverage broad enough to be useful in the design of real exercises and dialogues.

1992

An LR Category-Neutral Parser With Left Corner Prediction
Paola Merlo
30th Annual Meeting of the Association for Computational Linguistics