Malvina Nissim

2021

We describe and make available the game-based material developed for a laboratory run at several Italian science festivals to popularize NLP among young students.

Although Natural Language Processing is at the core of many tools young people use in their everyday life, high school curricula (in Italy) do not include any computational linguistics education. This lack of exposure makes the use of such tools less responsible than it could be, and makes choosing computational linguistics as a university degree unlikely. To raise awareness, curiosity, and longer-term interest in young people, we have developed an interactive workshop designed to illustrate the basic principles of NLP and computational linguistics to Italian high school students aged between 13 and 18 years. The workshop takes the form of a game in which participants play the role of machines needing to solve some of the most common problems a computer faces in understanding language: from voice recognition to Markov chains to syntactic parsing. Participants are guided through the workshop by instructors, who present the activities and explain core concepts from computational linguistics. The workshop was presented at numerous venues in Italy between 2019 and 2020, both face-to-face and online.

pdf bib
As Good as New. How to Successfully Recycle English GPT-2 to Make Models for Other Languages
Wietse de Vries | Malvina Nissim
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Adapting Monolingual Models: Data can be Scarce when Language Similarity is High
Wietse de Vries | Martijn Bartelds | Malvina Nissim | Martijn Wieling
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib abs
Breeding Fillmore’s Chickens and Hatching the Eggs: Recombining Frames and Roles in Frame-Semantic Parsing
Gosse Minnema | Malvina Nissim
Proceedings of the 14th International Conference on Computational Semantics (IWCS)

Frame-semantic parsers traditionally predict predicates, frames, and semantic roles in a fixed order. This paper explores the ‘chicken-or-egg’ problem of interdependencies between these components theoretically and practically. We introduce a flexible BERT-based sequence labeling architecture that allows for predicting frames and roles independently from each other or combining them in several ways. Our results show that our setups can approximate more complex traditional models’ performance, while allowing for a clearer view of the interdependencies between the pipeline’s components, and of how frame and role prediction models make different use of BERT’s layers.

As socially unacceptable language becomes pervasive on social media platforms, the need for automatic content moderation becomes more pressing. This contribution introduces the Dutch Abusive Language Corpus (DALC v1.0), a new dataset with tweets manually annotated for abusive language. The resource addresses a gap in language resources for Dutch and adopts a multi-layer annotation scheme modeling the explicitness and the target of the abusive messages. Baseline experiments on all annotation layers have been conducted, achieving a macro F1 score of 0.748 for binary classification of the explicitness layer and 0.489 for target classification.

pdf bib abs
Thank you BART! Rewarding Pre-Trained Models Improves Formality Style Transfer
Huiyuan Lai | Antonio Toral | Malvina Nissim
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Scarcity of parallel data causes formality style transfer models to have scarce success in preserving content. We show that fine-tuning pre-trained language (GPT-2) and sequence-to-sequence (BART) models boosts content preservation, and that this is possible even with limited amounts of parallel data. Augmenting these models with rewards that target style and content –the two core aspects of the task– we achieve a new state-of-the-art.

pdf bib abs
Human Perception in Natural Language Generation
Lorenzo De Mattei | Huiyuan Lai | Felice Dell’Orletta | Malvina Nissim
Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)

We ask subjects whether they perceive as human-produced a set of texts, some of which are actually human-written, while others are automatically generated. We use this data to fine-tune a GPT-2 model to push it to generate more human-like texts, and observe that this fine-tuned model produces texts that are indeed perceived as more human-like than those of the original model. At the same time, we show that our automatic evaluation strategy correlates well with human judgements. We also run a linguistic analysis to unveil the characteristics of human- vs machine-perceived language.

pdf bib abs
Generic resources are what you need: Style transfer tasks without task-specific parallel training data
Huiyuan Lai | Antonio Toral | Malvina Nissim
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Style transfer aims to rewrite a source text in a different target style while preserving its content. We propose a novel approach to this task that leverages generic resources, and without using any task-specific parallel (source–target) data outperforms existing unsupervised approaches on the two most popular style transfer tasks: formality transfer and polarity swap. In practice, we adopt a multi-step procedure which builds on a generic pre-trained sequence-to-sequence model (BART). First, we strengthen the model’s ability to rewrite by further pre-training BART on both an existing collection of generic paraphrases, as well as on synthetic pairs created using a general-purpose lexical resource. Second, through an iterative back-translation approach, we train two models, each in a transfer direction, so that they can provide each other with synthetically generated pairs, dynamically in the training process. Lastly, we let our best resulting model generate static synthetic pairs to be used in a supervised training regime. Besides methodology and state-of-the-art results, a core contribution of this work is a reflection on the nature of the two tasks we address, and how their differences are highlighted by their response to our approach.

2020

pdf bib abs
Fair Is Better than Sensational: Man Is to Doctor as Woman Is to Doctor
Malvina Nissim | Rik van Noord | Rob van der Goot
Computational Linguistics, Volume 46, Issue 2 - June 2020

Analogies such as man is to king as woman is to X are often used to illustrate the amazing power of word embeddings. Concurrently, they have also been used to expose how strongly human biases are encoded in vector spaces trained on natural language, with examples like man is to computer programmer as woman is to homemaker. Recent work has shown that analogies are in fact not an accurate diagnostic for bias, but this does not mean that they are not used anymore, or that their legacy is fading. Instead of focusing on the intrinsic problems of the analogy task as a bias detection tool, we discuss a series of issues involving implementation as well as subjective choices that might have yielded a distorted picture of bias in word embeddings. We stand by the truth that human biases are present in word embeddings, and, of course, the need to address them. But analogies are not an accurate tool to do so, and the way they have been most often used has exacerbated some possibly non-existing biases and perhaps hidden others. Because they are still widely popular, and some of them have become classics within and outside the NLP community, we deem it important to provide a series of clarifications that should put well-known, and potentially new analogies, into the right perspective.

pdf bib abs
What’s so special about BERT’s layers? A closer look at the NLP pipeline in monolingual and multilingual models
Wietse de Vries | Andreas van Cranenburgh | Malvina Nissim
Findings of the Association for Computational Linguistics: EMNLP 2020

Peeking into the inner workings of BERT has shown that its layers resemble the classical NLP pipeline, with progressively more complex tasks being concentrated in later layers. To investigate to what extent these results also hold for a language other than English, we probe a Dutch BERT-based model and the multilingual BERT model for Dutch NLP tasks. In addition, through a deeper analysis of part-of-speech tagging, we show that also within a given task, information is spread over different parts of the network and the pipeline might not be as neat as it seems. Each layer has different specialisations, so that it may be more useful to combine information from different layers, instead of selecting a single one based on the best overall performance.

pdf bib abs
On the interaction of automatic evaluation and task framing in headline style transfer
Lorenzo De Mattei | Michele Cafagna | Huiyuan Lai | Felice Dell’Orletta | Malvina Nissim | Albert Gatt
Proceedings of the 1st Workshop on Evaluating NLG Evaluation

An ongoing debate in the NLG community concerns the best way to evaluate systems, with human evaluation often being considered the most reliable method, compared to corpus-based metrics. However, tasks involving subtle textual differences, such as style transfer, tend to be hard for humans to perform. In this paper, we propose an evaluation method for this task based on purposely-trained classifiers, showing that it better reflects system differences than traditional metrics such as BLEU.

pdf bib abs
Unmasking Contextual Stereotypes: Measuring and Mitigating BERT’s Gender Bias
Marion Bartl | Malvina Nissim | Albert Gatt
Proceedings of the Second Workshop on Gender Bias in Natural Language Processing

Contextualized word embeddings have been replacing standard embeddings as the representational knowledge source of choice in NLP systems. Since a variety of biases have previously been found in standard word embeddings, it is crucial to assess biases encoded in their replacements as well. Focusing on BERT (Devlin et al., 2018), we measure gender bias by studying associations between gender-denoting target words and names of professions in English and German, comparing the findings with real-world workforce statistics. We mitigate bias by fine-tuning BERT on the GAP corpus (Webster et al., 2018), after applying Counterfactual Data Substitution (CDS) (Maudslay et al., 2019). We show that our method of measuring bias is appropriate for languages such as English, but not for languages with a rich morphology and gender-marking, such as German. Our results highlight the importance of investigating bias and mitigation techniques cross-linguistically, especially in view of the current emphasis on large-scale, multilingual language models.

pdf bib abs
Lower Bias, Higher Density Abusive Language Datasets: A Recipe
Juliet van Rosendaal | Tommaso Caselli | Malvina Nissim
Proceedings of the Workshop on Resources and Techniques for User and Author Profiling in Abusive Language

Datasets to train models for abusive language detection are at the same time necessary and still scarce. One of the reasons for their limited availability is the cost of their creation. It is not only that manual annotation is expensive, it is also the case that the phenomenon is sparse, forcing human annotators to go through a large number of irrelevant examples in order to obtain some significant data. Strategies used until now to increase the density of abusive language and obtain more meaningful data overall include data filtering on the basis of pre-selected keywords and hate-rich sources of data. We suggest a recipe that at the same time can provide meaningful data with possibly higher density of abusive language and also reduce top-down biases imposed by corpus creators in the selection of the data to annotate. More specifically, we exploit the controversy channel on Reddit to obtain keywords that are used to filter a Twitter dataset. While the method needs further validation and refinement, our preliminary experiments show a higher density of abusive tweets in the filtered vs unfiltered dataset, and a more meaningful topic distribution after filtering.

pdf bib
Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media
Malvina Nissim | Viviana Patti | Barbara Plank | Esin Durmus
Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media

pdf bib abs
Matching Theory and Data with Personal-ITY: What a Corpus of Italian YouTube Comments Reveals About Personality
Elisa Bassignana | Malvina Nissim | Viviana Patti
Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media

As a contribution to personality detection in languages other than English, we rely on distant supervision to create Personal-ITY, a novel corpus of YouTube comments in Italian, where authors are labelled with personality traits. The traits are derived from one of the mainstream personality theories in psychology research, named MBTI. Using personality prediction experiments, we (i) study the task of personality prediction in itself on our corpus as well as on TWISTY, a Twitter dataset also annotated with MBTI labels; (ii) carry out an extensive, in-depth analysis of the features used by the classifier, and view them specifically under the light of the original theory that we used to create the corpus in the first place. We observe that no single model is best at personality detection, and that while some traits are easier than others to detect, and also to match back to theory, for other, less frequent traits the picture is much more blurred.

pdf bib abs
MAGPIE: A Large Corpus of Potentially Idiomatic Expressions
Hessel Haagsma | Johan Bos | Malvina Nissim
Proceedings of the 12th Language Resources and Evaluation Conference

Given the limited size of existing idiom corpora, we aim to enable progress in automatic idiom processing and linguistic analysis by creating the largest-to-date corpus of idioms for English. Using a fixed idiom list, automatic pre-extraction, and a strictly controlled crowdsourced annotation procedure, we show that it is feasible to build a high-quality corpus comprising more than 50K instances, an order of magnitude larger than previous resources. Crucial ingredients of crowdsourcing were the selection of crowdworkers, clear and comprehensive instructions, and an interface that breaks down the task into small, manageable steps. Analysis of the resulting corpus revealed strong effects of genre on idiom distribution, providing new evidence for existing theories on what influences idiom usage. The corpus also contains rich metadata, and is made publicly available.

pdf bib abs
Invisible to People but not to Machines: Evaluation of Style-aware Headline Generation in Absence of Reliable Human Judgment
Lorenzo De Mattei | Michele Cafagna | Felice Dell’Orletta | Malvina Nissim
Proceedings of the 12th Language Resources and Evaluation Conference

We automatically generate headlines that are expected to comply with the specific styles of two different Italian newspapers. Through a data alignment strategy and different training/testing settings, we aim at decoupling content from style and preserving the latter in generation. In order to evaluate the generated headlines’ quality in terms of their specific newspaper-compliance, we devise a fine-grained evaluation strategy based on automatic classification. We observe that our models do indeed learn newspaper-specific style. Importantly, we also observe that humans aren’t reliable judges for this task, since although familiar with the newspapers, they are not able to discern their specific styles even in the original human-written headlines. The utility of automatic evaluation therefore goes beyond saving the costs and hurdles of manual annotation, and deserves particular care in its design.

2019

pdf bib abs
You Write like You Eat: Stylistic Variation as a Predictor of Social Stratification
Angelo Basile | Albert Gatt | Malvina Nissim
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Inspired by Labov’s seminal work on stylistic variation as a function of social stratification, we develop and compare neural models that predict a person’s presumed socio-economic status, obtained through distant supervision, from their writing style on social media. The focus of our work is on identifying the most important stylistic parameters to predict socio-economic group. In particular, we show the effectiveness of morpho-syntactic features as predictors of style, in contrast to lexical features, which are good predictors of topic.

2018

pdf bib abs
Bleaching Text: Abstract Features for Cross-lingual Gender Prediction
Rob van der Goot | Nikola Ljubešić | Ian Matroos | Malvina Nissim | Barbara Plank
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Gender prediction has typically focused on lexical and social network features, yielding good performance, but making systems highly language-, topic-, and platform dependent. Cross-lingual embeddings circumvent some of these limitations, but capture gender-specific style less. We propose an alternative: bleaching text, i.e., transforming lexical strings into more abstract features. This study provides evidence that such features allow for better transfer across languages. Moreover, we present a first study on the ability of humans to perform cross-lingual gender prediction. We find that human predictive power proves similar to that of our bleached models, and both perform better than lexical models.

pdf bib abs
Discriminator at SemEval-2018 Task 10: Minimally Supervised Discrimination
Artur Kulmizev | Mostafa Abdou | Vinit Ravishankar | Malvina Nissim
Proceedings of The 12th International Workshop on Semantic Evaluation

We participated in the SemEval-2018 shared task on capturing discriminative attributes (Task 10) with a simple system that ranked 8th amongst the 26 teams that took part in the evaluation. Our final score was 0.67, which is competitive with the winning score of 0.75, particularly given that our system is a zero-shot system that requires no training and minimal parameter optimisation. In addition to describing the submitted system, and discussing the implications of the relative success of such a system on this task, we also report on other, more complex models we experimented with.

pdf bib
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics
Malvina Nissim | Jonathan Berant | Alessandro Lenci
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

pdf bib
Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media
Malvina Nissim | Viviana Patti | Barbara Plank | Claudia Wagner
Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media

pdf bib abs
The Other Side of the Coin: Unsupervised Disambiguation of Potentially Idiomatic Expressions by Contrasting Senses
Hessel Haagsma | Malvina Nissim | Johan Bos
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

Disambiguation of potentially idiomatic expressions involves determining the sense of a potentially idiomatic expression in a given context, e.g. determining that make hay in ‘Investment banks made hay while takeovers shone.’ is used in a figurative sense. This enables automatic interpretation of idiomatic expressions, which is important for applications like machine translation and sentiment analysis. In this work, we present an unsupervised approach for English that makes use of literalisations of idiom senses to improve disambiguation, which is based on the lexical cohesion graph-based method by Sporleder and Li (2009). Experimental results show that, while literalisation carries novel information, its performance falls short of that of state-of-the-art unsupervised methods.

2017

pdf bib abs
To normalize, or not to normalize: The impact of normalization on Part-of-Speech tagging
Rob van der Goot | Barbara Plank | Malvina Nissim
Proceedings of the 3rd Workshop on Noisy User-generated Text

Does normalization help Part-of-Speech (POS) tagging accuracy on noisy, non-canonical data? To the best of our knowledge, little is known on the actual impact of normalization in a real-world scenario, where gold error detection is not available. We investigate the effect of automatic normalization on POS tagging of tweets. We also compare normalization to strategies that leverage large amounts of unlabeled data kept in its raw form. Our results show that normalization helps, but does not add consistently beyond just word embedding layer initialization. The latter approach yields a tagging model that is competitive with a Twitter state-of-the-art tagger.

In this paper, we explore the performance of a linear SVM trained on language independent character features for the NLI Shared Task 2017. Our basic system (GRONINGEN) achieves the best performance (87.56 F1-score) on the evaluation set using only 1-9 character n-grams as features. We compare this against several ensemble and meta-classifiers in order to examine how the linear system fares when combined with other, especially non-linear classifiers. Special emphasis is placed on the topic bias that exists by virtue of the assessment essay prompt distribution.

2016

pdf bib
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)
Malvina Nissim | Viviana Patti | Barbara Plank
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

pdf bib abs
Distant supervision for emotion detection using Facebook reactions
Chris Pool | Malvina Nissim
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

We exploit the Facebook reaction feature in a distant supervised fashion to train a support vector machine classifier for emotion detection, using several feature combinations and combining different Facebook pages. We test our models on existing benchmarks for emotion detection and show that employing only information that is derived completely automatically, thus without relying on any handcrafted lexicon as is usually done, we can achieve competitive results. The results also show that there is ample room for improvement, especially by gearing the collection of Facebook pages to the target domain.

pdf bib abs
Leveraging Native Data to Correct Preposition Errors in Learners’ Dutch
Lennart Kloppenburg | Malvina Nissim
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We address the task of automatically correcting preposition errors in learners’ Dutch by modelling preposition usage in native language. Specifically, we build two models exploiting a large corpus of Dutch. The first is a binary model for detecting whether a preposition should be used at all in a given position or not. The second is a multiclass model for selecting the appropriate preposition in case one should be used. The models are tested on native as well as learners’ data. For the latter we exploit a crowdsourcing strategy to elicit native judgements. On native test data the models perform very well, showing that we can model preposition usage appropriately. However, the evaluation on learners’ data shows that while detecting that a given preposition is wrong can be done reasonably well, detecting the absence of a preposition is a lot more difficult. Observing such results and the data we deal with, we envisage various ways of improving performance, and report them in the final section of this article.

2015

pdf bib
Uncovering Noun-Noun Compound Relations by Gamification
Johan Bos | Malvina Nissim
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

pdf bib
Adding Semantics to Data-Driven Paraphrasing
Ellie Pavlick | Johan Bos | Malvina Nissim | Charley Beller | Benjamin Van Durme | Chris Callison-Burch
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

pdf bib abs
A Modular System for Rule-based Text Categorisation
Marco Del Tredici | Malvina Nissim
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We introduce a modular rule-based approach to text categorisation which is more flexible and less time consuming to build than a standard rule-based system because it works with a hierarchical structure and allows for re-usability of rules. When compared to currently more widespread machine learning models on a case study, our modular system shows competitive results, and it has the advantage of reducing manual effort over time, since fewer rules must be written when moving to a (partially) new domain, while annotation of training data is always required in the same amount.

pdf bib
The Meaning Factory: Formal Semantics for Recognizing Textual Entailment and Determining Semantic Similarity
Johannes Bjerva | Johan Bos | Rob van der Goot | Malvina Nissim
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2013

pdf bib
Cross-linguistic annotation of modality: a data-driven hierarchical model
Malvina Nissim | Paola Pietrandrea | Andrea Sansò | Caterina Mauri
Proceedings of the 9th Joint ISO - ACL SIGSEM Workshop on Interoperable Semantic Annotation

pdf bib
Modelling the Internal Variability of MWEs
Malvina Nissim
Proceedings of the 9th Workshop on Multiword Expressions

pdf bib
A Repository of Variation Patterns for Multiword Expressions
Malvina Nissim | Andrea Zaninello
Proceedings of the 9th Workshop on Multiword Expressions

pdf bib
Sentiment analysis on Italian tweets
Valerio Basile | Malvina Nissim
Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

2010

pdf bib abs
Creation of Lexical Resources for a Characterisation of Multiword Expressions in Italian
Andrea Zaninello | Malvina Nissim
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The theoretical characterisation of multiword expressions (MWEs) is tightly connected to their actual occurrences in data and to their representation in lexical resources. We present three lexical resources for Italian MWEs, namely an electronic lexicon, a series of example corpora and a database of MWEs represented around morphosyntactic patterns. These resources are matched against, and created from, a very large web-derived corpus for Italian that spans across registers and domains. We can thus test expressions coded by lexicographers in a dictionary, thereby discarding unattested expressions, revisiting lexicographers' choices on the basis of frequency information, and at the same time creating an example sub-corpus for each entry. We organise MWEs on the basis of the morphosyntactic information obtained from the data in an electronic, flexible knowledge-base containing structured annotation exploitable for multiple purposes. We also suggest further work directions towards characterising MWEs by analysing the data organised in our database through lexico-semantic information available in WordNet or MultiWordNet-like resources, also in the perspective of expanding their set through the extraction of other similar compact expressions.

2009

pdf bib
Automatic identification of semantic relations in Italian complex nominals
Fabio Celli | Malvina Nissim
Proceedings of the Eighth International Conference on Computational Semantics

2008

pdf bib abs
The Italian Particle “ne”: Corpus Construction and Analysis
Malvina Nissim | Sara Perboni
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The Italian particle ne exhibits interesting anaphoric properties that have not yet been explored in depth from a corpus and computational linguistic perspective. We provide: (i) an overview of the phenomenon; (ii) a set of annotation schemes for marking up occurrences of ne; (iii) the description of a corpus annotated for this phenomenon; (iv) a first assessment of the resolution task. We show that the schemes we developed are reliable, and that the actual distribution of partitive and non-partitive uses of ne is inversely proportional to the amount of attention that the two different uses have received in the linguistic literature. As an assessment of the complexity of the resolution task, we find that a recency-based baseline yields an accuracy of less than 30% on both development and test data.

2007

pdf bib
SemEval-2007 Task 08: Metonymy Resolution at SemEval-2007
Katja Markert | Malvina Nissim
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

pdf bib abs
The Impact of Annotation on the Performance of Protein Tagging in Biomedical Text
Beatrice Alex | Malvina Nissim | Claire Grover
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we discuss five different corpora annotated for protein names. We present several within- and cross-dataset protein tagging experiments showing that different annotation schemes severely affect the portability of statistical protein taggers. By means of a detailed error analysis we identify crucial annotation issues that future annotation projects should take into careful consideration.

pdf bib
An Empirical Approach to the Interpretation of Superlatives
Johan Bos | Malvina Nissim
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

pdf bib
Learning Information Status of Discourse Entities
Malvina Nissim
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing