Alessandra Zarcone


GiCCS: A German in-Context Conversational Similarity Benchmark
Shima Asaadi | Zahra Kolagar | Alina Liebel | Alessandra Zarcone
Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)

The Semantic textual similarity (STS) task is commonly used to evaluate the semantic representations that language models (LMs) learn from texts, under the assumption that good-quality representations will yield accurate similarity estimates. When it comes to estimating the similarity of two utterances in a dialogue, however, the conversational context plays a particularly important role. We argue for the need of benchmarks specifically created using conversational data in order to evaluate conversational LMs in the STS task. We introduce GiCCS, a first conversational STS evaluation benchmark for German. We collected the similarity annotations for GiCCS using best-worst scaling and presenting the target items in context, in order to obtain highly-reliable context-dependent similarity scores. We present benchmarking experiments for evaluating LMs on capturing the similarity of utterances. Results suggest that pretraining LMs on conversational data and providing conversational context can be useful for capturing similarity of utterances in dialogues. GiCCS will be publicly available to encourage benchmarking of conversational LMs.


Not So Fast, Classifier – Accuracy and Entropy Reduction in Incremental Intent Classification
Lianna Hrycyk | Alessandra Zarcone | Luzian Hahn
Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI

Incremental intent classification requires the assignment of intent labels to partial utterances. However, partial utterances do not necessarily contain enough information to be mapped to the intent class of their complete utterance (correctly and with a certain degree of confidence). Using the final interpretation as the ground truth to measure a classifier’s accuracy during intent classification of partial utterances is thus problematic. We release inCLINC, a dataset of partial and full utterances with human annotations of plausible intent labels for different portions of each utterance, as an upper (human) baseline for incremental intent classification. We analyse the incremental annotations and propose entropy reduction as a measure of human annotators’ convergence on an interpretation (i.e. intent label). We argue that, when the annotators do not converge to one or a few possible interpretations and yet the classifier already identifies the final intent class early on, it is a sign of overfitting that can be ascribed to artefacts in the dataset.

New Domain, Major Effort? How Much Data is Necessary to Adapt a Temporal Tagger to the Voice Assistant Domain
Touhidul Alam | Alessandra Zarcone | Sebastian Padó
Proceedings of the 14th International Conference on Computational Semantics (IWCS)

Reliable tagging of Temporal Expressions (TEs, e.g., Book a table at L’Osteria for Sunday evening) is a central requirement for Voice Assistants (VAs). However, there is a dearth of resources and systems for the VA domain, since publicly-available temporal taggers are trained only on substantially different domains, such as news and clinical text. Since the cost of annotating large datasets is prohibitive, we investigate the trade-off between in-domain data and performance in DA-Time, a hybrid temporal tagger for the English VA domain which combines a neural architecture for robust TE recognition, with a parser-based TE normalizer. We find that transfer learning goes a long way even with as little as 25 in-domain sentences: DA-Time performs at the state of the art on the news domain, and substantially outperforms it on the VA domain.


PATE: A Corpus of Temporal Expressions for the In-car Voice Assistant Domain
Alessandra Zarcone | Touhidul Alam | Zahra Kolagar
Proceedings of the Twelfth Language Resources and Evaluation Conference

The recognition and automatic annotation of temporal expressions (e.g. “Add an event for tomorrow evening at eight to my calendar”) is a key module for AI voice assistants, in order to allow them to interact with apps (for example, a calendar app). However, in the NLP literature, research on temporal expressions has focused mostly on data from the news, from the clinical domain, and from social media. The voice assistant domain is very different than the typical domains that have been the focus of work on temporal expression identification, thus requiring a dedicated data collection. We present a crowdsourcing method for eliciting natural-language commands containing temporal expressions for an AI voice assistant, by using pictures and scenario descriptions. We annotated the elicited commands (480) as well as the commands in the Snips dataset following the TimeML/TIMEX3 annotation guidelines, reaching a total of 1188 annotated commands. The commands can be later used to train the NLU components of an AI voice assistant.


pdf bib
Inducing Script Structure from Crowdsourced Event Descriptions via Semi-Supervised Clustering
Lilian Wanzare | Alessandra Zarcone | Stefan Thater | Manfred Pinkal
Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics

We present a semi-supervised clustering approach to induce script structure from crowdsourced descriptions of event sequences by grouping event descriptions into paraphrase sets (representing event types) and inducing their temporal order. Our approach exploits semantic and positional similarity and allows for flexible event order, thus overcoming the rigidity of previous approaches. We incorporate crowdsourced alignments as prior knowledge and show that exploiting a small number of alignments results in a substantial improvement in cluster quality over state-of-the-art models and provides an appropriate basis for the induction of temporal order. We also show a coverage study to demonstrate the scalability of our approach.


A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge
Lilian D. A. Wanzare | Alessandra Zarcone | Stefan Thater | Manfred Pinkal
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Scripts are standardized event sequences describing typical everyday activities, which play an important role in the computational modeling of cognitive abilities (in particular for natural language processing). We present a large-scale crowdsourced collection of explicit linguistic descriptions of script-specific event sequences (40 scenarios with 100 sequences each). The corpus is enriched with crowdsourced alignment annotation on a subset of the event descriptions, to be used in future work as seed data for automatic alignment of event descriptions (for example via clustering). The event descriptions to be aligned were chosen among those expected to have the strongest corrective effect on the clustering algorithm. The alignment annotation was evaluated against a gold standard of expert annotators. The resulting database of partially-aligned script-event descriptions provides a sound empirical basis for inducing high-quality script knowledge, as well as for any task involving alignment and paraphrase detection of events.


Fitting, Not Clashing! A Distributional Semantic Model of Logical Metonymy
Alessandra Zarcone | Alessandro Lenci | Sebastian Padó | Jason Utt
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Short Papers

The Curious Case of Metonymic Verbs: A Distributional Characterization
Jason Utt | Alessandro Lenci | Sebastian Padó | Alessandra Zarcone
Proceedings of the IWCS 2013 Workshop Towards a Formal Distributional Semantics


Logical metonymies and qualia structures: an annotated database of logical metonymies for German
Alessandra Zarcone | Stefan Rued
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Logical metonymies like """"The author began the book"""" involve the interpretation of events that are not realized in the sentence (Covert events: -> """"writing the book""""). The Generative Lexicon (Pustejovsky 1995) provides a qualia-based account of covert event interpretation, claiming that the covert event is retrieved from the qualia structure of the object. Such a theory poses the question of to what extent covert events in logical metonymies can be accounted for by qualia structures. Building on previous work on English, we present a corpus study for German verbs (""""anfangen (mit)"""", """"aufhoeren (mit)"""", """"beenden"""", """"beginnen (mit)"""", """"geniessen"""", based on data obtained from the deWaC corpus. We built a corpus of logical metonymies, which were manually annotated and compared with the qualia structures of their objects, then we contrasted annotation results from two expert annotators for metonymies (""""The author began the book"""") and long forms (""""The author began reading the book"""") across verbs. Our annotation was evaluated on a sample of sentences annotated by a group of naive annotators on a crowdsourcing platform. The logical metonymy database (2661 metonymies and 1886 long forms) with two expert annotations is freely available for scientific research purposes.

Modeling covert event retrieval in logical metonymy: probabilistic and distributional accounts
Alessandra Zarcone | Jason Utt | Sebastian Padó
Proceedings of the 3rd Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2012)


Computational Models for Event Type Classification in Context
Alessandra Zarcone | Alessandro Lenci
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Verb lexical semantic properties are only one of the factors that contribute to the determination of the event type expressed by a sentence, which is instead the result of a complex interplay between the verb meaning and its linguistic context. We report on two computational models for the automatic identification of event type in Italian. Both models use linguistically-motivated features extracted from Italian corpora. The main goal of our experiments is to evaluate the contribution of different types of linguistic indicators to identify the event type of a sentence, as well as to model various cases of context-driven event type shift. In the first model, event type identification has been modelled as a supervised classification task, performed with Maximum Entropy classifiers. In the second model, Self-Organizing Maps have been used to define and identify event types in an unsupervised way. The interaction of various contextual factors in determining the event type expressed by a sentence makes event type identification a highly challenging task. Computational models can help us to shed new light on the real structure of event type classes as well as to gain a better understanding of context-driven semantic shifts.