Alessandra Zarcone


2024

pdf bib
THAugs at GermEval 2024 (Shared Task 1: GerMS-Detect): Predicting the Severity of Misogyny/Sexism in Forum Comments with BERT Models (Subtask 1, Closed Track and Additional Experiments)
Corsin Geiss | Alessandra Zarcone
Proceedings of GermEval 2024 Task 1 GerMS-Detect Workshop on Sexism Detection in German Online News Fora (GerMS-Detect 2024)

pdf
Aligning Uncertainty: Leveraging LLMs to Analyze Uncertainty Transfer in Text Summarization
Zahra Kolagar | Alessandra Zarcone
Proceedings of the 1st Workshop on Uncertainty-Aware NLP (UncertaiNLP 2024)

Automatically generated summaries can be evaluated along different dimensions, one being how faithfully the uncertainty from the source text is conveyed in the summary. We present a study on uncertainty alignment in automatic summarization, starting from a two-tier lexical and semantic categorization of linguistic expression of uncertainty, which we used to annotate source texts and automatically generate summaries. We collected a diverse dataset including news articles and personal blogs and generated summaries using GPT-4. Source texts and summaries were annotated based on our two-tier taxonomy using a markup language. The automatic annotation was refined and validated by subsequent iterations based on expert input. We propose a method to evaluate the fidelity of uncertainty transfer in text summarization. The method capitalizes on a small amount of expert annotations and on the capabilities of Large language models (LLMs) to evaluate how the uncertainty of the source text aligns with the uncertainty expressions in the summary.

pdf
HumSum: A Personalized Lecture Summarization Tool for Humanities Students Using LLMs
Zahra Kolagar | Alessandra Zarcone
Proceedings of the 1st Workshop on Personalization of Generative AI Systems (PERSONALIZE 2024)

Generative AI systems aim to create customizable content for their users, with a subsequent surge in demand for adaptable tools that can create personalized experiences. This paper presents HumSum, a web-based tool tailored for humanities students to effectively summarize their lecture transcripts and to personalize the summaries to their specific needs. We first conducted a survey driven by different potential scenarios to collect user preferences to guide the implementation of this tool. Utilizing Streamlit, we crafted the user interface, while Langchain’s Map Reduce function facilitated the summarization process for extensive lectures using OpenAI’s GPT-4 model. HumSum is an intuitive tool serving various summarization needs, infusing personalization into the tool’s functionality without necessitating the collection of personal user data.

2023

pdf
EduQuick: A Dataset Toward Evaluating Summarization of Informal Educational Content for Social Media
Zahra Kolagar | Sebastian Steindl | Alessandra Zarcone
Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems

This study explores the capacity of large language models (LLMs) to efficiently generate summaries of informal educational content tailored for platforms like TikTok. It also investigates how both humans and LLMs assess the quality of these summaries, based on a series of experiments, exploring the potential replacement of human evaluation with LLMs. Furthermore, the study delves into how experienced content creators perceive the utility of automatic summaries for TikTok videos. We employ strategic prompt selection techniques to guide LLMs in producing engaging summaries based on the characteristics of viral TikTok content, including hashtags, captivating hooks, storytelling, and user engagement. The study leverages OpenAI’s GPT-4 model to generate TikTok content summaries, aiming to align them with the essential features identified. By employing this model and incorporating human evaluation and expert assessment, this research endeavors to shed light on the intricate dynamics of modern content creation, where AI and human ingenuity converge. Ultimately, it seeks to enhance strategies for disseminating and evaluating educational information effectively in the realm of social media.

pdf
Bubble up – A Fine-tuning Approach for Style Transfer to Community-specific Subreddit Language
Alessandra Zarcone | Fabian Kopf
Proceedings of the 3rd Workshop on Computational Linguistics for the Political and Social Sciences

pdf bib
Including a contemporary NLP application within an introductory course: an example with student feedback from a University of Applied Sciences
Saurabh Kumar | Alessandra Zarcone
Proceedings of the 1st Workshop on Teaching for NLP

2022

pdf
GiCCS: A German in-Context Conversational Similarity Benchmark
Shima Asaadi | Zahra Kolagar | Alina Liebel | Alessandra Zarcone
Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)

The Semantic textual similarity (STS) task is commonly used to evaluate the semantic representations that language models (LMs) learn from texts, under the assumption that good-quality representations will yield accurate similarity estimates. When it comes to estimating the similarity of two utterances in a dialogue, however, the conversational context plays a particularly important role. We argue for the need of benchmarks specifically created using conversational data in order to evaluate conversational LMs in the STS task. We introduce GiCCS, a first conversational STS evaluation benchmark for German. We collected the similarity annotations for GiCCS using best-worst scaling and presenting the target items in context, in order to obtain highly-reliable context-dependent similarity scores. We present benchmarking experiments for evaluating LMs on capturing the similarity of utterances. Results suggest that pretraining LMs on conversational data and providing conversational context can be useful for capturing similarity of utterances in dialogues. GiCCS will be publicly available to encourage benchmarking of conversational LMs.

2021

pdf
Not So Fast, Classifier – Accuracy and Entropy Reduction in Incremental Intent Classification
Lianna Hrycyk | Alessandra Zarcone | Luzian Hahn
Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI

Incremental intent classification requires the assignment of intent labels to partial utterances. However, partial utterances do not necessarily contain enough information to be mapped to the intent class of their complete utterance (correctly and with a certain degree of confidence). Using the final interpretation as the ground truth to measure a classifier’s accuracy during intent classification of partial utterances is thus problematic. We release inCLINC, a dataset of partial and full utterances with human annotations of plausible intent labels for different portions of each utterance, as an upper (human) baseline for incremental intent classification. We analyse the incremental annotations and propose entropy reduction as a measure of human annotators’ convergence on an interpretation (i.e. intent label). We argue that, when the annotators do not converge to one or a few possible interpretations and yet the classifier already identifies the final intent class early on, it is a sign of overfitting that can be ascribed to artefacts in the dataset.

pdf
New Domain, Major Effort? How Much Data is Necessary to Adapt a Temporal Tagger to the Voice Assistant Domain
Touhidul Alam | Alessandra Zarcone | Sebastian Padó
Proceedings of the 14th International Conference on Computational Semantics (IWCS)

Reliable tagging of Temporal Expressions (TEs, e.g., Book a table at L’Osteria for Sunday evening) is a central requirement for Voice Assistants (VAs). However, there is a dearth of resources and systems for the VA domain, since publicly-available temporal taggers are trained only on substantially different domains, such as news and clinical text. Since the cost of annotating large datasets is prohibitive, we investigate the trade-off between in-domain data and performance in DA-Time, a hybrid temporal tagger for the English VA domain which combines a neural architecture for robust TE recognition, with a parser-based TE normalizer. We find that transfer learning goes a long way even with as little as 25 in-domain sentences: DA-Time performs at the state of the art on the news domain, and substantially outperforms it on the VA domain.

2020

pdf
PATE: A Corpus of Temporal Expressions for the In-car Voice Assistant Domain
Alessandra Zarcone | Touhidul Alam | Zahra Kolagar
Proceedings of the Twelfth Language Resources and Evaluation Conference

The recognition and automatic annotation of temporal expressions (e.g. “Add an event for tomorrow evening at eight to my calendar”) is a key module for AI voice assistants, in order to allow them to interact with apps (for example, a calendar app). However, in the NLP literature, research on temporal expressions has focused mostly on data from the news, from the clinical domain, and from social media. The voice assistant domain is very different than the typical domains that have been the focus of work on temporal expression identification, thus requiring a dedicated data collection. We present a crowdsourcing method for eliciting natural-language commands containing temporal expressions for an AI voice assistant, by using pictures and scenario descriptions. We annotated the elicited commands (480) as well as the commands in the Snips dataset following the TimeML/TIMEX3 annotation guidelines, reaching a total of 1188 annotated commands. The commands can be later used to train the NLU components of an AI voice assistant.

2017

pdf bib
Inducing Script Structure from Crowdsourced Event Descriptions via Semi-Supervised Clustering
Lilian Wanzare | Alessandra Zarcone | Stefan Thater | Manfred Pinkal
Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics

We present a semi-supervised clustering approach to induce script structure from crowdsourced descriptions of event sequences by grouping event descriptions into paraphrase sets (representing event types) and inducing their temporal order. Our approach exploits semantic and positional similarity and allows for flexible event order, thus overcoming the rigidity of previous approaches. We incorporate crowdsourced alignments as prior knowledge and show that exploiting a small number of alignments results in a substantial improvement in cluster quality over state-of-the-art models and provides an appropriate basis for the induction of temporal order. We also show a coverage study to demonstrate the scalability of our approach.

2016

pdf
A Crowdsourced Database of Event Sequence Descriptions for the Acquisition of High-quality Script Knowledge
Lilian D. A. Wanzare | Alessandra Zarcone | Stefan Thater | Manfred Pinkal
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Scripts are standardized event sequences describing typical everyday activities, which play an important role in the computational modeling of cognitive abilities (in particular for natural language processing). We present a large-scale crowdsourced collection of explicit linguistic descriptions of script-specific event sequences (40 scenarios with 100 sequences each). The corpus is enriched with crowdsourced alignment annotation on a subset of the event descriptions, to be used in future work as seed data for automatic alignment of event descriptions (for example via clustering). The event descriptions to be aligned were chosen among those expected to have the strongest corrective effect on the clustering algorithm. The alignment annotation was evaluated against a gold standard of expert annotators. The resulting database of partially-aligned script-event descriptions provides a sound empirical basis for inducing high-quality script knowledge, as well as for any task involving alignment and paraphrase detection of events.

2013

pdf
Fitting, Not Clashing! A Distributional Semantic Model of Logical Metonymy
Alessandra Zarcone | Alessandro Lenci | Sebastian Padó | Jason Utt
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Short Papers

pdf
The Curious Case of Metonymic Verbs: A Distributional Characterization
Jason Utt | Alessandro Lenci | Sebastian Padó | Alessandra Zarcone
Proceedings of the IWCS 2013 Workshop Towards a Formal Distributional Semantics

2012

pdf
Modeling covert event retrieval in logical metonymy: probabilistic and distributional accounts
Alessandra Zarcone | Jason Utt | Sebastian Padó
Proceedings of the 3rd Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2012)

pdf
Logical metonymies and qualia structures: an annotated database of logical metonymies for German
Alessandra Zarcone | Stefan Rued
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Logical metonymies like """"The author began the book"""" involve the interpretation of events that are not realized in the sentence (Covert events: -> """"writing the book""""). The Generative Lexicon (Pustejovsky 1995) provides a qualia-based account of covert event interpretation, claiming that the covert event is retrieved from the qualia structure of the object. Such a theory poses the question of to what extent covert events in logical metonymies can be accounted for by qualia structures. Building on previous work on English, we present a corpus study for German verbs (""""anfangen (mit)"""", """"aufhoeren (mit)"""", """"beenden"""", """"beginnen (mit)"""", """"geniessen"""", based on data obtained from the deWaC corpus. We built a corpus of logical metonymies, which were manually annotated and compared with the qualia structures of their objects, then we contrasted annotation results from two expert annotators for metonymies (""""The author began the book"""") and long forms (""""The author began reading the book"""") across verbs. Our annotation was evaluated on a sample of sentences annotated by a group of naive annotators on a crowdsourcing platform. The logical metonymy database (2661 metonymies and 1886 long forms) with two expert annotations is freely available for scientific research purposes.

2008

pdf
Computational Models for Event Type Classification in Context
Alessandra Zarcone | Alessandro Lenci
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Verb lexical semantic properties are only one of the factors that contribute to the determination of the event type expressed by a sentence, which is instead the result of a complex interplay between the verb meaning and its linguistic context. We report on two computational models for the automatic identification of event type in Italian. Both models use linguistically-motivated features extracted from Italian corpora. The main goal of our experiments is to evaluate the contribution of different types of linguistic indicators to identify the event type of a sentence, as well as to model various cases of context-driven event type shift. In the first model, event type identification has been modelled as a supervised classification task, performed with Maximum Entropy classifiers. In the second model, Self-Organizing Maps have been used to define and identify event types in an unsupervised way. The interaction of various contextual factors in determining the event type expressed by a sentence makes event type identification a highly challenging task. Computational models can help us to shed new light on the real structure of event type classes as well as to gain a better understanding of context-driven semantic shifts.