Teresa Paccosi


2023

Scent and Sensibility: Perception Shifts in the Olfactory Domain
Teresa Paccosi | Stefano Menini | Elisa Leonardelli | Ilaria Barzon | Sara Tonelli
Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change

In this work, we investigate olfactory perception shifts, analysing how the description of the smells emitted by specific sources has changed over time. We first create a benchmark of selected smell sources, relying upon existing historical studies related to olfaction. We also collect an English text corpus by retrieving large collections of documents from freely available resources, spanning from 1500 to 2000 and covering different domains. We label this corpus using a system for olfactory information extraction inspired by frame semantics, which marks the semantic roles around the smell sources in the benchmark. We then analyse how the roles describing Qualities of smell sources change over time and how they can help characterise perception shifts, also in comparison with more standard statistical approaches.

Scent Mining: Extracting Olfactory Events, Smell Sources and Qualities
Stefano Menini | Teresa Paccosi | Serra Sinem Tekiroğlu | Sara Tonelli
Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

Olfaction is a rather understudied sense compared to the other senses. In NLP, however, there have been recent attempts to develop taxonomies and benchmarks specifically designed to capture smell-related information. In this work, we further extend this research line by presenting a supervised system for olfactory information extraction in English. We cast this problem as a token classification task and build a system that identifies smell words, smell sources and qualities. The classifier is then applied to a set of English historical corpora, covering different domains and written between the 15th and the 20th century. A qualitative analysis of the extracted data shows that they can be used to infer interesting information about smelly items such as tea and tobacco from a diachronic perspective, supporting historical investigation with corpus-based evidence.

2022

KIND: an Italian Multi-Domain Dataset for Named Entity Recognition
Teresa Paccosi | Alessio Palmero Aprosio
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper we present KIND, an Italian dataset for named entity recognition. It contains more than one million tokens with annotation covering three classes: person, location, and organization. Most of the dataset (around 600K tokens) contains manual gold annotations in three different domains (news, literature, and political discourses), while the rest is semi-automatically annotated. The multi-domain feature is the main strength of the present work, offering a resource that covers different styles and language uses and is also the largest Italian NER dataset with manual gold annotations. It represents an important resource for the training of NER systems in Italian. Texts and annotations are freely downloadable from the GitHub repository.

Building a Multilingual Taxonomy of Olfactory Terms with Timestamps
Stefano Menini | Teresa Paccosi | Serra Sinem Tekiroğlu | Sara Tonelli
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Olfactory references play a crucial role in our memory and, more generally, in our experiences, since researchers have shown that smell is the sense that is most directly connected with emotions. Nevertheless, only a few works in NLP have tried to capture this sensory dimension from a computational perspective. One of the main challenges is the lack of a systematic and consistent taxonomy of olfactory information in which concepts are also organised from a multi-lingual perspective. WordNet represents a valuable starting point in this direction, which can be semi-automatically extended taking advantage of Google n-grams and of existing language models. In this work we describe the process that has led to the semi-automatic development of a taxonomy for olfactory information in four languages (English, French, German and Italian), detailing the different steps and the intermediate evaluations. Along with being multi-lingual, the taxonomy also includes temporal marks for olfactory terms, thus making it a valuable resource for historical content analysis. The resource has been released and is freely available.

A Multilingual Benchmark to Capture Olfactory Situations over Time
Stefano Menini | Teresa Paccosi | Sara Tonelli | Marieke Van Erp | Inger Leemans | Pasquale Lisena | Raphael Troncy | William Tullett | Ali Hürriyetoğlu | Ger Dijkstra | Femke Gordijn | Elias Jürgens | Josephine Koopman | Aron Ouwerkerk | Sanne Steen | Inna Novalija | Janez Brank | Dunja Mladenic | Anja Zidar
Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change

We present a benchmark in six European languages containing manually annotated information about olfactory situations and events following a FrameNet-like approach. The document selection covers ten domains of interest to cultural historians in the olfactory domain and includes texts published between 1620 and 1920, allowing a diachronic analysis of smell descriptions. With this work, we aim to foster the development of olfactory information extraction approaches as well as the analysis of changes in smell descriptions over time.