2022
Sentiment Analysis of Homeric Text: The 1st Book of Iliad
John Pavlopoulos | Alexandros Xenos | Davide Picca
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Sentiment analysis studies focus more on online customer reviews and social media than on literary texts. The problem is greater for ancient languages, where the linguistic expression of sentiment may diverge from modern forms. This work presents the outcome of a sentiment annotation task on the first Book of the Iliad, an ancient Greek poem. The annotators were provided with verses translated into modern Greek and annotated the perceived emotions and sentiments verse by verse. By estimating the fraction of annotators who judged a verse as belonging to a specific sentiment class, we model the poem’s perceived sentiment as a multi-variate time series. By experimenting with a state-of-the-art deep learning masked language model, pre-trained on modern Greek and fine-tuned to estimate the sentiment of our data, we registered a mean squared error of 0.063. This low error indicates that sentiment estimators built on our dataset can potentially be used as mechanical annotators, hence facilitating the distant reading of Homeric text. Our dataset is released for public use.
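A minimal sketch of the label construction and scoring the abstract describes: per-verse annotator votes are turned into fractions per sentiment class (the multi-variate time series), and estimates are compared to those fractions with mean squared error. The class set, variable names, and toy data below are illustrative assumptions, not the paper's actual code or label inventory.

```python
import numpy as np

# Hypothetical sentiment classes; the released dataset defines the real set.
CLASSES = ["positive", "negative", "neutral"]

def verse_targets(annotations):
    """Turn per-verse annotator votes into fractions per sentiment class.

    annotations: one list of class labels per verse.
    Returns an array of shape (n_verses, n_classes): the multi-variate
    time series of perceived sentiment along the poem.
    """
    targets = np.zeros((len(annotations), len(CLASSES)))
    for i, votes in enumerate(annotations):
        for label in votes:
            targets[i, CLASSES.index(label)] += 1.0
        if votes:
            targets[i] /= len(votes)
    return targets

def mean_squared_error(y_true, y_pred):
    """MSE between gold annotator fractions and model estimates."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Toy usage: three verses, three annotators each.
votes = [["positive", "positive", "neutral"],
         ["negative", "negative", "negative"],
         ["neutral", "positive", "negative"]]
y = verse_targets(votes)
print(y)
print(mean_squared_error(y, np.full_like(y, 1 / 3)))
```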
2020
WeDH - a Friendly Tool for Building Literary Corpora Enriched with Encyclopedic Metadata
Mattia Egloff | Davide Picca
Proceedings of the Twelfth Language Resources and Evaluation Conference
In recent years, interest in the use of repositories of literary works has grown considerably. While many efforts related to Linked Open Data go in the right direction, using these repositories to create text corpora enriched with metadata remains difficult and cumbersome. In fact, many of these repositories can serve the community not only for the automatic creation of textual corpora but also for retrieving crucial meta-information about texts. In particular, metadata provide the reader with a wealth of information that is often not identifiable in the texts themselves. Our project aims to address both access to the textual resources available on the web and the possibility of combining these resources with sources of metadata that can enrich the texts with useful information, lengthening the life and maintenance of the data itself. We introduce here a user-friendly web interface of the Digital Humanities toolkit named WeDH, with which the user can leverage the encyclopedic knowledge provided by DBpedia, Wikidata and VIAF in order to enrich corpora with bibliographical and exegetical knowledge. WeDH is a collaborative project, and we invite anyone who has ideas or suggestions regarding this procedure to reach out to us.
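As a rough illustration of the kind of enrichment the abstract describes (not WeDH's actual API), the sketch below pulls a few encyclopedic fields for an author from Wikidata's public SPARQL endpoint; the query shape, returned fields, and user-agent string are assumptions, and WeDH additionally draws on DBpedia and VIAF.

```python
import requests

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

def author_metadata(author_name):
    """Fetch basic encyclopedic metadata for an author from Wikidata.

    Illustrative only: WeDH's own queries and field names may differ.
    """
    query = """
    SELECT ?person ?personLabel ?birth ?viaf WHERE {
      ?person rdfs:label "%s"@en ;
              wdt:P31 wd:Q5 .                 # instance of: human
      OPTIONAL { ?person wdt:P569 ?birth . }  # date of birth
      OPTIONAL { ?person wdt:P214 ?viaf . }   # VIAF identifier
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    } LIMIT 5
    """ % author_name
    resp = requests.get(WIKIDATA_SPARQL,
                        params={"query": query, "format": "json"},
                        headers={"User-Agent": "corpus-enrichment-sketch/0.1"})
    resp.raise_for_status()
    return resp.json()["results"]["bindings"]

# Toy usage: attach the returned fields to a text's metadata record.
for row in author_metadata("Homer"):
    print({k: v["value"] for k, v in row.items()})
```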
2009
Bridging Languages by SuperSense Entity Tagging
Davide Picca | Alfio Massimiliano Gliozzo | Simone Campora
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)
2008
LMM: an OWL-DL MetaModel to Represent Heterogeneous Lexical Knowledge
Davide Picca | Alfio Massimiliano Gliozzo | Aldo Gangemi
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
In this paper we present a Linguistic Meta-Model (LMM) allowing a semiotic-cognitive representation of knowledge. LMM is freely available and integrates the schemata of linguistic knowledge resources, such as WordNet and FrameNet, as well as foundational ontologies, such as DOLCE and its extensions. In addition, LMM is able to deal with multilinguality and to represent individuals and facts from an open-domain perspective.
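To give a flavour of what representing lexical knowledge in an OWL-DL metamodel looks like, here is a tiny rdflib sketch that declares a meta-level class for lexical meanings and asserts a WordNet-style synset as an instance of it. The namespaces, class names, and synset identifier are hypothetical placeholders; the real LMM schema and its alignments to WordNet, FrameNet and DOLCE are defined in the released OWL files, not here.

```python
from rdflib import Graph, Namespace, RDF, RDFS, OWL, Literal

# Hypothetical namespaces, for illustration only.
LMM = Namespace("http://example.org/lmm#")
WN = Namespace("http://example.org/wordnet#")

g = Graph()
g.bind("lmm", LMM)
g.bind("wn", WN)

# A meta-level class for lexical meanings and one concrete synset.
g.add((LMM.Meaning, RDF.type, OWL.Class))
g.add((WN["synset-dog-noun-1"], RDF.type, LMM.Meaning))
g.add((WN["synset-dog-noun-1"], RDFS.label, Literal("dog", lang="en")))

print(g.serialize(format="turtle"))
```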
Supersense Tagger for Italian
Davide Picca | Alfio Massimiliano Gliozzo | Massimiliano Ciaramita
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
In this paper we present the procedure we followed to develop the Italian SuperSense Tagger. In particular, we adapted the English SuperSense Tagger to Italian by exploiting a parallel sense-labeled corpus for training. As for English, the Italian tagger uses a fixed set of 26 semantic labels, called supersenses, achieving a slightly lower accuracy due to the lower quality of the Italian training data. Both taggers accomplish the same task of identifying entities and concepts belonging to a common set of ontological types. This parallelism allows us to define effective methodologies for a broad range of cross-language knowledge acquisition tasks.
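To make the task concrete, the toy sketch below shows the kind of output a supersense tagger produces: coarse WordNet-style ontological labels over tokens. The tag inventory shown is a small illustrative subset of the 26 supersenses, and the dictionary lookup is only a stand-in for the sequence model the paper actually trains.

```python
# Toy illustration of supersense tagging: each content token gets a coarse
# WordNet-style ontological label; non-covered tokens get 'O'.
SUPERSENSE_LEXICON = {
    # a few of the 26 WordNet supersense categories, by example word
    "Dante":   "noun.person",
    "Firenze": "noun.location",
    "poema":   "noun.communication",
    "scrisse": "verb.creation",
}

def tag(tokens):
    """Attach a supersense (or 'O' for none) to every token."""
    return [(tok, SUPERSENSE_LEXICON.get(tok, "O")) for tok in tokens]

print(tag("Dante scrisse il poema a Firenze".split()))
```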