Francesco Mambrini

2020

pdf bib abs
Representing Etymology in the LiLa Knowledge Base of Linguistic Resources for Latin
Francesco Mambrini | Marco Passarotti
Proceedings of the 2020 Globalex Workshop on Linked Lexicography

In this paper we describe the process of inclusion of etymological information in a knowledge base of interoperable Latin linguistic resources developed in the context of the LiLa: Linking Latin project. Interoperability is obtained by applying the Linked Open Data principles. Particularly, an extensive collection of Latin lemmas is used to link the (distributed) resources. For the etymology, we rely on the Ontolex-lemon ontology and the lemonEty extension to model the information, while the source data are taken from a recent etymological dictionary of Latin. As a result, the collection of lemmas LiLa is built around now includes 1,465 Proto-Italic and 1,393 Proto-Indo-European reconstructed forms that are used to explain the history of 1,400 Latin words. We discuss the motivation, methodology and modeling strategies of the work, as well as its possible applications and potential future developments.

2019

pdf bib abs
Harmonizing Different Lemmatization Strategies for Building a Knowledge Base of Linguistic Resources for Latin
Francesco Mambrini | Marco Passarotti
Proceedings of the 13th Linguistic Annotation Workshop

The interoperability between lemmatized corpora of Latin and other resources that use the lemma as indexing key is hampered by the multiple lemmatization strategies that different projects adopt. In this paper we discuss how we tackle the challenges raised by harmonizing different lemmatization criteria in the context of a project that aims to connect linguistic resources for Latin using the Linked Data paradigm. The paper introduces the architecture supporting an open-ended, lemma-based Knowledge Base, built to make textual and lexical resources for Latin interoperable. Particularly, the paper describes the inclusion into the Knowledge Base of its lexical basis, of a word formation lexicon and of a lemmatized and syntactically annotated corpus.

pdf bib
Linked Open Treebanks. Interlinking Syntactically Annotated Corpora in the LiLa Knowledge Base of Linguistic Resources for Latin
Francesco Mambrini | Marco Passarotti
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)

pdf bib
The Treatment of Word Formation in the LiLa Knowledge Base of Linguistic Resources for Latin
Eleonora Litta | Marco Passarotti | Francesco Mambrini
Proceedings of the Second International Workshop on Resources and Tools for Derivational Morphology

2013

pdf bib
Non-Projectivity in the Ancient Greek Dependency Treebank
Francesco Mambrini | Marco Passarotti
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)

2012

pdf bib abs
First Steps towards the Semi-automatic Development of a Wordformation-based Lexicon of Latin
Marco Passarotti | Francesco Mambrini
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Although lexicography of Latin has a long tradition dating back to ancient grammarians, and almost all Latin grammars devote to wordformation at least one part of the section(s) concerning morphology, none of the today available lexical resources and NLP tools of Latin feature a wordformation-based organization of the Latin lexicon. In this paper, we describe the first steps towards the semi-automatic development of a wordformation-based lexicon of Latin, by detailing several problems occurring while building the lexicon and presenting our solutions. Developing a wordformation-based lexicon of Latin is nowadays of outmost importance, as the last years have seen a large growth of annotated corpora of Latin texts of different eras. While these corpora include lemmatization, morphological tagging and syntactic analysis, none of them features segmentation of the word forms and wordformation relations between the lexemes. This restricts the browsing and the exploitation of the annotated data for linguistic research and NLP tasks, such as information retrieval and heuristics in PoS tagging of unknown words.