Carole Tiberius


2024

pdf
Multiword Expressions between the Corpus and the Lexicon: Universality, Idiosyncrasy, and the Lexicon-Corpus Interface
Verginica Barbu Mititelu | Voula Giouli | Kilian Evang | Daniel Zeman | Petya Osenova | Carole Tiberius | Simon Krek | Stella Markantonatou | Ivelina Stoyanova | Ranka Stanković | Christian Chiarcos
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024

We present ongoing work towards defining a lexicon-corpus interface to serve as a benchmark in the representation of multiword expressions (of various parts of speech) in dedicated lexica and the linking of these entries to their corpus occurrences. The final aim is the harnessing of such resources for the automatic identification of multiword expressions in a text. The involvement of several natural languages aims at the universality of a solution not centered on a particular language, and also accommodating idiosyncrasies. Challenges in the lexicographic description of multiword expressions are discussed, the current status of lexica dedicated to this linguistic phenomenon is outlined, as well as the solution we envisage for creating an ecosystem of interlinked lexica and corpora containing and, respectively, annotated with multiword expressions.

2020

pdf
A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Sina Ahmadi | John Philip McCrae | Sanni Nimb | Fahad Khan | Monica Monachini | Bolette Pedersen | Thierry Declerck | Tanja Wissik | Andrea Bellandi | Irene Pisani | Thomas Troelsgård | Sussi Olsen | Simon Krek | Veronika Lipp | Tamás Váradi | László Simon | András Gyorffy | Carole Tiberius | Tanneke Schoonheim | Yifat Ben Moshe | Maya Rudich | Raya Abu Ahmad | Dorielle Lonke | Kira Kovalenko | Margit Langemets | Jelena Kallas | Oksana Dereza | Theodorus Fransen | David Cillessen | David Lindemann | Mikel Alonso | Ana Salgado | José Luis Sancho | Rafael-J. Ureña-Ruiz | Jordi Porta Zamorano | Kiril Simov | Petya Osenova | Zara Kancheva | Ivaylo Radev | Ranka Stanković | Andrej Perdih | Dejan Gabrovsek
Proceedings of the Twelfth Language Resources and Evaluation Conference

Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages and resources and focuses on the more challenging task of linking general-purpose language. We believe that our data will pave the way for further advances in alignment and evaluation of word senses by creating new solutions, particularly those notoriously requiring data such as neural networks. Our resources are publicly available at https://github.com/elexis-eu/MWSA.

pdf bib
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
Stella Markantonatou | John McCrae | Jelena Mitrović | Carole Tiberius | Carlos Ramisch | Ashwini Vaidya | Petya Osenova | Agata Savary
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons

2018

pdf
ELEXIS - a European infrastructure fostering cooperation and information exchange among lexicographical research communities
Bolette Pedersen | John McCrae | Carole Tiberius | Simon Krek
Proceedings of the 9th Global Wordnet Conference

The paper describes objectives, concept and methodology for ELEXIS, a European infrastructure fostering cooperation and information exchange among lexicographical research communities. The infrastructure is a newly granted project under the Horizon 2020 INFRAIA call, with the topic Integrating Activities for Starting Communities. The project is planned to start in January 2018.

2014

pdf
Taalportaal: an online grammar of Dutch and Frisian
Frank Landsbergen | Carole Tiberius | Roderik Dernison
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we present the Taalportaal project. Taalportaal will create an online portal containing an exhaustive and fully searchable electronic reference of Dutch and Frisian phonology, morphology and syntax. Its content will be in English. The main aim of the project is to serve the scientific community by organizing, integrating and completing the grammatical knowledge of both languages, and to make this data accessible in an innovative way. The project is carried out by a consortium of four universities and research institutions. Content is generated in two ways: (1) by a group of authors who, starting from existing grammatical resources, write text directly in XML, and (2) by integrating the full Syntax of Dutch into the portal, after an automatic conversion from Word to XML. We discuss the project’s workflow, content creation and management, the actual web application, and the way in which we plan to enrich the portal’s content, such as by crosslinking between topics and linking to external resources.

2008

pdf
Accessing the ANW Dictionary
Fons Moerdijk | Carole Tiberius | Jan Niestadt
Coling 2008: Proceedings of the Workshop on Cognitive Aspects of the Lexicon (COGALEX 2008)

pdf
Standardising Bilingual Lexical Resources According to the Lexicon Markup Framework
Isa Maks | Carole Tiberius | Remco van Veenendaal
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The Dutch HLT agency for language and speech technology (known as TST-centrale) at the Institute for Dutch Lexicology is responsible for the maintenance, distribution and accessibility of (Dutch) digital language resources. In this paper we present a project which aims to standardise the format of a set of bilingual lexicons in order to make them available to potential users, to facilitate the exchange of data (among the resources and with other (monolingual) resources) and to enable reuse of these lexicons for NLP applications like machine translation and multilingual information retrieval. We pay special attention to the methods and tools we used and to some of the problematic issues we encountered during the conversion process. As these problems are mainly caused by the fact that the standard LMF model fails in representing the detailed semantic and pragmatic distinctions made in our bilingual data, we propose some modifications to the standard. In general, we think that a standard for lexicons should provide a model for bilingual lexicons that is able to represent all detailed and fine-grained translation information which is generally found in these types of lexicons.

2004

pdf bib
Inflectional Syncretism and Corpora
Dunstan Brown | Carole Tiberius | Greville G. Corbett
Proceedings of the 5th International Workshop on Linguistically Interpreted Corpora

2003

pdf bib
A Large-scale Inheritance-based Morphological Lexicon for Russian
Roger Evans | Carole Tiberius | Dunstan Brown | Greville C. Corbett
Proceedings of the 2003 EACL Workshop on Morphological Processing of Slavic Languages

2002

pdf
Cross Linguistic Phoneme Correspondences
Lynne Cahill | Carole Tiberius
COLING 2002: The 17th International Conference on Computational Linguistics: Project Notes

pdf
How to build a multilingual inheritance-based lexicon
Carole Tiberius
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf
A typological database of agreement
Carole Tiberius | Dunstan Brown | Greville Corbett
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

pdf
Incorporating Metaphonemes in a Multilingual Lexicon
Carole Tiberius | Lynne Cahill
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics