2023
Enriching Multiword Terms in Wiktionary with Pronunciation Information
Lenka Bajcetic | Thierry Declerck | Gilles Sérasset
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)
We report on work in progress dealing with the automated generation of pronunciation information for English multiword terms (MWTs) in Wiktionary, combining information available for their single components. We describe the issues we were encountering, the building of an evaluation dataset, and our teaming with the DBnary resource maintainer. Our approach shows potential for automatically adding morphosyntactic and semantic information to the components of such MWTs.
Leveraging DBnary Data to Enrich Information of Multiword Terms in Wiktionary
Gilles Sérasset | Thierry Declerck | Lenka Bajčetić
Proceedings of the 4th Conference on Language, Data and Knowledge
2022
Towards the Profiling of Linked Lexicographic Resources
Lenka Bajcetic | Seung-bin Yim | Thierry Declerck
Proceedings of Globalex Workshop on Linked Lexicography within the 13th Language Resources and Evaluation Conference
This paper presents Edie: ELEXIS DIctionary Evaluator. Edie is designed to create profiles for lexicographic resources accessible through the ELEXIS platform. These profiles can be used to evaluate and compare lexicographic resources, and in particular they can be used to identify potential data that could be linked.
Using Wiktionary to Create Specialized Lexical Resources and Datasets
Lenka Bajčetić | Thierry Declerck
Proceedings of the Thirteenth Language Resources and Evaluation Conference
This paper describes an approach aiming at utilizing Wiktionary data for creating specialized lexical datasets which can be used for enriching other lexical (semantic) resources or for generating datasets that can be used for evaluating or improving NLP tasks, like Word Sense Disambiguation, Word-in-Context challenges, or Sense Linking across lexicons and dictionaries. We have focused on Wiktionary data about pronunciation information in English, and grammatical number and grammatical gender in German.
2021
Towards the Addition of Pronunciation Information to Lexical Semantic Resources
Thierry Declerck | Lenka Bajčetić
Proceedings of the 11th Global Wordnet Conference
This paper describes ongoing work aiming at adding pronunciation information to lexical semantic resources, with a focus on open wordnets. Our goal is not only to add a new modality to those semantic networks, but also to mark heteronyms listed in them with the pronunciation information associated with their different meanings. This work could contribute in the longer term to the disambiguation of multi-modal resources, which are combining text and speech.
2020
Implementation of Supervised Training Approaches for Monolingual Word Sense Alignment: ACDH-CH System Description for the MWSA Shared Task at GlobaLex 2020
Lenka Bajcetic | Seung-bin Yim
Proceedings of the 2020 Globalex Workshop on Linked Lexicography
This paper describes our system for monolingual sense alignment across dictionaries. The task of monolingual word sense alignment is presented as a task of predicting the relationship between two senses. We will present two solutions, one based on supervised machine learning, and the other based on a pre-trained neural network language model, specifically BERT. Our models perform competitively for binary classification, reporting high scores for almost all languages. This paper presents our submission for the shared task on monolingual word sense alignment across dictionaries as part of the GLOBALEX 2020 – Linked Lexicography workshop at the 12th Language Resources and Evaluation Conference (LREC). Monolingual word sense alignment (MWSA) is the task of aligning word senses across resources in the same language. Lexical-semantic resources (LSR) such as dictionaries form a valuable foundation of numerous natural language processing (NLP) tasks. Since they are created manually by experts, dictionaries can be considered among the resources of highest quality and importance. However, the existing LSRs in machine readable form are small in scope or missing altogether. Thus, it would be extremely beneficial if the existing lexical resources could be connected and expanded. Lexical resources display considerable variation in the number of word senses that lexicographers assign to a given entry in a dictionary. This is because the identification and differentiation of word senses is one of the harder tasks that lexicographers face. Hence, the task of combining dictionaries from different sources is difficult, especially for the case of mapping the senses of entries, which often differ significantly in granularity and coverage (Ahmadi et al., 2020).
There are three different angles from which the problem of word sense alignment can be addressed: approaches based on the similarity of textual descriptions of word senses, approaches based on structural properties of lexical-semantic resources, and a combination of both (Matuschek, 2014). In this paper we focus on the similarity of textual descriptions. This is a common approach as the majority of previous work used some notion of similarity between senses, mostly gloss overlap or semantic relatedness based on glosses. This makes sense, as glosses are a prerequisite for humans to recognize the meaning of an encoded sense, and thus also an intuitive way of judging the similarity of senses (Matuschek, 2014). The paper is structured as follows: we provide a brief overview of related work in Section 2, and a description of the corpus in Section 3. In Section 4 we explain all important aspects of our model implementation, while the results are presented in Section 5. Finally, we end the paper with the discussion in Section 6 and conclusion in Section 7.
Adding Pronunciation Information to Wordnets
Thierry Declerck | Lenka Bajcetic | Melanie Siegel
Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMW2020)
We describe on-going work consisting in adding pronunciation information to wordnets, as such information can indicate specific senses of a word. Many wordnets associate with their senses only a lemma form and a part-of-speech tag. At the same time, we are aware that additional linguistic information can be useful for identifying a specific sense of a wordnet lemma when encountered in a corpus. While work already deals with the addition of grammatical number or grammatical gender information to wordnet lemmas, we are investigating the linking of wordnet lemmas to pronunciation information, adding thus a speech-related modality to wordnets.