2023
A uniform RDF-based Representation of the Interlinking of Wordnets and Sign Language Data
Thierry Declerck | Sam Bigeard | Dorianne Callus | Benjamin Matthews | Sussi Olsen | Loran Ripard Xuereb
Proceedings of the 4th Conference on Language, Data and Knowledge
A Linked Data Approach for linking and aligning Sign Language and Spoken Language Data
Thierry Declerck | Sam Bigeard | Fahad Khan | Irene Murtagh | Sussi Olsen | Mike Rosner | Ineke Schuurman | Andon Tchechmedjiev | Andy Way
Proceedings of the Second International Workshop on Automatic Translation for Signed and Spoken Languages
We present work on a Linked Open Data (LOD)-compliant representation of Sign Language (SL) data, with the goal of supporting the cross-lingual alignment of SL data and their linking to Spoken Language (SpL) data. The proposed representation builds on the work of groups of researchers in the field of SL who have investigated the use of Open Multilingual Wordnet (OMW) datasets for (manually) cross-linking SL data or for linking SL and SpL data. Another group of researchers is proposing an XML encoding of articulatory elements of SLs and (manually) linking those to an SpL lexical resource. We propose an RDF-based representation of these various data. This unified formal representation offers a semantic repository of information on SL and SpL data that could be accessed to support the creation of datasets for training or evaluating NLP applications dealing with SLs, for example Machine Translation (MT) between SLs and between SLs and SpLs.
What shall we read : the article or the citations? - A case study on scientific language understanding
Aman Sinha | Sam Bigeard | Marianne Clausel | Mathieu Constant
Actes de CORIA-TALN 2023. Actes de l'atelier "Analyse et Recherche de Textes Scientifiques" (ARTS)@TALN 2023
The number of scientific articles is increasing tremendously across all domains, to such an extent that it has become hard for researchers to remain up-to-date. Scientific language understanding systems and Information Extraction (IE) systems, with the advancement of Natural Language Processing (NLP) techniques, are helping to meet the needs of users. The majority of practices for building such systems are data-driven, advocating the idea of “the more, the better”. In this work, we revisit this paradigm, questioning which type of data, text (title, abstract) or citations, has more impact on the performance of scientific language understanding systems.
2022
Introducing Sign Languages to a Multilingual Wordnet: Bootstrapping Corpora and Lexical Resources of Greek Sign Language and German Sign Language
Sam Bigeard | Marc Schulder | Maria Kopf | Thomas Hanke | Kyriaki Vasilaki | Anna Vacalopoulou | Theodore Goulas | Athanasia-Lida Dimou | Stavroula-Evita Fotinea | Eleni Efthimiou
Proceedings of the LREC2022 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources
Wordnets have been a popular lexical resource type for many years. Their sense-based representation of lexical items and numerous relation structures have been used for a variety of computational and linguistic applications. The inclusion of different wordnets into multilingual wordnet networks has further extended their use into the realm of cross-lingual research. Wordnets have been released for many spoken languages. Research has also been carried out into the creation of wordnets for several sign languages, but none have yet resulted in publicly available datasets. This article presents our own efforts towards an inclusion of sign languages in a multilingual wordnet, starting with Greek Sign Language (GSL) and German Sign Language (DGS). Based on differences in available language resources between GSL and DGS, we trial two workflows with different coverage priorities. We also explore how synergies between both workflows can be leveraged and how future work on additional sign languages could profit from building on existing sign language wordnet data. The results of our work are made publicly available.