Alexandre Rademaker

2024

pdf abs
Deductive Verification of LLM Generated SPARQL Queries
Alexandre Rademaker | Guilherme Lima | Sandro Rama Fiorini | Viviane Torres da Silva
Proceedings of the Workshop on Deep Learning and Linked Data (DLnLD) @ LREC-COLING 2024

Considering the increasing applications of Large Language Models (LLMs) to many natural language tasks, this paper presents preliminary findings on developing a verification component for detecting hallucinations of an LLM that produces SPARQL queries from natural language questions. We suggest a logic-based deductive verification of the generated SPARQL query by checking if the original NL question’s deep semantic representation entails the SPARQL’s semantic representation.

2023

Semantic role labeling (SRL) identifies the predicate-argument structure in a sentence. This task is usually accomplished in four steps: predicate identification, predicate sense disambiguation, argument identification, and argument classification. Errors introduced at one step propagate to later steps. Unfortunately, the existing SRL evaluation scripts do not consider the full effect of this error propagation aspect. They either evaluate arguments independent of predicate sense (CoNLL09) or do not evaluate predicate sense at all (CoNLL05), yielding an inaccurate SRL model performance on the argument classification task. In this paper, we address key practical issues with existing evaluation scripts and propose a more strict SRL evaluation metric PriMeSRL. We observe that by employing PriMeSRL, the quality evaluation of all SoTA SRL models drops significantly, and their relative rankings also change. We also show that PriMeSRLsuccessfully penalizes actual failures in SoTA SRL models.

pdf
Extracting higher-order logic formulas from English sentences
Alexandre Rademaker | Guilherme Lima | Renato Cerqueira
Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023)

pdf bib
Proceedings of the 12th Global Wordnet Conference
German Rigau | Francis Bond | Alexandre Rademaker
Proceedings of the 12th Global Wordnet Conference

In this project note we describe our work to make better documentation for the Open Multilingual Wordnet (OMW), a platform integrating many open wordnets. This includes the documentation of the OMW website itself as well as of semantic relations used by the component wordnets. Some of this documentation work was done with the support of the Google Season of Docs. The OMW project page, which links both to the actual OMW server and the documentation has been moved to a new location: https://omwn.org.

pdf abs
Semantic Parsing and Sense Tagging the Princeton WordNet Gloss Corpus
Alexandre Rademaker | Abhishek Basu | Rajkiran Veluri
Proceedings of the 12th Global Wordnet Conference

In 2008, the Princeton team released the last version of the “Princeton Annotated Gloss Corpus”. In this corpus. The word forms from the definitions and examples (glosses) of Princeton WordNet are manually linked to the context-appropriate sense in WordNet. However, the annotation was not complete, and the dataset was never officially released as part of WordNet 3.0, remaining as one of the standoff files available for download. Eleven years later, in 2019, one of the authors of this paper restarted the project aiming to complete the sense annotation of the approximately 200 thousand word forms not yet annotated. Here, we provide additional motivations to complete this dataset and report the progress in the work and evaluations. Intending to provide an extra level of consistency in the sense annotation and a deep semantic representation of the definitions and examples promoting WordNet from a lexical resource to a lightweight ontology, we now employ the English Resource Grammar (ERG), a broad-coverage HPSG grammar of English to parse the sentences and project the sense annotations from the surface words to the ERG predicates. We also report some initial steps on upgrading the corpus to WordNet 3.1 to facilitate mapping the data to other lexical resources.

2022

Semantic role labeling (SRL) represents the meaning of a sentence in the form of predicate-argument structures. Such shallow semantic analysis is helpful in a wide range of downstream NLP tasks and real-world applications. As treebanks enabled the development of powerful syntactic parsers, the accurate predicate-argument analysis demands training data in the form of propbanks. Unfortunately, most languages simply do not have corresponding propbanks due to the high cost required to construct such resources. To overcome such challenges, Universal Proposition Bank 1.0 (UP1.0) was released in 2017, with high-quality propbank data generated via a two-stage method exploiting monolingual SRL and multilingual parallel data. In this paper, we introduce Universal Proposition Bank 2.0 (UP2.0), with significant enhancements over UP1.0: (1) propbanks with higher quality by using a state-of-the-art monolingual SRL and improved auto-generation of annotations; (2) expanded language coverage (from 7 to 9 languages); (3) span annotation for the decoupling of syntactic analysis; and (4) Gold data for a subset of the languages. We also share our experimental results that confirm the significant quality improvements of the generated propbanks. In addition, we present a comprehensive experimental evaluation on how different implementation choices impact the quality of the resulting data. We release these resources to the research community and hope to encourage more research on cross-lingual SRL.

2021

The Global Wordnet Formats have been introduced to enable wordnets to have a common representation that can be integrated through the Global WordNet Grid. As a result of their adoption, a number of shortcomings of the format were identified, and in this paper we describe the extensions to the formats that address these issues. These include: ordering of senses, dependencies between wordnets, pronunciation, syntactic modelling, relations, sense keys, metadata and RDF support. Furthermore, we provide some perspectives on how these changes help in the integration of wordnets.

pdf abs
A Universal Dependencies Corpora Maintenance Methodology Using Downstream Application
Ran Iwamoto | Hiroshi Kanayama | Alexandre Rademaker | Takuya Ohko
Proceedings of the Third Workshop on Computational Typology and Multilingual NLP

This paper investigates updates of Universal Dependencies (UD) treebanks in 23 languages and their impact on a downstream application. Numerous people are involved in updating UD’s annotation guidelines and treebanks in various languages. However, it is not easy to verify whether the updated resources maintain universality with other language resources. Thus, validity and consistency of multilingual corpora should be tested through application tasks involving syntactic structures with PoS tags, dependency labels, and universal features. We apply the syntactic parsers trained on UD treebanks from multiple versions (2.0 to 2.7) to a clause-level sentiment extractor. We then analyze the relationships between attachment scores of dependency parsers and performance in application tasks. For future UD developments, we show examples of outputs that differ depending on version.

2020

pdf abs
English WordNet 2020: Improving and Extending a WordNet for English using an Open-Source Methodology
John Philip McCrae | Alexandre Rademaker | Ewa Rudnicka | Francis Bond
Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMW2020)

WordNet, while one of the most widely used resources for NLP, has not been updated for a long time, and as such a new project English WordNet has arisen to continue the development of the model under an open-source paradigm. In this paper, we detail the second release of this resource entitled “English WordNet 2020”. The work has focused firstly, on the introduction of new synsets and senses and developing guidelines for this and secondly, on the integration of contributions from other projects. We present the changes in this edition, which total over 15,000 changes over the previous release.

pdf abs
Inclusion of Lithological terms (rocks and minerals) in The Open Wordnet for English
Alexandre Tessarollo | Alexandre Rademaker
Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMW2020)

We extend the Open WordNet for English (OWN-EN) with rock-related and other lithological terms using the authoritative source of GBA’s Thesaurus. Our aim is to improve WordNet to better function within Oil & Gas domain, particularly geoscience texts. We use a three step approach: a proof of concept-level extension of WordNet, a major extension on which we evaluate the impact with positive results and a full extension encompassing all GBA’s lithological terms. We also build a mapping to GBA which also links to several other resources: WikiData, British Geological Survey, Inspire, GeoSciML and DBpedia.

2019

pdf abs
Fast developing of a Natural Language Interface for a Portuguese WordNet: Leveraging on Sentence Embeddings
Hugo Gonçalo Oliveira | Alexandre Rademaker
Proceedings of the 10th Global Wordnet Conference

We describe how a natural language interface can be developed for a wordnet with a small set of handcrafted templates, leveraging on sentence embeddings. The proposed approach does not use rules for parsing natural language queries but experiments showed that the embeddings model is tolerant enough for correctly predicting relation types that do not match known patterns exactly. It was tested with OpenWordNet-PT, for which this method may provide an alternative interface, with benefits also on the curation process.

pdf abs
English WordNet 2019 – An Open-Source WordNet for English
John P. McCrae | Alexandre Rademaker | Francis Bond | Ewa Rudnicka | Christiane Fellbaum
Proceedings of the 10th Global Wordnet Conference

We describe the release of a new wordnet for English based on the Princeton WordNet, but now developed under an open-source model. In particular, this version of WordNet, which we call English WordNet 2019, which has been developed by multiple people around the world through GitHub, fixes many errors in previous wordnets for English. We give some details of the changes that have been made in this version and give some perspectives about likely future changes that will be made as this project continues to evolve.

pdf abs
Portuguese Manners of Speaking
Valeria de Paiva | Alexandre Rademaker
Proceedings of the 10th Global Wordnet Conference

Lexical resources need to be as complete as possible. Very little work seems to have been done on adverbs, the smallest part of speech class in Princeton WordNet counting the number of synsets. Amongst adverbs, manner adverbs ending in ‘-ly’ seem the easiest to work with, as their meaning is almost the same as the one of the associated adjective. This phenomenon seems to be parallel in English and Portuguese, where these manner adverbs finish in the suffix ‘-mente’. We use this correspondence to improve the coverage of adverbs in the lexical resource OpenWordNet-PT, a wordnet for Portuguese.

pdf abs
Completing the Princeton Annotated Gloss Corpus Project
Alexandre Rademaker | Bruno Cuconato | Alessandra Cid | Alexandre Tessarollo | Henrique Andrade
Proceedings of the 10th Global Wordnet Conference

In the Princeton WordNet Gloss Corpus, the word forms from the definitions (“glosses”) in WordNet’s synsets are manually linked to the context-appropriate sense in the WordNet. The glosses then become a sense-disambiguated corpus annotated against WordNet version 3.0. The result is also called a semantic concordance, which can be seen as both a lexicon (WordNet extension) and an annotated corpus. In this work we motivate and present the initial steps to complete the annotation of all open-class words in this corpus. Finally, we introduce a freely-available annotation interface built as an Emacs extension, and evaluate a preliminary annotation effort.

pdf bib
Proceedings of the Third Workshop on Universal Dependencies (UDW, SyntaxFest 2019)
Alexandre Rademaker | Francis Tyers
Proceedings of the Third Workshop on Universal Dependencies (UDW, SyntaxFest 2019)

2018

pdf abs
Using OpenWordnet-PT for Question Answering on Legal Domain
Pedro Delfino | Bruno Cuconato | Guilherme Paulino-Passos | Gerson Zaverucha | Alexandre Rademaker
Proceedings of the 9th Global Wordnet Conference

In order to practice a legal profession in Brazil, law graduates must be approved in the OAB national unified bar exam. For their topic coverage and national reach, the OAB exams provide an excellent benchmark for the performance of legal information systems, as it provides objective metrics and are challenging even for humans, as only 20% of its candidates are approved. After constructing a new data set on the exams and doing shallow experiments on it, we now employ the OpenWordnet-PT to verify whether using word senses and relations we can improve previous results. We discuss the results, possible future ideas and the additions to the OpenWordnet-PT that we made.

pdf abs
Extending Wordnet to Geological Times
Henrique Muniz | Fabricio Chalub | Alexandre Rademaker | Valeria De Paiva
Proceedings of the 9th Global Wordnet Conference

This paper describes work extending Princeton WordNet to the domain of geological texts, associated with the time periods of the geological eras of the Earth History. We intend this extension to be considered as an example for any other domain extension that we might want to pursue. To provide this extension, we first produce a textual version of Princeton WordNet. Then we map a fragment of the International Commission on Stratigraphy (ICS) ontologies to WordNet and create the appropriate new synsets. We check the extended ontology on a small corpus of sentences from Gas and Oil technical reports and realize that more work needs to be done, as we need new words, new senses and new compounds in our extended WordNet.

pdf
Text Mining for History: first steps on building a large dataset
Suemi Higuchi | Cláudia Freitas | Bruno Cuconato | Alexandre Rademaker
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

2016

pdf abs
Semantic Links for Portuguese
Fabricio Chalub | Livy Real | Alexandre Rademaker | Valeria de Paiva
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes work on incorporating Princenton’s WordNet morphosemantics links to the fabric of the Portuguese OpenWordNet-PT. Morphosemantic links are relations between verbs and derivationally related nouns that are semantically typed (such as for tune-tuner ― in Portuguese “afinar-afinador” – linked through an “agent” link). Morphosemantic links have been discussed for Princeton’s WordNet for a while, but have not been added to the official database. These links are very useful, they help us to improve our Portuguese WordNet. Thus we discuss the integration of these links in our base and the issues we encountered with the integration.

Semantic relations between words are key to building systems that aim to understand and manipulate language. For English, the “de facto” standard for representing this kind of knowledge is Princeton’s WordNet. Here, we describe the wordnet-like resources currently available for Portuguese: their origins, methods of creation, sizes, and usage restrictions. We start tackling the problem of comparing them, but only in quantitative terms. Finally, we sketch ideas for potential collaboration between some of the projects that produce Portuguese wordnets.

pdf abs
Verifying Integrity Constraints of a RDF-based WordNet
Alexandre Rademaker | Fabricio Chalub
Proceedings of the 8th Global WordNet Conference (GWC)

This paper presents our first attempt at verifying integrity constraints of our openWordnet-PT against the ontology for Wordnets encoding. Our wordnet is distributed in Resource Description Format (RDF) and we want to guarantee not only the syntax correctness but also its semantics soundness.

2015

pdf
HAREM and Klue: how to put two tagsets for named entities annotation together
Livy Real | Alexandre Rademaker
Proceedings of the Fifth Named Entity Workshop

pdf
Seeing is Correcting: curating lexical resources using social interfaces
Livy Real | Fabricio Chalub | Valeria de Paiva | Claudia Freitas | Alexandre Rademaker
Proceedings of the 4th Workshop on Linked Data in Linguistics: Resources and Applications

pdf bib
Proceedings of the 10th Brazilian Symposium in Information and Human Language Technology
Claudia Freitas | Alexandre Rademaker
Proceedings of the 10th Brazilian Symposium in Information and Human Language Technology

pdf
Anotação de corpus com a OpenWordNet-PT: um exercício de desambiguação (Sense annotation with OpenWordNet-PT: an exercise of word sense disambiguation)
Cláudia Freitas | Livy Real | Alexandre Rademaker
Proceedings of the 10th Brazilian Symposium in Information and Human Language Technology

2014

pdf abs
NomLex-PT: A Lexicon of Portuguese Nominalizations
Valeria de Paiva | Livy Real | Alexandre Rademaker | Gerard de Melo
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents NomLex-PT, a lexical resource describing Portuguese nominalizations. NomLex-PT connects verbs to their nominalizations, thereby enabling NLP systems to observe the potential semantic relationships between the two words when analysing a text. NomLex-PT is freely available and encoded in RDF for easy integration with other resources. Most notably, we have integrated NomLex-PT with OpenWordNet-PT, an open Portuguese Wordnet.

pdf
Embedding NomLex-BR nominalizations into OpenWordnet-PT
Alexandre Rademaker | Valeria de Paiva | Gerard de Melo | Livy Maria Real Coelho
Proceedings of the Seventh Global Wordnet Conference