Hugo Gonçalo Oliveira

Also published as: Hugo Goncalo Oliveira, Hugo Gonçalo Oliveira


2024

Proceedings of the 16th International Conference on Computational Processing of Portuguese
Pablo Gamallo | Daniela Claro | António Teixeira | Livy Real | Marcos Garcia | Hugo Gonçalo Oliveira | Raquel Amaro
Proceedings of the 16th International Conference on Computational Processing of Portuguese

BATS-PT: Assessing Portuguese Masked Language Models in Lexico-Semantic Analogy Solving and Relation Completion
Hugo Gonçalo Oliveira | Ricardo Rodrigues | Bruno Ferreira | Purificação Silvano | Sara Carvalho
Proceedings of the 16th International Conference on Computational Processing of Portuguese

Question Answering for Dialogue State Tracking in Portuguese
Francisco Pais | Patricia Ferreira | Catarina Silva | Ana Alves | Hugo Gonçalo Oliveira
Proceedings of the 16th International Conference on Computational Processing of Portuguese

Exploring Multimodal Models for Humor Recognition in Portuguese
Marcio Inácio | Hugo Gonçalo Oliveira
Proceedings of the 16th International Conference on Computational Processing of Portuguese

2023

On the Acquisition of WordNet Relations in Portuguese from Pretrained Masked Language Models
Hugo Gonçalo Oliveira
Proceedings of the 12th Global Wordnet Conference

This paper studies the application of pretrained BERT models to the acquisition of synonyms, antonyms, hypernyms and hyponyms in Portuguese. Masked patterns indicating those relations were compiled with the help of a service for validating semantic relations, and then used for prompting three pretrained BERT models, one multilingual and two for Portuguese (base and large). Predictions for the masks were evaluated on two different test sets. Results achieved by the monolingual models are interesting enough to consider these models as a source for enriching wordnets, especially when predicting hypernyms of nouns. Previously reported prediction performance was improved with new patterns and with the large model. When it comes to selecting the related word from a set of four options, performance is even better, but not enough to outperform the selection of the most similar word, as computed with static word embeddings.

Adopting Linguistic Linked Data Principles: Insights on Users’ Experience
Verginica Mititelu | Maria Pia Di Buono | Hugo Gonçalo Oliveira | Blerina Spahiu | Giedrė Valūnaitė-Oleškevičienė
Proceedings of the 4th Conference on Language, Data and Knowledge

GPT3 as a Portuguese Lexical Knowledge Base?
Hugo Gonçalo Oliveira | Ricardo Rodrigues
Proceedings of the 4th Conference on Language, Data and Knowledge

Towards Generation and Recognition of Humorous Texts in Portuguese
Marcio Lima Inácio | Hugo Gonçalo Oliveira
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

Dealing with humor is an important step towards developing Natural Language Processing tools capable of handling sophisticated semantic and pragmatic knowledge. In this context, this PhD thesis focuses on the automatic generation and recognition of verbal punning humor in Portuguese, a language for which this area is still underdeveloped when compared to English. One of the main goals of this research is to reconcile Natural Language Generation computational models with existing theories of humor from the Humanities, while avoiding mere generation by including contextual information in the generation process. Another point of utmost importance is the inclusion of the listener as an active part in the process of understanding and creating humor; we hope to achieve this by using concepts from Recommender Systems in our methods. Ultimately, we want not only to advance the current state of the art in humor generation and recognition, but also to help the general Portuguese-speaking research community with methods, tools and resources that may aid in the development of further techniques for this language. We also expect our systems to provide insights into how humor is created and perceived by both humans and machines.

What do Humor Classifiers Learn? An Attempt to Explain Humor Recognition Models
Marcio Inácio | Gabriela Wick-pedro | Hugo Goncalo Oliveira
Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

To build computational systems capable of dealing with complex and general linguistic phenomena, it is essential to understand figurative language, of which verbal humor is an instance. This paper reports state-of-the-art results for Humor Recognition in Portuguese, specifically an F1-score of 99.64% with a BERT-based classifier. However, given the surprisingly high performance in such a challenging task, we further analyzed what was actually learned by the classifiers. Our main conclusions were that classifiers based on content features achieve the best performance, but rely mostly on stylistic aspects of the text that are not necessarily related to humor, such as punctuation and question words. On the other hand, among humor-related features, we identified some important aspects, such as the presence of named entities, ambiguity and incongruity.

2022

Movie Rating Prediction using Sentiment Features
João Ramos | Diogo Apóstolo | Hugo Gonçalo Oliveira
Proceedings of the 2nd Workshop on Sentiment Analysis and Linguistic Linked Data

We analyze the impact of using sentiment features in the prediction of movie review scores. The effort included the creation of a new lexicon, Expanded OntoSenticNet (EON), by merging OntoSenticNet and SentiWordNet, and experiments were conducted on the “IMDB movie review” dataset with the three main approaches to sentiment analysis: lexicon-based, supervised machine learning, and hybrids of the two. Hybrid approaches performed best, demonstrating the potential of combining knowledge bases and machine learning, but supervised approaches based on review embeddings were not far behind.

A Brief Survey of Textual Dialogue Corpora
Hugo Gonçalo Oliveira | Patrícia Ferreira | Daniel Martins | Catarina Silva | Ana Alves
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Several dialogue corpora are currently available for research purposes, but they still fall short of the growing interest in developing dialogue systems, each with its own specific requirements. To help those who need such a corpus, this paper surveys a range of available options in terms of aspects such as speakers, size, languages, collection, annotations, and domains. Some trends are identified, and possible approaches to the creation of new corpora are also discussed.

Exploring Transformers for Ranking Portuguese Semantic Relations
Hugo Gonçalo Oliveira
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We explored transformer-based language models for ranking instances of Portuguese lexico-semantic relations. Weights were based on the likelihood of natural language sequences conveying the relation instances, and the expectation was that they would be useful for filtering out noisier instances. However, after analysing the weights, no strong conclusions could be drawn. They are not correlated with redundancy, but are lower for instances with longer and more specific arguments, which may nevertheless be a consequence of their sensitivity to the frequency of such arguments. Nor did they prove useful when computing word similarity with network embeddings. Despite the negative results, we see the reported experiments and insights as another contribution to a better understanding of transformer language models like BERT and GPT, and we make the weighted instances publicly available for further research.

2020

Widening the Discussion on “False Friends” in Multilingual Wordnets
Hugo Gonçalo Oliveira | Ana Luís
Proceedings of the 2020 Globalex Workshop on Linked Lexicography

There are wordnets in many languages, many of them aligned with Princeton WordNet, some through a (semi-)automatic process, but actual discussions of the role of false friends in this process are rare. Bearing in mind known issues related to such words in language translation, and further motivated by false-friend-related issues in the alignment of a Portuguese wordnet with Princeton WordNet, we aim to widen this discussion, while suggesting preliminary ideas on how wordnets could benefit from this kind of research.

Corpora and Baselines for Humour Recognition in Portuguese
Hugo Gonçalo Oliveira | André Clemêncio | Ana Alves
Proceedings of the Twelfth Language Resources and Evaluation Conference

Bearing in mind the lack of work on the automatic recognition of verbal humour in Portuguese, a topic connected with fluency in a natural language, we describe the creation of three corpora, covering two styles of humour and four sources of non-humorous text, that may be used for related studies. We then report on experiments where the created corpora were used for training and testing computational models that exploit content and linguistic features for humour recognition. The obtained results helped us draw some conclusions about this challenge and may be seen as baselines for those willing to tackle it in the future using the same corpora.

AIA-BDE: A Corpus of FAQs in Portuguese and their Variations
Hugo Gonçalo Oliveira | João Ferreira | José Santos | Pedro Fialho | Ricardo Rodrigues | Luisa Coheur | Ana Alves
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present AIA-BDE, a corpus of 380 domain-oriented FAQs in Portuguese and their variations, i.e., paraphrases or entailed questions, created either manually, by humans, or automatically, with Google Translate. It aims to be used as a benchmark for FAQ retrieval and automatic question answering, but may be useful in other contexts, such as the development of task-oriented dialogue systems, or models for natural language inference in an interrogative context. We also report on two experiments. Matching variations with their original questions was not trivial with a set of unsupervised baselines, especially for manually created variations. Besides the high performance obtained with ELMo and BERT embeddings, an Information Retrieval system was surprisingly competitive when considering only the first hit. In the second experiment, text classifiers were trained with the original questions, and tested when assigning each variation to one of three possible sources, or labelling it as out-of-domain. Here, the difference between manual and automatic variations was not as significant.

Amplifying the Range of News Stories with Creativity: Methods and their Evaluation, in Portuguese
Rui Mendes | Hugo Gonçalo Oliveira
Proceedings of the 13th International Conference on Natural Language Generation

Headlines are key for attracting people to a story, but writing appealing headlines requires time and talent. This work aims to automate the production of creative short texts (e.g., news headlines) for an input context (e.g., existing headlines), thus amplifying its range. Well-known expressions (e.g., proverbs, movie titles), which typically include word-play and resort to figurative language, are used as a starting point. Given an input text, they can be recommended by exploiting Semantic Textual Similarity (STS) techniques, or adapted towards higher relatedness. For the latter, three methods that exploit static word embeddings are proposed. Experimentation in Portuguese led to some conclusions, based on human opinions: STS methods that look exclusively at the surface text recommend more related expressions; resulting expressions are somewhat related to the input, but adaptation leads to higher relatedness and novelty; humour can be an indirect consequence, but most outputs are not funny.

2019

Fast developing of a Natural Language Interface for a Portuguese WordNet: Leveraging on Sentence Embeddings
Hugo Gonçalo Oliveira | Alexandre Rademaker
Proceedings of the 10th Global Wordnet Conference

We describe how a natural language interface can be developed for a wordnet with a small set of handcrafted templates, leveraging sentence embeddings. The proposed approach does not use rules for parsing natural language queries, but experiments showed that the embeddings model is tolerant enough to correctly predict relation types that do not match known patterns exactly. It was tested with OpenWordNet-PT, for which this method may provide an alternative interface, with benefits also for the curation process.

Contributions to Clinical Named Entity Recognition in Portuguese
Fábio Lopes | César Teixeira | Hugo Gonçalo Oliveira
Proceedings of the 18th BioNLP Workshop and Shared Task

Bearing in mind that different languages might present different challenges, this paper presents the following contributions to the area of Information Extraction from clinical text, targeting the Portuguese language: a collection of 281 clinical texts in this language, with manually annotated named entities; word embeddings trained on a larger collection of similar texts; and results of using BiLSTM-CRF neural networks for named entity recognition on the annotated collection, including a comparison of in-domain and out-of-domain word embeddings in this task. Although the in-domain embeddings were learned from much less data, performance is higher when using them. When tested on 20 independent clinical texts, this model achieved better results than a model using larger out-of-domain embeddings.

2018

Proceedings of the 3rd Workshop on Computational Creativity in Natural Language Generation (CC-NLG 2018)
Hugo Gonçalo Oliveira | Ben Burtenshaw | Raquel Hervás
Proceedings of the 3rd Workshop on Computational Creativity in Natural Language Generation (CC-NLG 2018)

Seeking the Ideal Narrative Model for Computer-Generated Narratives
Mariana Ferreira | Hugo Gonçalo Oliveira
Proceedings of the 3rd Workshop on Computational Creativity in Natural Language Generation (CC-NLG 2018)

Exploring Lexical-Semantic Knowledge in the Generation of Novel Riddles in Portuguese
Hugo Gonçalo Oliveira | Ricardo Rodrigues
Proceedings of the 3rd Workshop on Computational Creativity in Natural Language Generation (CC-NLG 2018)

2017

A Survey on Intelligent Poetry Generation: Languages, Features, Techniques, Reutilisation and Evaluation
Hugo Gonçalo Oliveira
Proceedings of the 10th International Conference on Natural Language Generation

Poetry generation is becoming popular among researchers of Natural Language Generation, Computational Creativity and, more broadly, Artificial Intelligence. To produce text that may be regarded as poetry, poetry generation systems are typically knowledge-intensive and have to deal with several levels of language, from lexical to semantic. Interest in the topic resulted in the development of several poetry generators described in the literature, with different features covered or handled differently, by a broad range of alternative approaches, as well as different perspectives on evaluation, another challenging aspect due to the underlying subjectivity. This paper surveys intelligent poetry generators around a set of relevant axes for poetry generation (targeted languages, form and content features, techniques, reutilisation of material, and evaluation) and aims to organise the work developed on this topic so far.

Co-PoeTryMe: a Co-Creative Interface for the Composition of Poetry
Hugo Gonçalo Oliveira | Tiago Mendes | Ana Boavida
Proceedings of the 10th International Conference on Natural Language Generation

Co-PoeTryMe is a web application for poetry composition, guided by the user, though with the help of automatic features, such as the generation of full (editable) drafts, as well as the acquisition of additional well-formed lines, or semantically-related words, possibly constrained by the number of syllables, rhyme, or polarity. Towards the final poem, the latter can replace lines or words in the draft.

Proceedings of the Workshop on Computational Creativity in Natural Language Generation (CC-NLG 2017)
Hugo Gonçalo Oliveira | Ben Burtenshaw | Mike Kestemont | Tom De Smedt
Proceedings of the Workshop on Computational Creativity in Natural Language Generation (CC-NLG 2017)

O Poeta Artificial 2.0: Increasing Meaningfulness in a Poetry Generation Twitter bot
Hugo Gonçalo Oliveira
Proceedings of the Workshop on Computational Creativity in Natural Language Generation (CC-NLG 2017)

2016

TweetMT: A Parallel Microblog Corpus
Iñaki San Vicente | Iñaki Alegría | Cristina España-Bonet | Pablo Gamallo | Hugo Gonçalo Oliveira | Eva Martínez Garcia | Antonio Toral | Arkaitz Zubiaga | Nora Aranberri
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We introduce TweetMT, a parallel corpus of tweets in four language pairs that combine five languages (Spanish from/to Basque, Catalan, Galician and Portuguese), all of which have an official status in the Iberian Peninsula. The corpus has been created by combining automatic collection and crowdsourcing approaches, and it is publicly available. It is intended for the development and testing of microtext machine translation systems. In this paper we describe the methodology followed to build the corpus, and present the results of the shared task in which it was tested.

Can Topic Modelling benefit from Word Sense Information?
Adriana Ferrugento | Hugo Gonçalo Oliveira | Ana Alves | Filipe Rodrigues
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper proposes a new topic model that exploits word sense information in order to discover less redundant and more informative topics. Word sense information is obtained from WordNet, and the discovered topics are groups of synsets instead of mere surface words. A key feature is that all the known senses of a word are considered, with their probabilities. Alternative configurations of the model are described and compared to each other and to LDA, the most popular topic model. However, the obtained results suggest that enriching LDA with word sense information brings no benefits.

Discovering Fuzzy Synsets from the Redundancy in Different Lexical-Semantic Resources
Hugo Gonçalo Oliveira | Fábio Santos
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Although represented as such in wordnets, word senses are not discrete. To handle word senses as fuzzy objects, we exploit the graph structure of synonymy pairs acquired from different sources to discover synsets where words have different membership degrees that reflect confidence. Following this approach, a wide-coverage fuzzy thesaurus was discovered from a synonymy network compiled from seven Portuguese lexical-semantic resources. Based on a crowdsourcing evaluation, we can say that the quality of the obtained synsets is far from perfect but, as expected of a confidence measure, it increases significantly for higher cut-points on the membership and, at a certain point, reaches a 100% correctness rate.

An overview of Portuguese WordNets
Valeria de Paiva | Livy Real | Hugo Gonçalo Oliveira | Alexandre Rademaker | Cláudia Freitas | Alberto Simões
Proceedings of the 8th Global WordNet Conference (GWC)

Semantic relations between words are key to building systems that aim to understand and manipulate language. For English, the “de facto” standard for representing this kind of knowledge is Princeton’s WordNet. Here, we describe the wordnet-like resources currently available for Portuguese: their origins, methods of creation, sizes, and usage restrictions. We begin to tackle the problem of comparing them, though only in quantitative terms. Finally, we sketch ideas for potential collaboration between some of the projects that produce Portuguese wordnets.

2015

ASAP-II: From the Alignment of Phrases to Textual Similarity
Ana Alves | David Simões | Hugo Gonçalo Oliveira | Adriana Ferrugento
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

CISUC-KIS: Tackling Message Polarity Classification with a Large and Diverse Set of Features
João Leal | Sara Pinto | Ana Bento | Hugo Gonçalo Oliveira | Paulo Gomes
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

Exploiting Portuguese Lexical Knowledge Bases for Answering Open Domain Cloze Questions Automatically
Hugo Gonçalo Oliveira | Inês Coelho | Paulo Gomes
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present the task of answering cloze questions automatically and how it can be tackled by exploiting lexical knowledge bases (LKBs). This task was performed in what can be seen as an indirect evaluation of Portuguese LKBs. We introduce the LKBs used and the algorithms applied, and then report on the obtained results and draw some conclusions: LKBs are definitely useful resources for this challenging task, and exploiting them, especially with PageRank-based algorithms, clearly improves on the baselines. Moreover, a larger LKB, created automatically and not sense-aware, led to the best results, as opposed to handcrafted LKBs structured on synsets.

Onto.PT: recent developments of a large public domain Portuguese wordnet
Hugo Gonçalo Oliveira | Paulo Gomes
Proceedings of the Seventh Global Wordnet Conference

2012

Folheador: browsing through Portuguese semantic relations
Hugo Gonçalo Oliveira | Hernani Costa | Diana Santos
Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics

2010

Second HAREM: Advancing the State of the Art of Named Entity Recognition in Portuguese
Cláudia Freitas | Cristina Mota | Diana Santos | Hugo Gonçalo Oliveira | Paula Carvalho
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we present Second HAREM, the second edition of an evaluation campaign for Portuguese, addressing named entity recognition (NER). This second edition also included two new tracks: the recognition and normalization of temporal entities (proposed by a group of participants, and hence not covered in this paper) and ReRelEM, the detection of semantic relations between named entities. We summarize the setup of Second HAREM by showing the preserved distinctive features and discussing the changes compared to the first edition. Furthermore, we present the main results achieved and describe the available resources and tools developed under this evaluation, namely: (i) the golden collections, i.e., a set of documents whose named entities and semantic relations between those entities were manually annotated; (ii) the Second HAREM collection (which contains the unannotated version of the golden collection), as well as the participating systems' results on it; (iii) the scoring tools; and (iv) SAHARA, a Web application that allows interactive evaluation. We end the paper by offering some remarks about what was learned.

Towards the Automatic Creation of a Wordnet from a Term-Based Lexical Network
Hugo Gonçalo Oliveira | Paulo Gomes
Proceedings of TextGraphs-5 - 2010 Workshop on Graph-based Methods for Natural Language Processing

2009

Relation detection between named entities: report of a shared task
Cláudia Freitas | Diana Santos | Cristina Mota | Hugo Gonçalo Oliveira | Paula Carvalho
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)