Hugo Gonçalo Oliveira

2020

Widening the Discussion on “False Friends” in Multilingual Wordnets
Hugo Gonçalo Oliveira | Ana Luís
Proceedings of the 2020 Globalex Workshop on Linked Lexicography

There are wordnets in many languages, many aligned with Princeton WordNet, some of them through a (semi-)automatic process, but we rarely see actual discussions on the role of false friends in this process. Having in mind known issues related to such words in language translation, and further motivated by false friend-related issues in the alignment of a Portuguese wordnet with Princeton WordNet, we aim to widen this discussion, while suggesting preliminary ideas of how wordnets could benefit from this kind of research.

Amplifying the Range of News Stories with Creativity: Methods and their Evaluation, in Portuguese
Rui Mendes | Hugo Gonçalo Oliveira
Proceedings of the 13th International Conference on Natural Language Generation

Headlines are key for attracting people to a story, but writing appealing headlines requires time and talent. This work aims to automate the production of creative short texts (e.g., news headlines) for an input context (e.g., existing headlines), thus amplifying its range. Well-known expressions (e.g., proverbs, movie titles), which typically include word-play and resort to figurative language, are used as a starting point. Given an input text, they can be recommended by exploiting Semantic Textual Similarity (STS) techniques, or adapted towards higher relatedness. For the latter, three methods that exploit static word embeddings are proposed. Experimentation in Portuguese led to some conclusions, based on human opinions: STS methods that look exclusively at the surface text recommend more related expressions; resulting expressions are somewhat related to the input, but adaptation leads to higher relatedness and novelty; humour can be an indirect consequence, but most outputs are not funny.

Corpora and Baselines for Humour Recognition in Portuguese
Hugo Gonçalo Oliveira | André Clemêncio | Ana Alves
Proceedings of the 12th Language Resources and Evaluation Conference

Having in mind the lack of work on the automatic recognition of verbal humour in Portuguese, a topic connected with fluency in a natural language, we describe the creation of three corpora, covering two styles of humour and four sources of non-humorous text, that may be used for related studies. We then report on some experiments where the created corpora were used for training and testing computational models that exploit content and linguistic features for humour recognition. The obtained results helped us draw some conclusions about this challenge and may be seen as baselines for those willing to tackle it in the future, using the same corpora.

AIA-BDE: A Corpus of FAQs in Portuguese and their Variations
Hugo Gonçalo Oliveira | João Ferreira | José Santos | Pedro Fialho | Ricardo Rodrigues | Luisa Coheur | Ana Alves
Proceedings of the 12th Language Resources and Evaluation Conference

We present AIA-BDE, a corpus of 380 domain-oriented FAQs in Portuguese and their variations, i.e., paraphrases or entailed questions, created manually, by humans, or automatically, with Google Translate. It aims to be used as a benchmark for FAQ retrieval and automatic question-answering, but may be useful in other contexts, such as the development of task-oriented dialogue systems, or models for natural language inference in an interrogative context. We also report on two experiments. Matching variations with their original questions was not trivial with a set of unsupervised baselines, especially for manually created variations. Besides the high performance obtained with ELMo and BERT embeddings, an Information Retrieval system was surprisingly competitive when considering only the first hit. In the second experiment, text classifiers were trained with the original questions, and tested when assigning each variation to one of three possible sources, or labelling it as out-of-domain. Here, the difference between manual and automatic variations was not so significant.

2019

Fast developing of a Natural Language Interface for a Portuguese WordNet: Leveraging on Sentence Embeddings
Hugo Gonçalo Oliveira | Alexandre Rademaker
Proceedings of the 10th Global Wordnet Conference

We describe how a natural language interface can be developed for a wordnet with a small set of handcrafted templates, leveraging sentence embeddings. The proposed approach does not use rules for parsing natural language queries, but experiments showed that the embeddings model is tolerant enough to correctly predict relation types that do not match known patterns exactly. It was tested with OpenWordNet-PT, for which this method may provide an alternative interface, with benefits also for the curation process.

Contributions to Clinical Named Entity Recognition in Portuguese
Fábio Lopes | César Teixeira | Hugo Gonçalo Oliveira
Proceedings of the 18th BioNLP Workshop and Shared Task

Having in mind that different languages might present different challenges, this paper presents the following contributions to the area of Information Extraction from clinical text, targeting the Portuguese language: a collection of 281 clinical texts in this language, with manually-annotated named entities; word embeddings trained on a larger collection of similar texts; results of using BiLSTM-CRF neural networks for named entity recognition on the annotated collection, including a comparison of using in-domain or out-of-domain word embeddings in this task. Although learned with much less data, performance is higher when using in-domain embeddings. When tested on 20 independent clinical texts, this model achieved better results than a model using larger out-of-domain embeddings.

2018

Proceedings of the 3rd Workshop on Computational Creativity in Natural Language Generation (CC-NLG 2018)
Hugo Gonçalo Oliveira | Ben Burtenshaw | Raquel Hervás
Proceedings of the 3rd Workshop on Computational Creativity in Natural Language Generation (CC-NLG 2018)

Seeking the Ideal Narrative Model for Computer-Generated Narratives
Mariana Ferreira | Hugo Gonçalo Oliveira
Proceedings of the 3rd Workshop on Computational Creativity in Natural Language Generation (CC-NLG 2018)

Exploring Lexical-Semantic Knowledge in the Generation of Novel Riddles in Portuguese
Hugo Gonçalo Oliveira | Ricardo Rodrigues
Proceedings of the 3rd Workshop on Computational Creativity in Natural Language Generation (CC-NLG 2018)

2017

A Survey on Intelligent Poetry Generation: Languages, Features, Techniques, Reutilisation and Evaluation
Hugo Gonçalo Oliveira
Proceedings of the 10th International Conference on Natural Language Generation

Poetry generation is becoming popular among researchers of Natural Language Generation, Computational Creativity and, broadly, Artificial Intelligence. To produce text that may be regarded as poetry, poetry generation systems are typically knowledge-intensive and have to deal with several levels of language, from lexical to semantic. Interest in the topic resulted in the development of several poetry generators described in the literature, with different features covered or handled differently, by a broad range of alternative approaches, as well as different perspectives on evaluation, another challenging aspect due to the underlying subjectivity. This paper surveys intelligent poetry generators around a set of relevant axes for poetry generation – targeted languages, form and content features, techniques, reutilisation of material, and evaluation – and aims to organise work developed on this topic so far.

Co-PoeTryMe: a Co-Creative Interface for the Composition of Poetry
Hugo Gonçalo Oliveira | Tiago Mendes | Ana Boavida
Proceedings of the 10th International Conference on Natural Language Generation

Co-PoeTryMe is a web application for poetry composition, guided by the user, though with the help of automatic features, such as the generation of full (editable) drafts, as well as the acquisition of additional well-formed lines, or semantically-related words, possibly constrained by the number of syllables, rhyme, or polarity. Towards the final poem, the latter can replace lines or words in the draft.

Proceedings of the Workshop on Computational Creativity in Natural Language Generation (CC-NLG 2017)
Hugo Gonçalo Oliveira | Ben Burtenshaw | Mike Kestemont | Tom De Smedt
Proceedings of the Workshop on Computational Creativity in Natural Language Generation (CC-NLG 2017)

O Poeta Artificial 2.0: Increasing Meaningfulness in a Poetry Generation Twitter bot
Hugo Gonçalo Oliveira
Proceedings of the Workshop on Computational Creativity in Natural Language Generation (CC-NLG 2017)

2016

An overview of Portuguese WordNets
Valeria de Paiva | Livy Real | Hugo Gonçalo Oliveira | Alexandre Rademaker | Cláudia Freitas | Alberto Simões
Proceedings of the 8th Global WordNet Conference (GWC)

Semantic relations between words are key to building systems that aim to understand and manipulate language. For English, the “de facto” standard for representing this kind of knowledge is Princeton’s WordNet. Here, we describe the wordnet-like resources currently available for Portuguese: their origins, methods of creation, sizes, and usage restrictions. We start tackling the problem of comparing them, but only in quantitative terms. Finally, we sketch ideas for potential collaboration between some of the projects that produce Portuguese wordnets.

TweetMT: A Parallel Microblog Corpus
Iñaki San Vicente | Iñaki Alegría | Cristina España-Bonet | Pablo Gamallo | Hugo Gonçalo Oliveira | Eva Martínez Garcia | Antonio Toral | Arkaitz Zubiaga | Nora Aranberri
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We introduce TweetMT, a parallel corpus of tweets in four language pairs that combine five languages (Spanish from/to Basque, Catalan, Galician and Portuguese), all of which have an official status in the Iberian Peninsula. The corpus has been created by combining automatic collection and crowdsourcing approaches, and it is publicly available. It is intended for the development and testing of microtext machine translation systems. In this paper we describe the methodology followed to build the corpus, and present the results of the shared task in which it was tested.

Can Topic Modelling benefit from Word Sense Information?
Adriana Ferrugento | Hugo Gonçalo Oliveira | Ana Alves | Filipe Rodrigues
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper proposes a new topic model that exploits word sense information in order to discover less redundant and more informative topics. Word sense information is obtained from WordNet and the discovered topics are groups of synsets, instead of mere surface words. A key feature is that all the known senses of a word are considered, with their probabilities. Alternative configurations of the model are described and compared to each other and to LDA, the most popular topic model. However, the obtained results suggest that there are no benefits of enriching LDA with word sense information.

Discovering Fuzzy Synsets from the Redundancy in Different Lexical-Semantic Resources
Hugo Gonçalo Oliveira | Fábio Santos
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Although represented as such in wordnets, word senses are not discrete. To handle word senses as fuzzy objects, we exploit the graph structure of synonymy pairs acquired from different sources to discover synsets where words have different membership degrees that reflect confidence. Following this approach, a wide-coverage fuzzy thesaurus was discovered from a synonymy network compiled from seven Portuguese lexical-semantic resources. Based on a crowdsourcing evaluation, we can say that the quality of the obtained synsets is far from perfect but, as expected in a confidence measure, it increases significantly for higher cut-points on the membership and, at a certain point, reaches a 100% correctness rate.

2015

ASAP-II: From the Alignment of Phrases to Textual Similarity
Ana Alves | David Simões | Hugo Gonçalo Oliveira | Adriana Ferrugento
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

Exploiting Portuguese Lexical Knowledge Bases for Answering Open Domain Cloze Questions Automatically
Hugo Gonçalo Oliveira | Inês Coelho | Paulo Gomes
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present the task of answering cloze questions automatically and how it can be tackled by exploiting lexical knowledge bases (LKBs). This task was performed in what can be seen as an indirect evaluation of Portuguese LKBs. We introduce the LKBs used and the algorithms applied, and then report on the obtained results and draw some conclusions: LKBs are definitely useful resources for this challenging task, and exploiting them, especially with PageRank-based algorithms, clearly improves on the baselines. Moreover, larger LKBs, created automatically and not sense-aware, led to the best results, as opposed to handcrafted LKBs structured on synsets.

Onto.PT: recent developments of a large public domain Portuguese wordnet
Hugo Gonçalo Oliveira | Paulo Gomes
Proceedings of the Seventh Global Wordnet Conference

CISUC-KIS: Tackling Message Polarity Classification with a Large and Diverse Set of Features
João Leal | Sara Pinto | Ana Bento | Hugo Gonçalo Oliveira | Paulo Gomes
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2012

Folheador: browsing through Portuguese semantic relations
Hugo Gonçalo Oliveira | Hernani Costa | Diana Santos
Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics

2010

Second HAREM: Advancing the State of the Art of Named Entity Recognition in Portuguese
Cláudia Freitas | Cristina Mota | Diana Santos | Hugo Gonçalo Oliveira | Paula Carvalho
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we present Second HAREM, the second edition of an evaluation campaign for Portuguese, addressing named entity recognition (NER). This second edition also included two new tracks: the recognition and normalization of temporal entities (proposed by a group of participants, and hence not covered in this paper) and ReRelEM, the detection of semantic relations between named entities. We summarize the setup of Second HAREM by showing the preserved distinctive features and discussing the changes compared to the first edition. Furthermore, we present the main results achieved and describe the available resources and tools developed under this evaluation, namely, (i) the golden collections, i.e., a set of documents whose named entities and semantic relations between those entities were manually annotated, (ii) the Second HAREM collection (which contains the unannotated version of the golden collection), as well as the participating systems' results on it, (iii) the scoring tools, and (iv) SAHARA, a Web application that allows interactive evaluation. We end the paper by offering some remarks about what was learned.

Towards the Automatic Creation of a Wordnet from a Term-Based Lexical Network
Hugo Gonçalo Oliveira | Paulo Gomes
Proceedings of TextGraphs-5 - 2010 Workshop on Graph-based Methods for Natural Language Processing

2009

Relation detection between named entities: report of a shared task
Cláudia Freitas | Diana Santos | Cristina Mota | Hugo Gonçalo Oliveira | Paula Carvalho
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)