Paulo Santos

2022

A Semantics of Spatial Expressions for interacting with unmanned aerial vehicles
Lucas Domingos | Paulo Santos
Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association

This paper describes an investigation into establishing communication between a quadrotor and a human by means of qualitative spatial relations using speech recognition. It is based on a system capable of receiving, interpreting, processing, acting on, transmitting and executing the commands given. This system is composed of a quadrotor equipped with GPS, IMU sensors and radio communication, and a computer acting as a ground station that is capable of understanding and interpreting the received commands and correctly providing answers according to an underlying qualitative reasoning formalism. Tests were performed, and the results show an error rate of less than five percent for the vertical and radial dimensions, whereas the horizontal dimension had an error rate of almost ten percent.

Contrastive Visual and Language Learning for Visual Relationship Detection
Thanh Tran | Maelic Neau | Paulo Santos | David Powers
Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association

Visual Relationship Detection aims to understand real-world objects’ interactions by grounding visual concepts to compositional visual relation triples, written in the form of (subject, predicate, object). Previous works have explored the use of contrastive learning to implicitly predict the predicates from the relevant image regions. However, these models often directly leverage in-distribution spatial and language co-occurrence biases during training, preventing the models from generalizing to out-of-distribution compositions. In this work, we examine whether contrastive vision and language models pre-trained on large-scale external image and text datasets can assist the detection of compositional visual relationships. To this end, we propose a semi-supervised contrastive fine-tuning approach for the visual relationship detection task. The results show that fine-tuned models that were pre-trained on larger datasets do not yield better performance when performing visual relationship detection, and larger models can yield lower performance when compared with their smaller counterparts.

Venues

ALTA (2)