Riza Theresa Batista-Navarro

Also published as: Riza Batista-Navarro, Riza Theresa Batista-Navarro


RaFoLa: A Rationale-Annotated Corpus for Detecting Indicators of Forced Labour
Erick Mendez Guzman | Viktor Schlegel | Riza Batista-Navarro
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Forced labour is the most common type of modern slavery, and it is increasingly gaining the attention of the research and social community. Recent studies suggest that artificial intelligence (AI) holds immense potential for augmenting anti-slavery action. However, AI tools need to be developed transparently in cooperation with different stakeholders. Such tools are contingent on the availability and access to domain-specific data, which are scarce due to the near-invisible nature of forced labour. To the best of our knowledge, this paper presents the first openly accessible English corpus annotated for multi-class and multi-label forced labour detection. The corpus consists of 989 news articles retrieved from specialised data sources and annotated according to risk indicators defined by the International Labour Organization (ILO). Each news article was annotated for two aspects: (1) indicators of forced labour as classification labels and (2) snippets of the text that justify labelling decisions. We hope that our data set can help promote research on explainability for multi-class and multi-label text classification. In this work, we explain our process for collecting the data underpinning the proposed corpus, describe our annotation guidelines and present some statistical analysis of its content. Finally, we summarise the results of baseline experiments based on different variants of the Bidirectional Encoder Representation from Transformer (BERT) model.

Incorporating Zoning Information into Argument Mining from Biomedical Literature
Boyang Liu | Viktor Schlegel | Riza Batista-Navarro | Sophia Ananiadou
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The goal of text zoning is to segment a text into zones (i.e., Background, Conclusion) that serve distinct functions. Argumentative zoning, a specific text zoning scheme for the scientific domain, is considered as the antecedent for argument mining by many researchers. Surprisingly, however, little work is concerned with exploiting zoning information to improve the performance of argument mining models, despite the relatedness of the two tasks. In this paper, we propose two transformer-based models to incorporate zoning information into argumentative component identification and classification tasks. One model is for the sentence-level argument mining task and the other is for the token-level task. In particular, we add the zoning labels predicted by an off-the-shelf model to the beginning of each sentence, inspired by the convention commonly used biomedical abstracts. Moreover, we employ multi-head attention to transfer the sentence-level zoning information to each token in a sentence. Based on experiment results, we find a significant improvement in F1-scores for both sentence- and token-level tasks. It is worth mentioning that these zoning labels can be obtained with high accuracy by utilising readily available automated methods. Thus, existing argument mining models can be improved by incorporating zoning information without any additional annotation cost.

Building an Ensemble of Transformer Models for Arabic Dialect Classification and Sentiment Analysis
Abdullah Salem Khered | Ingy Yasser Hassan Abdou Abdelhalim | Riza Batista-Navarro
Proceedings of the The Seventh Arabic Natural Language Processing Workshop (WANLP)

In this paper, we describe the approaches we developed for the Nuanced Arabic Dialect Identification (NADI) 2022 shared task, which consists of two subtasks: the identification of country-level Arabic dialects and sentiment analysis. Our team, UniManc, developed approaches to the two subtasks which are underpinned by the same model: a pre-trained MARBERT language model. For Subtask 1, we applied undersampling to create versions of the training data with a balanced distribution across classes. For Subtask 2, we further trained the original MARBERT model for the masked language modelling objective using a NADI-provided dataset of unlabelled Arabic tweets. For each of the subtasks, a MARBERT model was fine-tuned for sequence classification, using different values for hyperparameters such as seed and learning rate. This resulted in multiple model variants, which formed the basis of an ensemble model for each subtask. Based on the official NADI evaluation, our ensemble model obtained a macro-F1-score of 26.863, ranking second overall in the first subtask. In the second subtask, our ensemble model also ranked second, obtaining a macro-F1-PN score (macro-averaged F1-score over the Positive and Negative classes) of 73.544.

Food for Thought: How can we exploit contextual embeddings in the translation of idiomatic expressions?
Lukas Santing | Ryan Jean-Luc Sijstermans | Giacomo Anerdi | Pedro Jeuris | Marijn ten Thij | Riza Batista-Navarro
Proceedings of the 3rd Workshop on Figurative Language Processing (FLP)

Idiomatic expressions (or idioms) are phrases where the meaning of the phrase cannot be determined from the meaning of the individual words in the expression. Translating idioms between languages is therefore a challenging task. Transformer models based on contextual embeddings have advanced the state-of-the-art across many domains in the field of natural language processing. While research using transformers has advanced both idiom detection as well as idiom disambiguation, idiom translation has not seen a similar advancement. In this work, we investigate two approaches to fine-tuning a pretrained Text-to-Text Transfer Transformer (T5) model to perform idiom translation from English to German. The first approach directly translates English idiom-containing sentences to German, while the second is underpinned by idiom paraphrasing, firstly paraphrasing English idiomatic expressions to their simplified English versions before translating them to German. Results of our evaluation show that each of the approaches is able to generate adequate translations.

Policy-focused Stance Detection in Parliamentary Debate Speeches
Gavin Abercrombie | Riza Batista-Navarro
Northern European Journal of Language Technology, Volume 8

Legislative debate transcripts provide citizens with information about the activities of their elected representatives, but are difficult for people to process. We propose the novel task of policy-focused stance detection, in which both the policy proposals under debate and the position of the speakers towards those proposals are identified. We adapt a previously existing dataset to include manual annotations of policy preferences, an established schema from political science. We evaluate a range of approaches to the automatic classification of policy preferences and speech sentiment polarity, including transformer-based text representations and a multi-task learning paradigm. We find that it is possible to identify the policies under discussion using features derived from the speeches, and that incorporating motion-dependent debate modelling, previously used to classify speech sentiment, also improves performance in the classification of policy preferences. We analyse the output of the best performing system, finding that discriminating features for the task are highly domain-specific, and that speeches that address policy preferences proposed by members of the same party can be among the most difficult to predict.


Is the Understanding of Explicit Discourse Relations Required in Machine Reading Comprehension?
Yulong Wu | Viktor Schlegel | Riza Batista-Navarro
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

An in-depth analysis of the level of language understanding required by existing Machine Reading Comprehension (MRC) benchmarks can provide insight into the reading capabilities of machines. In this paper, we propose an ablation-based methodology to assess the extent to which MRC datasets evaluate the understanding of explicit discourse relations. We define seven MRC skills which require the understanding of different discourse relations. We then introduce ablation methods that verify whether these skills are required to succeed on a dataset. By observing the drop in performance of neural MRC models evaluated on the original and the modified dataset, we can measure to what degree the dataset requires these skills, in order to be understood correctly. Experiments on three large-scale datasets with the BERT-base and ALBERT-xxlarge model show that the relative changes for all skills are small (less than 6%). These results imply that most of the answered questions in the examined datasets do not require understanding the discourse structure of the text. To specifically probe for natural language understanding, there is a need to design more challenging benchmarks that can correctly evaluate the intended skills.


ParlVote: A Corpus for Sentiment Analysis of Political Debates
Gavin Abercrombie | Riza Batista-Navarro
Proceedings of the Twelfth Language Resources and Evaluation Conference

Debate transcripts from the UK Parliament contain information about the positions taken by politicians towards important topics, but are difficult for people to process manually. While sentiment analysis of debate speeches could facilitate understanding of the speakers’ stated opinions, datasets currently available for this task are small when compared to the benchmark corpora in other domains. We present ParlVote, a new, larger corpus of parliamentary debate speeches for use in the evaluation of sentiment analysis systems for the political domain. We also perform a number of initial experiments on this dataset, testing a variety of approaches to the classification of sentiment polarity in debate speeches. These include a linear classifier as well as a neural network trained using a transformer word embedding model (BERT), and fine-tuned on the parliamentary speeches. We find that in many scenarios, a linear classifier trained on a bag-of-words text representation achieves the best results. However, with the largest dataset, the transformer-based model combined with a neural classifier provides the best performance. We suggest that further experimentation with classification models and observations of the debate content and structure are required, and that there remains much room for improvement in parliamentary sentiment analysis.

A Framework for Evaluation of Machine Reading Comprehension Gold Standards
Viktor Schlegel | Marco Valentino | Andre Freitas | Goran Nenadic | Riza Batista-Navarro
Proceedings of the Twelfth Language Resources and Evaluation Conference

Machine Reading Comprehension (MRC) is the task of answering a question over a paragraph of text. While neural MRC systems gain popularity and achieve noticeable performance, issues are being raised with the methodology used to establish their performance, particularly concerning the data design of gold standards that are used to evaluate them. There is but a limited understanding of the challenges present in this data, which makes it hard to draw comparisons and formulate reliable hypotheses. As a first step towards alleviating the problem, this paper proposes a unifying framework to systematically investigate the present linguistic features, required reasoning and background knowledge and factual correctness on one hand, and the presence of lexical cues as a lower bound for the requirement of understanding on the other hand. We propose a qualitative annotation schema for the first and a set of approximative metrics for the latter. In a first application of the framework, we analyse modern MRC gold standards and present our findings: the absence of features that contribute towards lexical ambiguity, the varying factual correctness of the expected answers and the presence of lexical cues, all of which potentially lower the reading comprehension complexity and quality of the evaluation data.


Policy Preference Detection in Parliamentary Debate Motions
Gavin Abercrombie | Federico Nanni | Riza Batista-Navarro | Simone Paolo Ponzetto
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Debate motions (proposals) tabled in the UK Parliament contain information about the stated policy preferences of the Members of Parliament who propose them, and are key to the analysis of all subsequent speeches given in response to them. We attempt to automatically label debate motions with codes from a pre-existing coding scheme developed by political scientists for the annotation and analysis of political parties’ manifestos. We develop annotation guidelines for the task of applying these codes to debate motions at two levels of granularity and produce a dataset of manually labelled examples. We evaluate the annotation process and the reliability and utility of the labelling scheme, finding that inter-annotator agreement is comparable with that of other studies conducted on manifesto data. Moreover, we test a variety of ways of automatically labelling motions with the codes, ranging from similarity matching to neural classification methods, and evaluate them against the gold standard labels. From these experiments, we note that established supervised baselines are not always able to improve over simple lexical heuristics. At the same time, we detect a clear and evident benefit when employing BERT, a state-of-the-art deep language representation model, even in classification scenarios with over 30 different labels and limited amounts of training data.

Semantic Frame Embeddings for Detecting Relations between Software Requirements
Waad Alhoshan | Riza Batista-Navarro | Liping Zhao
Proceedings of the 13th International Conference on Computational Semantics - Student Papers

The early phases of requirements engineering (RE) deal with a vast amount of software requirements (i.e., requirements that define characteristics of software systems), which are typically expressed in natural language. Analysing such unstructured requirements, usually obtained from users’ inputs, is considered a challenging task due to the inherent ambiguity and inconsistency of natural language. To support such a task, methods based on natural language processing (NLP) can be employed. One of the more recent advances in NLP is the use of word embeddings for capturing contextual information, which can then be applied in word analogy tasks. In this paper, we describe a new resource, i.e., embedding-based representations of semantic frames in FrameNet, which was developed to support the detection of relations between software requirements. Our embeddings, which encapsulate contextual information at the semantic frame level, were trained on a large corpus of requirements (i.e., a collection of more than three million mobile application reviews). The similarity between these frame embeddings is then used as a basis for detecting semantic relatedness between software requirements. Compared with existing resources underpinned by word-level embeddings alone, and frame embeddings built upon pre-trained vectors, our proposed frame embeddings obtained better performance against judgements of an RE expert. These encouraging results demonstrate the strong potential of the resource in supporting RE analysis tasks (e.g., traceability), which we plan to investigate as part of our future work.

Semantic Change in the Language of UK Parliamentary Debates
Gavin Abercrombie | Riza Batista-Navarro
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

We investigate changes in the meanings of words used in the UK Parliament across two different epochs. We use word embeddings to explore changes in the distribution of words of interest and uncover words that appear to have undergone semantic transformation in the intervening period, and explore different ways of obtaining target words for this purpose. We find that semantic changes are generally in line with those found in other corpora, and little evidence that parliamentary language is more static than general English. It also seems that words with senses that have been recorded in the dictionary as having fallen into disuse do not undergo semantic changes in this domain.

Understanding the Evolution of Circular Economy through Language Change
Sampriti Mahanty | Frank Boons | Julia Handl | Riza Theresa Batista-Navarro
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

In this study, we propose to focus on understanding the evolution of a specific scientific concept—that of Circular Economy (CE)—by analysing how the language used in academic discussions has changed semantically. It is worth noting that the meaning and central theme of this concept has remained the same; however, we hypothesise that it has undergone semantic change by way of additional layers being added to the concept. We have shown that semantic change in language is a reflection of shifts in scientific ideas, which in turn help explain the evolution of a concept. Focusing on the CE concept, our analysis demonstrated that the change over time in the language used in academic discussions of CE is indicative of the way in which the concept evolved and expanded.


‘Aye’ or ‘No’? Speech-level Sentiment Analysis of Hansard UK Parliamentary Debate Transcripts
Gavin Abercrombie | Riza Batista-Navarro
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Identifying Opinion-Topics and Polarity of Parliamentary Debate Motions
Gavin Abercrombie | Riza Theresa Batista-Navarro
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Analysis of the topics mentioned and opinions expressed in parliamentary debate motions–or proposals–is difficult for human readers, but necessary for understanding and automatic processing of the content of the subsequent speeches. We present a dataset of debate motions with pre-existing ‘policy’ labels, and investigate the utility of these labels for simultaneous topic and opinion polarity analysis. For topic detection, we apply one-versus-the-rest supervised topic classification, finding that good performance is achieved in predicting the policy topics, and that textual features derived from the debate titles associated with the motions are particularly indicative of motion topic. We then examine whether the output could also be used to determine the positions taken by proposers towards the different policies by investigating how well humans agree in interpreting the opinion polarities of the motions. Finding very high levels of agreement, we conclude that the policies used can be reliable labels for use in these tasks, and that successful topic detection can therefore provide opinion analysis of the motions ‘for free’.


Crowdsourcing-based Annotation of Emotions in Filipino and English Tweets
Fermin Roberto Lapitan | Riza Theresa Batista-Navarro | Eliezer Albacea
Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016)

The automatic analysis of emotions conveyed in social media content, e.g., tweets, has many beneficial applications. In the Philippines, one of the most disaster-prone countries in the world, such methods could potentially enable first responders to make timely decisions despite the risk of data deluge. However, recognising emotions expressed in Philippine-generated tweets, which are mostly written in Filipino, English or a mix of both, is a non-trivial task. In order to facilitate the development of natural language processing (NLP) methods that will automate such type of analysis, we have built a corpus of tweets whose predominant emotions have been manually annotated by means of crowdsourcing. Defining measures ensuring that only high-quality annotations were retained, we have produced a gold standard corpus of 1,146 emotion-labelled Filipino and English tweets. We validate the value of this manually produced resource by demonstrating that an automatic emotion-prediction method based on the use of a publicly available word-emotion association lexicon was unable to reproduce the labels assigned via crowdsourcing. While we are planning to make a few extensions to the corpus in the near future, its current version has been made publicly available in order to foster the development of emotion analysis methods based on advanced Filipino and English NLP.

Learning to recognise named entities in tweets by exploiting weakly labelled data
Kurt Junshean Espinosa | Riza Theresa Batista-Navarro | Sophia Ananiadou
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)

Named entity recognition (NER) in social media (e.g., Twitter) is a challenging task due to the noisy nature of text. As part of our participation in the W-NUT 2016 Named Entity Recognition Shared Task, we proposed an unsupervised learning approach using deep neural networks and leverage a knowledge base (i.e., DBpedia) to bootstrap sparse entity types with weakly labelled data. To further boost the performance, we employed a more sophisticated tagging scheme and applied dropout as a regularisation technique in order to reduce overfitting. Even without hand-crafting linguistic features nor leveraging any of the W-NUT-provided gazetteers, we obtained robust performance with our approach, which ranked third amongst all shared task participants according to the official evaluation on a gold standard named entity-annotated corpus of 3,856 tweets.

pdf bib
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)
Sophia Ananiadou | Riza Batista-Navarro | Kevin Bretonnel Cohen | Dina Demner-Fushman | Paul Thompson
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)


Interoperability and Customisation of Annotation Schemata in Argo
Rafal Rak | Jacob Carter | Andrew Rowley | Riza Theresa Batista-Navarro | Sophia Ananiadou
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The process of annotating text corpora involves establishing annotation schemata which define the scope and depth of an annotation task at hand. We demonstrate this activity in Argo, a Web-based workbench for the analysis of textual resources, which facilitates both automatic and manual annotation. Annotation tasks in the workbench are defined by building workflows consisting of a selection of available elementary analytics developed in compliance with the Unstructured Information Management Architecture specification. The architecture accommodates complex annotation types that may define primitive as well as referential attributes. Argo aids the development of custom annotation schemata and supports their interoperability by featuring a schema editor and specialised analytics for schemata alignment. The schema editor is a self-contained graphical user interface for defining annotation types. Multiple heterogeneous schemata can be aligned by including one of two type mapping analytics currently offered in Argo. One is based on a simple mapping syntax and, although limited in functionality, covers most common use cases. The other utilises a well established graph query language, SPARQL, and is superior to other state-of-the-art solutions in terms of expressiveness. We argue that the customisation of annotation schemata does not need to compromise their interoperability.


Towards a Better Understanding of Discourse: Integrating Multiple Discourse Annotation Perspectives Using UIMA
Claudiu Mihăilă | Georgios Kontonatsios | Riza Theresa Batista-Navarro | Paul Thompson | Ioannis Korkontzelos | Sophia Ananiadou
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

Extending an interoperable platform to facilitate the creation of multilingual and multimodal NLP applications
Georgios Kontonatsios | Paul Thompson | Riza Theresa Batista-Navarro | Claudiu Mihăilă | Ioannis Korkontzelos | Sophia Ananiadou
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations


What’s in a Name? Entity Type Variation across Two Biomedical Subdomains
Claudiu Mihăilă | Riza Theresa Batista-Navarro
Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics


Building a Coreference-Annotated Corpus from the Domain of Biochemistry
Riza Theresa Batista-Navarro | Sophia Ananiadou
Proceedings of BioNLP 2011 Workshop