Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association

Pradeesh Parameswaran, Jennifer Biggs, David Powers (Editors)


Anthology ID:
2022.alta-1
Month:
December
Year:
2022
Address:
Adelaide, Australia
Venue:
ALTA
Publisher:
Australasian Language Technology Association
URL:
https://aclanthology.org/2022.alta-1
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2022.alta-1.pdf

Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association
Pradeesh Parameswaran | Jennifer Biggs | David Powers

The Corpus of Australian and New Zealand Spoken English: A new resource of naturalistic speech transcripts
Steven Coats

Using public domain resources and off-the-shelf tools to produce high-quality multimedia texts
Manny Rayner | Belinda Chiera | Cathy Chua

The Role of Context in Vaccine Stance Prediction for Twitter Users
Aleney Khoo | Maciej Rybinski | Sarvnaz Karimi | Adam Dunn

TCG-Event: Effective Task Conditioning for Generation-based Event Extraction
Fatemeh Shiri | Tongtong Wu | Yuanfang Li | Gholamreza Haffari

Complex Reading Comprehension Through Question Decomposition
Xiao-Yu Guo | Yuan-Fang Li | Gholamreza Haffari

Using Aspect-Based Sentiment Analysis to Classify Attitude-bearing Words
Pradeesh Parameswaran | Andrew Trotman | Veronica Liesaputra | David Eyers

Fine-tuning a Subtle Parsing Distinction Using a Probabilistic Decision Tree: the Case of Postnominal “that” in Noun Complement Clauses vs. Relative Clauses
Zineddine Tighidet | Nicolas Ballier

Robustness of Hybrid Models in Cross-domain Readability Assessment
Ho Hung Lim | Tianyuan Cai | John S. Y. Lee | Meichun Liu

Specifying Optimisation Problems for Declarative Programs in Precise Natural Language
Rolf Schwitter

Improving Text-based Early Prediction by Distillation from Privileged Time-Series Text
Jinghui Liu | Daniel Capurro | Anthony Nguyen | Karin Verspoor

A DistilBERTopic Model for Short Text Documents
Junaid Rashid | Jungeun Kim | Usman Naseem | Amir Hussain

Generating Code-Switched Text from Monolingual Text with Dependency Tree
Bryan Gregorius | Takeshi Okadome

Stability of Forensic Text Comparison System
Susan Brown | Shunichi Ishihara

Academic Curriculum Generation using Wikipedia for External Knowledge
Anurag Reddy Muthyala | Vikram Pudi

Interactive Rationale Extraction for Text Classification
Jiayi Dai | Mi-Young Kim | Randy Goebel

Automatic Explanation Generation For Climate Science Claims
Rui Xing | Shraey Bhatia | Timothy Baldwin | Jey Han Lau

Climate change is an existential threat to humanity, and the proliferation of unsubstantiated claims relating to climate science is manipulating public perception, motivating the need for fact-checking in climate science. In this work, we draw on recent work that uses retrieval-augmented generation for veracity prediction and explanation generation, framing explanation generation as a query-focused multi-document summarization task. We adapt PRIMERA to the climate science domain by adding additional global attention on claims. Through automatic evaluation and qualitative analysis, we demonstrate that our method is effective at generating explanations.

Zhangzhou Implosives and Their Variations
Yishan Huang | Gwendolyn Hyslop

Zhangzhou Southern Min employs the glottalic ingressive airstream mechanism as a contrastive feature in its onset system. However, the realisations are highly diverse, with eleven phonetic variants that can be derived from three implosive phonemes (/ɓ, ɗ, ɠ/). The allophonic variations are regressively motivated by three driving factors comprising the nasal [Ṽ], labial-velar [u, w], and palatal [i, j] characteristics of subsequent segments. Several processes, including labialisation, nasalisation, lenition, laminalisation, dentalisation and palatalisation, have been found to trigger alternations in the airstream mechanism, manner of articulation, and place of articulation of the related sounds, resulting in diverse phonetic outputs of the three implosive phonemes that can be captured using phonological rules.

Evaluating the Examiner: The Perils of Pearson Correlation for Validating Text Similarity Metrics
Gisela Vallejo | Timothy Baldwin | Lea Frermann

In recent years, researchers have developed question-answering based approaches to automatically evaluate system summaries, reporting improved validity compared to word overlap-based metrics like ROUGE, in terms of correlation with human ratings of criteria including fluency and hallucination. In this paper, we take a closer look at one particular metric, QuestEval, and ask whether: (1) it can serve as a more general metric for long document similarity assessment; and (2) a single correlation score between metric scores and human ratings, the current standard approach, is sufficient for metric validation. We find that correlation scores can be misleading, and that score distributions and outliers should be taken into account. With these caveats in mind, QuestEval can be a promising candidate for long document similarity assessment.
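The abstract's warning that a single correlation score can mislead is easy to illustrate with a toy sketch (the data and code below are illustrative, not from the paper): one extreme outlier can push the Pearson coefficient above 0.9 even when metric and human scores disagree on every other point.

```python
def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Nine points on which metric and human scores disagree, plus one
# extreme outlier on which they agree: the outlier alone drives r up.
metric = [0.5, 0.4, 0.6, 0.3, 0.7, 0.2, 0.8, 0.1, 0.9, 5.0]
human  = [0.3, 0.7, 0.1, 0.9, 0.5, 0.6, 0.2, 0.8, 0.4, 10.0]
print(pearson(metric, human))          # high (> 0.9) because of the outlier
print(pearson(metric[:9], human[:9]))  # negative without it
```

Inspecting the score distribution, as the paper advocates, exposes exactly this failure mode.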

Can Language Models Help in System Security? Investigating Log Anomaly Detection using BERT
Crispin Almodovar | Fariza Sabrina | Sarvnaz Karimi | Salahuddin Azad

The log files generated by networked computer systems contain valuable information that can be used to monitor system security and stability. Recently, techniques based on Deep Learning and Natural Language Processing have been proven effective in detecting anomalous activities from system logs. The current approaches, however, have limited practical application because they rely on log templates which cannot handle variability in log content, or they require supervised training to be effective. In this paper, a novel log anomaly detection approach named LogFiT is proposed. The LogFiT model inherits the linguistic “knowledge” encoded within a pretrained BERT-based language model and fine-tunes it towards learning the linguistic structure of system logs. The LogFiT model is trained in a self-supervised manner using normal log data only. Using masked token prediction and centroid distance minimisation as training objectives, the LogFiT model learns to recognise the linguistic patterns associated with the normal log data. During inference, a discriminator function uses the LogFiT model’s top-k token prediction accuracy and computed centroid distance to determine whether the input is normal or anomalous. Experiments show that LogFiT’s F1 score and specificity exceed those of baseline models on the HDFS dataset and are comparable on the BGL dataset.
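The discriminator step described above can be sketched in a few lines (a minimal illustration only; the function name and threshold values are assumptions, not taken from the paper): an input is flagged as anomalous when its masked-token top-k accuracy is low or its embedding lies far from the centroid of the normal training data.

```python
import math

def flag_anomaly(topk_accuracy, embedding, centroid,
                 acc_threshold=0.9, dist_threshold=2.0):
    """Illustrative discriminator: low masked-token top-k accuracy OR a
    large Euclidean distance from the normal-data centroid marks the
    input as anomalous. Thresholds are placeholders, not paper values."""
    dist = math.dist(embedding, centroid)
    return topk_accuracy < acc_threshold or dist > dist_threshold

# A sequence the model predicts well, sitting near the centroid -> normal.
print(flag_anomaly(0.95, [0.1, 0.2], [0.0, 0.0]))  # False
# A poorly predicted sequence -> anomalous.
print(flag_anomaly(0.40, [0.1, 0.2], [0.0, 0.0]))  # True
```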

A Semantics of Spatial Expressions for interacting with unmanned aerial vehicles
Lucas Domingos | Paulo Santos

This paper describes an investigation of establishing communication between a quadrotor and a human by means of qualitative spatial relations using speech recognition. It is based on a system capable of receiving, interpreting, processing, acting on, transmitting and executing the given commands. This system is composed of a quadrotor equipped with a GPS, IMU sensors and radio communication, and a computer acting as a ground station that is capable of understanding and interpreting the received commands and correctly providing answers according to an underlying qualitative reasoning formalism. Tests were performed, whose results show that the error rate was less than five percent for the vertical and radial dimensions, while in the horizontal dimension the error rate was almost ten percent.

Enhancing the DeBERTa Transformers Model for Classifying Sentences from Biomedical Abstracts
Abdul Aziz | Md. Akram Hossain | Abu Nowshed Chy

Textstar: a Fast and Lightweight Graph-Based Algorithm for Extractive Summarization and Keyphrase Extraction
David Brock | Ali Khan | Tam Doan | Alicia Lin | Yifan Guo | Paul Tarau

We introduce Textstar, a graph-based summarization and keyphrase extraction system that builds a document graph using only lemmatization and POS tagging. The document graph aggregates connections between lemma and sentence identifier nodes. Consecutive lemmas in each sentence, as well as consecutive sentences themselves, are connected in rings to form a ring of rings representing the document. We iteratively apply a centrality algorithm of our choice to the document graph and trim the lowest-ranked nodes at each step. After the desired number of remaining sentences and lemmas is reached, we extract the sentences as the summary, and the remaining lemmas are aggregated into keyphrases using their context. Our algorithm is efficient enough to one-shot process large document graphs without any training, and empirical evaluation on several benchmarks indicates that our performance is higher than most other graph-based algorithms.
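The ring-of-rings construction and iterative trimming lend themselves to a compact sketch (a toy approximation, not the authors' code: lowercased whitespace tokens stand in for lemmatization and POS tagging, and degree centrality stands in for the pluggable centrality algorithm):

```python
from collections import defaultdict

def textstar_sketch(sentences, keep=1):
    """Toy ring-of-rings graph: sentence-id nodes form one ring,
    consecutive word tokens in each sentence form inner rings, and
    words link to the sentences containing them. Lowest-degree nodes
    are trimmed until `keep` sentences remain."""
    graph = defaultdict(set)
    n = len(sentences)
    for i, sent in enumerate(sentences):
        sid = ('S', i)
        nxt_sid = ('S', (i + 1) % n)          # ring of sentences
        graph[sid].add(nxt_sid)
        graph[nxt_sid].add(sid)
        words = sent.lower().split()
        for w, nxt in zip(words, words[1:]):  # ring of consecutive tokens
            graph[('W', w)].add(('W', nxt))
            graph[('W', nxt)].add(('W', w))
        for w in words:                       # token-to-sentence links
            graph[('W', w)].add(sid)
            graph[sid].add(('W', w))
    # iteratively trim the lowest-degree node (degree ~ centrality here)
    while sum(1 for k in graph if k[0] == 'S') > keep:
        victim = min(graph, key=lambda k: len(graph[k]))
        for nb in graph.pop(victim):
            graph[nb].discard(victim)
    kept = sorted(i for t, i in graph if t == 'S')
    return [sentences[i] for i in kept]
```

The surviving word nodes would likewise be aggregated into keyphrases; this sketch only returns the extracted sentences.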

Contrastive Visual and Language Learning for Visual Relationship Detection
Thanh Tran | Maelic Neau | Paulo Santos | David Powers

Visual Relationship Detection aims to understand real-world objects’ interactions by grounding visual concepts to compositional visual relation triples, written in the form of (subject, predicate, object). Previous works have explored the use of contrastive learning to implicitly predict the predicates from the relevant image regions. However, these models often directly leverage in-distribution spatial and language co-occurrence biases during training, preventing the models from generalizing to out-of-distribution compositions. In this work, we examine whether contrastive vision and language models pre-trained on large-scale external image and text datasets can assist in the detection of compositional visual relationships. To this end, we propose a semi-supervised contrastive fine-tuning approach for the visual relationship detection task. The results show that fine-tuned models that were pre-trained on larger datasets do not yield better performance when performing visual relationship detection, and larger models can yield lower performance when compared with their smaller counterparts.

Overview of the 2022 ALTA Shared task: PIBOSO sentence classification, 10 years later
Diego Mollá

The ALTA shared task has been running annually since 2010. This year, the shared task is a re-visit of the 2012 ALTA shared task. The purpose of this task is to classify sentences of medical publications using the PIBOSO taxonomy. This is a multi-label classification task which can help medical researchers and practitioners conduct Evidence Based Medicine (EBM). In this paper, we present the task, the evaluation criteria, and the results of the systems participating in the shared task.

Estimating the Strength of Authorship Evidence with a Deep-Learning-Based Approach
Shunichi Ishihara | Satoru Tsuge | Mitsuyuki Inaba | Wataru Zaitsu

This study is the first likelihood ratio (LR)-based forensic text comparison study in which each text is mapped onto an embedding vector using RoBERTa as the pre-trained model. The scores obtained with Cosine distance and probabilistic linear discriminant analysis (PLDA) were calibrated to LRs with logistic regression; the quality of the LRs was assessed by the log LR cost (Cllr). Although the documents in the experiments were very short (maximum 100 words), the systems reached Cllr values of 0.55595 and 0.71591 for the Cosine and PLDA systems, respectively. The effectiveness of deep-learning-based text representation is discussed by comparing the results of the current study to those of previous studies of systems based on conventional feature engineering tested with longer documents.
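The Cllr metric used for assessment has a standard closed form, sketched below (a generic implementation of the well-known cost function, not code from the paper): misleading LRs are penalised in proportion to how strongly they point the wrong way.

```python
import math

def cllr(same_source_lrs, diff_source_lrs):
    """Log-likelihood-ratio cost: penalises same-source LRs below 1 and
    different-source LRs above 1, weighted by how misleading they are."""
    ss = sum(math.log2(1 + 1 / lr) for lr in same_source_lrs) / len(same_source_lrs)
    ds = sum(math.log2(1 + lr) for lr in diff_source_lrs) / len(diff_source_lrs)
    return 0.5 * (ss + ds)

# An uninformative system that always outputs LR = 1 scores Cllr = 1;
# well-calibrated, discriminating LRs push Cllr towards 0.
print(cllr([1.0], [1.0]))     # 1.0
print(cllr([100.0], [0.01]))  # ~0.014
```

On this scale, the paper's reported values of 0.55595 and 0.71591 sit between the uninformative ceiling of 1 and the ideal of 0.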

Automatic Classification of Evidence Based Medicine Using Transformers
Necva Bolucu | Pinar Uskaner Hepsag

Context-Aware Sentence Classification in Evidence-Based Medicine
Biaoyan Fang | Fajri Koto