Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop

Yan Hanqi, Yang Zonghan, Sebastian Ruder, Wan Xiaojun (Editors)


Anthology ID:
2022.aacl-srw
Month:
November
Year:
2022
Address:
Online
Venues:
AACL | IJCNLP
SIG:
Publisher:
Association for Computational Linguistics
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/2022.aacl-srw/
DOI:
10.18653/v1/2022.aacl-srw
Bib Export formats:
BibTeX
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/2022.aacl-srw.pdf

pdf bib
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop
Yan Hanqi | Yang Zonghan | Sebastian Ruder | Wan Xiaojun

pdf bib
Emotional Intensity Estimation based on Writer’s Personality
Haruya Suzuki | Sora Tarumoto | Tomoyuki Kajiwara | Takashi Ninomiya | Yuta Nakashima | Hajime Nagahara

We propose a method for personalized emotional intensity estimation based on a writer’s personality test for Japanese SNS posts. Existing emotion analysis models are difficult to accurately estimate the writer’s subjective emotions behind the text. We personalize the emotion analysis using not only the text but also the writer’s personality information. Experimental results show that personality information improves the performance of emotional intensity estimation. Furthermore, a hybrid model combining the existing personalized method with ours achieved state-of-the-art performance.

pdf bib
Bipartite-play Dialogue Collection for Practical Automatic Evaluation of Dialogue Systems
Shiki Sato | Yosuke Kishinami | Hiroaki Sugiyama | Reina Akama | Ryoko Tokuhisa | Jun Suzuki

Automation of dialogue system evaluation is a driving force for the efficient development of dialogue systems. This paper introduces the bipartite-play method, a dialogue collection method for automating dialogue system evaluation. It addresses the limitations of existing dialogue collection methods: (i) inability to compare with systems that are not publicly available, and (ii) vulnerability to cheating by intentionally selecting systems to be compared. Experimental results show that the automatic evaluation using the bipartite-play method mitigates these two drawbacks and correlates as strongly with human subjectivity as existing methods.

pdf
Toward Building a Language Model for Understanding Temporal Commonsense
Mayuko Kimura | Lis Kanashiro Pereira | Ichiro Kobayashi

The ability to capture temporal commonsense relationships for time-related events expressed in text is a very important task in natural language understanding. On the other hand, pre-trained language models such as BERT, which have recently achieved great success in a wide range of natural language processing tasks, are still considered to have poor performance in temporal reasoning. In this paper, we focus on the development of language models for temporal commonsense inference over several pre-trained language models. Our model relies on multi-step fine-tuning using multiple corpora, and masked language modeling to predict masked temporal indicators that are crucial for temporal commonsense reasoning. We also experimented with multi-task learning and build a language model that can improve performance on multiple time-related tasks. In our experiments, multi-step fine-tuning using the general commonsense reading task as auxiliary task produced the best results. This result showed a significant improvement in accuracy over standard fine-tuning in the temporal commonsense inference task.

pdf
Optimal Summaries for Enabling a Smooth Handover in Chat-Oriented Dialogue
Sanae Yamashita | Ryuichiro Higashinaka

In dialogue systems, one option for creating a better dialogue experience for the user is to have a human operator take over the dialogue when the system runs into trouble communicating with the user. In this type of handover situation (we call it intervention), it is useful for the operator to have access to the dialogue summary. However, it is not clear exactly what type of summary would be the most useful for a smooth handover. In this study, we investigated the optimal type of summary through experiments in which interlocutors were presented with various summary types during interventions in order to examine their effects. Our findings showed that the best summaries were an abstractive summary plus one utterance immediately before the handover and an extractive summary consisting of five utterances immediately before the handover. From the viewpoint of computational cost, we recommend that extractive summaries consisting of the last five utterances be used.

pdf
MUTE: A Multimodal Dataset for Detecting Hateful Memes
Eftekhar Hossain | Omar Sharif | Mohammed Moshiul Hoque

The exponential surge of social media has enabled information propagation at an unprecedented rate. However, it also led to the generation of a vast amount of malign content, such as hateful memes. To eradicate the detrimental impact of this content, over the last few years hateful memes detection problem has grabbed the attention of researchers. However, most past studies were conducted primarily for English memes, while memes on resource constraint languages (i.e., Bengali) are under-studied. Moreover, current research considers memes with a caption written in monolingual (either English or Bengali) form. However, memes might have code-mixed captions (English+Bangla), and the existing models can not provide accurate inference in such cases. Therefore, to facilitate research in this arena, this paper introduces a multimodal hate speech dataset (named MUTE) consisting of 4158 memes having Bengali and code-mixed captions. A detailed annotation guideline is provided to aid the dataset creation in other resource constraint languages. Additionally, extensive experiments have been carried out on MUTE, considering the only visual, only textual, and both modalities. The result demonstrates that joint evaluation of visual and textual features significantly improves (≈ 3%) the hateful memes classification compared to the unimodal evaluation.

pdf
A Simple and Fast Strategy for Handling Rare Words in Neural Machine Translation
Nguyen-Hoang Minh-Cong | Vinh Thi Ngo | Van Vinh Nguyen

Neural Machine Translation (NMT) has currently obtained state-of-the-art in machine translation systems. However, dealing with rare words is still a big challenge in translation systems. The rare words are often translated using a manual dictionary or copied from the source to the target with original words. In this paper, we propose a simple and fast strategy for integrating constraints during the training and decoding process to improve the translation of rare words. The effectiveness of our proposal is demonstrated in both high and low-resource translation tasks, including the language pairs: English → Vietnamese, Chinese → Vietnamese, Khmer → Vietnamese, and Lao → Vietnamese. We show the improvements of up to +1.8 BLEU scores over the baseline systems.

pdf
C3PO: A Lightweight Copying Mechanism for Translating Pseudocode to Code
Vishruth Veerendranath | Vibha Masti | Prajwal Anagani | Mamatha Hr

Writing computer programs is a skill that remains inaccessible to most due to the barrier of programming language (PL) syntax. While large language models (LLMs) have been proposed to translate natural language pseudocode to PL code, they are costly in terms of data and compute. We propose a lightweight alternative to LLMs that exploits the property of code wherein most tokens can be simply copied from the pseudocode. We divide the problem into three phases: Copy, Generate, and Combine. In the Copy Phase, a binary classifier is employed to determine and mask the pseudocode tokens that can be directly copied into the code. In the Generate Phase, a Sequence-to-Sequence model is used to generate the masked PL code equivalent. In the Combine Phase, the generated sequence is combined with the tokens that the Copy Phase had masked. We show that our C3PO models achieve similar performance to non-C3PO models while reducing the computational cost of training as well as the vocabulary sizes.

pdf
Outlier-Aware Training for Improving Group Accuracy Disparities
Li-Kuang Chen | Canasai Kruengkrai | Junichi Yamagishi

Methods addressing spurious correlations such as Just Train Twice (JTT, Liu et al. 2021) involve reweighting a subset of the training set to maximize the worst-group accuracy. However, the reweighted set of examples may potentially contain unlearnable examples that hamper the model’s learning. We propose mitigating this by detecting outliers to the training set and removing them before reweighting. Our experiments show that our method achieves competitive or better accuracy compared with JTT and can detect and remove annotation errors in the subset being reweighted in JTT.

pdf
An Empirical Study on Topic Preservation in Multi-Document Summarization
Mong Yuan Sim | Wei Emma Zhang | Congbo Ma

Multi-document summarization (MDS) is a process of generating an informative and concise summary from multiple topic-related documents. Many studies have analyzed the quality of MDS dataset or models, however no work has been done from the perspective of topic preservation. In this work, we fill the gap by performing an empirical analysis on two MDS datasets and study topic preservation on generated summaries from 8 MDS models. Our key findings include i) Multi-News dataset has better gold summaries compared to Multi-XScience in terms of its topic distribution consistency and ii) Extractive approaches perform better than abstractive approaches in preserving topic information from source documents. We hope our findings could help develop a summarization model that can generate topic-focused summary and also give inspiration to researchers in creating dataset for such challenging task.

pdf
Detecting Urgency in Multilingual Medical SMS in Kenya
Narshion Ngao | Zeyu Wang | Lawrence Nderu | Tobias Mwalili | Tal August | Keshet Ronen

Access to mobile phones in many low- and middle-income countries has increased exponentially over the last 20 years, providing an opportunity to connect patients with healthcare interventions through mobile phones (known as mobile health). A barrier to large-scale implementation of interactive mobile health interventions is the human effort needed to manage participant messages. In this study, we explore the use of natural language processing to improve healthcare workers’ management of messages from pregnant and postpartum women in Kenya. Using multilingual, low-resource language text messages from the Mobile solutions for Women and Children’s health (Mobile WACh NEO) study, we developed models to assess urgency of incoming messages. We evaluated models using a novel approach that focuses on clinical usefulness in either triaging or prioritizing messages. Our best-performing models did not reach the threshold for clinical usefulness we set, but have the potential to improve nurse workflow and responsiveness to urgent messages.

pdf
Language over Labels: Contrastive Language Supervision Exceeds Purely Label-Supervised Classification Performance on Chest X-Rays
Anton Wiehe | Florian Schneider | Sebastian Blank | Xintong Wang | Hans-Peter Zorn | Christian Biemann

The multi-modal foundation model CLIP computes representations from texts and images that achieved unprecedented performance on tasks such as zero-shot image classification. However, CLIP was pretrained on public internet data. Thus it lacks highly domain-specific knowledge. We investigate the adaptation of CLIP-based models to the chest radiography domain using the MIMIC-CXR dataset. We show that the features of the pretrained CLIP models do not transfer to this domain. We adapt CLIP to the chest radiography domain using contrastive language supervision and show that this approach yields a model that outperforms supervised learning on labels on the MIMIC-CXR dataset while also generalizing to the CheXpert and RSNA Pneumonia datasets. Furthermore, we do a detailed ablation study of the batch and dataset size. Finally, we show that language supervision allows for better explainability by using the multi-modal model to generate images from texts such that experts can inspect what the model has learned.

pdf
Dynamic Topic Modeling by Clustering Embeddings from Pretrained Language Models: A Research Proposal
Anton Eklund | Mona Forsman | Frank Drewes

A new trend in topic modeling research is to do Neural Topic Modeling by Clustering document Embeddings (NTM-CE) created with a pretrained language model. Studies have evaluated static NTM-CE models and found them performing comparably to, or even better than other topic models. An important extension of static topic modeling is making the models dynamic, allowing the study of topic evolution over time, as well as detecting emerging and disappearing topics. In this research proposal, we present two research questions to understand dynamic topic modeling with NTM-CE theoretically and practically. To answer these, we propose four phases with the aim of establishing evaluation methods for dynamic topic modeling, finding NTM-CE-specific properties, and creating a framework for dynamic NTM-CE. For evaluation, we propose to use both quantitative measurements of coherence and human evaluation supported by our recently developed tool.

pdf
Concreteness vs. Abstractness: A Selectional Preference Perspective
Tarun Tater | Diego Frassinelli | Sabine Schulte im Walde

Concrete words refer to concepts that are strongly experienced through human senses (banana, chair, salt, etc.), whereas abstract concepts are less perceptually salient (idea, glory, justice, etc.). A clear definition of abstractness is crucial for the understanding of human cognitive processes and for the development of natural language applications such as figurative language detection. In this study, we investigate selectional preferences as a criterion to distinguish between concrete and abstract concepts and words: we hypothesise that abstract and concrete verbs and nouns differ regarding the semantic classes of their arguments. Our study uses a collection of 5,438 nouns and 1,275 verbs to exploit selectional preferences as a salient characteristic in classifying English abstract vs. concrete words, and in predicting their concreteness scores. We achieve an f1-score of 0.84 for nouns and 0.71 for verbs in classification, and Spearman’s ρ correlation of 0.86 for nouns and 0.59 for verbs.