Australasian Language Technology Association Workshop (2019)

Volumes

Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association 28 papers

pdf bib
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association
Meladel Mistica | Massimo Piccardi | Andrew MacKinlay

pdf bib abs
Towards A Robust Morphological Analyzer for Kunwinjku
William Lane | Steven Bird

Kunwinjku is an indigenous Australian language spoken in northern Australia which exhibits agglutinative and polysynthetic properties. Members of the community have expressed interest in co-developing language applications that promote their values and priorities. Modeling the morphology of the Kunwinjku language is an important step towards accomplishing the community’s goals. Finite State Transducers have long been the go-to method for modeling morphologically rich languages, and in this paper we discuss some of the distinct modeling challenges present in the morphosyntax of verbs in Kunwinjku. We show that a fairly straightforward implementation using standard features of the foma toolkit can account for much of the verb structure. Continuing challenges include robustness in the face of variation and unseen vocabulary, as well as how to handle complex reduplicative processes. Our future work will build off the baseline and challenges presented here.

pdf bib abs
From Shakespeare to Li-Bai: Adapting a Sonnet Model to Chinese Poetry
Zhuohan Xie | Jey Han Lau | Trevor Cohn

In this paper, we adapt Deep-speare, a joint neural network model for English sonnets, to Chinese poetry. We illustrate the characteristics of Chinese quatrains and explain our architecture as well as the training and generation procedure, which differs from that for Shakespearean sonnets in several aspects. We analyse the generated poetry and find that the model works well for Chinese poetry, as it can: (1) generate coherent 4-line quatrains on different topics; and (2) capture rhyme automatically (to a certain extent).

pdf abs
Readability of Twitter Tweets for Second Language Learners
Patrick Jacob | Alexandra Uitdenbogerd

Optimal language acquisition via reading requires learners to read slightly above their current language skill level. Identifying material at the right level is the essential role of automatic readability measurement. Short message platforms such as Twitter offer the opportunity for language practice while reading about current topics and engaging in conversation in small doses, and can be filtered according to linguistic criteria to suit the learner. In this research, we explore how readable tweets are for English language learners and which factors contribute to their readability. With participants from six language groups, we collected 14,659 data points, each representing a tweet from a pool of 4,100 tweets and a judgement of perceived readability. Traditional readability measures and features failed on the data set, but demographic data showed that judgements were largely genuine and reflected reported language skill, which is consistent with other recent studies. We report on the properties of the data set and implications for future research.

pdf
A neural joint model for Vietnamese word segmentation, POS tagging and dependency parsing
Dat Quoc Nguyen

pdf
Modelling Tibetan Verbal Morphology
Qianji Di | Ekaterina Vylomova | Tim Baldwin

pdf
A multi-constraint structured hinge loss for named-entity recognition
Hanieh Poostchi | Massimo Piccardi

pdf
Feature-guided Neural Model Training for Supervised Document Representation Learning
Aili Shen | Bahar Salehi | Jianzhong Qi | Timothy Baldwin

pdf abs
Red-faced ROUGE: Examining the Suitability of ROUGE for Opinion Summary Evaluation
Wenyi Tay | Aditya Joshi | Xiuzhen Zhang | Sarvnaz Karimi | Stephen Wan

One of the most common metrics to automatically evaluate opinion summaries is ROUGE, a metric developed for text summarisation. ROUGE counts the overlap of words or word units between a candidate summary and reference summaries. This formulation treats all words in the reference summary equally. In opinion summaries, however, not all words in the reference are equally important. Opinion summarisation requires correctly pairing two types of semantic information: (1) aspect or opinion target; and (2) polarity of candidate and reference summaries. We investigate the suitability of ROUGE for evaluating opinion summaries of online reviews. Using three simulation-based experiments, we evaluate the behaviour of ROUGE for opinion summarisation on the ability to match aspect and polarity. We show that ROUGE cannot distinguish opinion summaries of similar or opposite polarities for the same aspect. Moreover, ROUGE scores have significant variance under different configuration settings. As a result, we present three recommendations for future work that uses ROUGE to evaluate opinion summarisation.
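
As a rough illustration of the word-overlap formulation described in this abstract, the sketch below computes a simplified unigram (ROUGE-1 style) score. It is not the authors' experimental setup and not an official ROUGE implementation: the function name `rouge_1`, the whitespace tokenisation, and the example sentences are assumptions made for illustration, and stemming, stopword handling and multiple references are omitted.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Simplified unigram overlap between one candidate and one reference."""
    # Hypothetical tokenisation: lowercase and split on whitespace.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Same aspect ("battery life"), different polarity words in the candidates.
reference = "the battery life is great"
agreeing = rouge_1("the battery life is good", reference)
opposing = rouge_1("the battery life is bad", reference)
```

Under these assumptions, `agreeing` and `opposing` receive identical scores, because every word is weighted equally and only the polarity word differs. This matches the kind of behaviour the abstract reports, although the paper's own experiments are simulation-based and more extensive.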

pdf
Modeling Political Framing Across Policy Issues and Contexts
Shima Khanehzar | Andrew Turpin | Gosia Mikolajczak

pdf abs
Improved Document Modelling with a Neural Discourse Parser
Fajri Koto | Jey Han Lau | Timothy Baldwin

Despite the success of attention-based neural models for natural language generation and classification tasks, they are unable to capture the discourse structure of larger documents. We hypothesize that explicit discourse representations have utility for NLP tasks over longer documents or document sequences, which sequence-to-sequence models are unable to capture. For abstractive summarization, for instance, conventional neural models simply match source documents and the summary in a latent space without explicit representation of text structure or relations. In this paper, we propose to use neural discourse representations obtained from a rhetorical structure theory (RST) parser to enhance document representations. Specifically, document representations are generated for discourse spans, known as the elementary discourse units (EDUs). We empirically investigate the benefit of the proposed approach on two different tasks: abstractive summarization and popularity prediction of online petitions. We find that the proposed approach leads to substantial improvements in all cases.

pdf abs
Does an LSTM forget more than a CNN? An empirical study of catastrophic forgetting in NLP
Gaurav Arora | Afshin Rahimi | Timothy Baldwin

Catastrophic forgetting, whereby a model trained on one task is fine-tuned on a second and in doing so suffers a “catastrophic” drop in performance over the first task, is a hurdle in the development of better transfer learning techniques. Despite impressive progress in reducing catastrophic forgetting, we have limited understanding of how different architectures and hyper-parameters affect forgetting in a network. With this study, we aim to understand factors which cause forgetting during sequential training. Our primary finding is that CNNs forget less than LSTMs. We show that max-pooling is the underlying operation which helps CNNs alleviate forgetting compared to LSTMs. We also found that curriculum learning, placing a hard task towards the end of the task sequence, reduces forgetting. We analysed the effect of fine-tuning contextual embeddings on catastrophic forgetting and found that using embeddings as a feature extractor is preferable to fine-tuning in a continual learning setup.

pdf
Domain Adaptation for Low-Resource Neural Semantic Parsing
Alvin Kennardi | Gabriela Ferraro | Qing Wang

pdf
A Pointer Network Architecture for Context-Dependent Semantic Parsing
Xuanli He | Quan Tran | Gholamreza Haffari

Extracting chemical reactions from patents is a crucial task for chemists working on chemical exploration. In this paper we introduce the novel task of detecting the textual spans that describe or refer to chemical reactions within patents. We formulate this task as a paragraph-level sequence tagging problem, where the system is required to return a sequence of paragraphs which contain a description of a reaction. To address this new task, we construct an annotated dataset from an existing proprietary database of chemical reactions manually extracted from patents. We introduce several baseline methods for the task and evaluate them over our dataset. Through error analysis, we discuss what makes the task complex and challenging, and suggest possible directions for future research.

pdf abs
Identifying Patients with Pain in Emergency Departments using Conventional Machine Learning and Deep Learning
Thanh Vu | Anthony Nguyen | Nathan Brown | James Hughes

Pain is the main symptom that patients present with to the emergency department (ED). Pain management, however, is often a poorly done aspect of emergency care, and patients with painful conditions can endure long waits before their pain is assessed or treated. To improve pain management quality, identifying whether or not an ED patient presents with pain is an important task and allows for further investigation of the quality of care provided. In this paper, machine learning was utilised to handle the task of automatically detecting patients who present at EDs with pain from retrospective data. Experimental results on a manually annotated dataset show that our proposed machine learning models achieve high performance, with the highest accuracy and macro-averaged F1 being 91.00% and 90.96%, respectively.

pdf
Emerald 110k: A Multidisciplinary Dataset for Abstract Sentence Classification
Connor Stead | Stephen Smith | Peter Busch | Savanid Vatanasakdakul

pdf abs
CNL-ER: A Controlled Natural Language for Specifying and Verbalising Entity Relationship Models
Bayzid Ashik Hossain | Gayathri Rajan | Rolf Schwitter

The first step towards designing an information system is conceptual modelling, where domain experts and knowledge engineers together identify the information necessary to build an information system. Entity relationship modelling is one of the most popular conceptual modelling techniques; it represents an information system in terms of entities, attributes and relationships. Entity relationship models are constructed graphically but are often difficult for domain experts to understand. To overcome this problem, we suggest verbalising these models in a controlled natural language. In this paper, we present CNL-ER, a controlled natural language for specifying and verbalising entity relationship (ER) models that not only solves the verbalisation problem for these models but also provides the benefits of automatic verification and validation, and semantic round-tripping, which makes the communication process between the domain experts and the knowledge engineers transparent.

pdf abs
Measuring English Readability for Vietnamese Speakers
Phuoc Nguyen | Alexandra Uitdenbogerd

Reading is important for any language learner, but the difficulty level of the text needs to match a reader’s level to enable efficient learning of new vocabulary. Many widely used traditional readability measures are not effective for those who speak English as a second or additional language. This study examines English readability for Vietnamese native speakers (VL1). A collection of text difficulty judgements of nearly 100 English text passages was obtained from 12 VL1 participants, using a 5-point Likert scale. Using the same basic features found in traditional English readability measures we found that SVMs and Dale-Chall features were slightly better than linear models using either Flesch or Dale-Chall. VL1 participants’ text judgements were strongly correlated with their past IELTS test scores. This study introduces a first approximation to readability of English text for VL1, with suggestions for further improvements.

pdf
FindHer: a Filter to Find Women Experts
Gabriela Ferraro | Zoe Piper | Rebecca Hinton

pdf abs
Does Multi-Task Learning Always Help?: An Evaluation on Health Informatics
Aditya Joshi | Sarvnaz Karimi | Ross Sparks | Cecile Paris | C Raina MacIntyre

Multi-Task Learning (MTL) has been an attractive approach to deal with limited labeled datasets or leverage related tasks, for a variety of NLP problems. We examine the benefit of MTL for three specific pairs of health informatics tasks that deal with: (a) overlapping symptoms for the same classification problem (personal health mention classification for influenza and for a set of symptoms); (b) overlapping medical concepts for related classification problems (vaccine usage and drug usage detection); and, (c) related classification problems (vaccination intent and vaccination relevance detection). We experiment with a simple neural architecture: a shared layer followed by task-specific dense layers. The novelty of this work is that it compares alternatives for shared layers for these pairs of tasks. While our observations agree with the promise of MTL as compared to single-task learning, for health informatics, we show that the benefit also comes with caveats in terms of the choice of shared layers and the relatedness between the participating tasks.

pdf
Difficulty-aware Distractor Generation for Gap-Fill Items
Chak Yan Yeung | John Lee | Benjamin Tsou

pdf
Investigating the Effect of Lexical Segmentation in Transformer-based Models on Medical Datasets
Vincent Nguyen | Sarvnaz Karimi | Zhenchang Xing

pdf
Neural Versus Non-Neural Text Simplification: A Case Study
Islam Nassar | Michelle Ananda-Rajah | Gholamreza Haffari

pdf abs
An Improved Coarse-to-Fine Method for Solving Generation Tasks
Wenyv Guan | Qianying Liu | Guangzhi Han | Bin Wang | Sujian Li

Coarse-to-fine (coarse2fine) methods have recently been widely used in generation tasks. These methods first generate a rough sketch in the coarse stage and then use the sketch to get the final result in the fine stage. However, they usually lack the ability to correct a wrong sketch. To solve this problem, in this paper, we propose an improved coarse2fine model with a control mechanism, with which our method can control the influence of the sketch on the final results in the fine stage. Even if the sketch is wrong, our model still has the opportunity to get a correct result. We have evaluated our model on the tasks of semantic parsing and math word problem solving. The results show the effectiveness of our proposed model.

pdf
A string-to-graph constructive alignment algorithm for discrete and probabilistic language modeling
Andrey Shcherbakov | Ekaterina Vylomova

pdf abs
Overview of the 2019 ALTA Shared Task: Sarcasm Target Identification
Diego Molla | Aditya Joshi

We present an overview of the 2019 ALTA shared task. This is the 10th of the series of shared tasks organised by ALTA since 2010. The task was to detect the target of sarcastic comments posted on social media. We introduce the task, describe the data and present the results of baselines and participants. This year’s shared task was particularly challenging and no participating systems improved the results of our baseline.

pdf abs
Detecting Target of Sarcasm using Ensemble Methods
Pradeesh Parameswaran | Andrew Trotman | Veronica Liesaputra | David Eyers

We describe our methods for detecting the target of sarcasm as part of the ALTA 2019 shared task. We use a combination of an ensemble of classifiers and a rule-based system. Our team obtained a Dice-Sorensen Coefficient score of 0.37150, which placed 2nd on the public leaderboard. Despite no team beating the baseline score for the private dataset, we present our findings and also some of the challenges and future improvements which can be used in order to tackle the problem.
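
For readers unfamiliar with the metric named in this abstract, the following is a minimal sketch of the standard set-based Dice-Sorensen coefficient, 2|A ∩ B| / (|A| + |B|). How the shared task actually tokenised and aggregated predictions is not stated here, so the function name `dice_sorensen`, the word-set representation, and the handling of two empty sets are assumptions for illustration only.

```python
def dice_sorensen(predicted, gold):
    """Set-based Dice-Sorensen coefficient between two collections of tokens."""
    a, b = set(predicted), set(gold)
    # Assumption: two empty sets count as a perfect match.
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical example: predicted target words versus gold target words.
score = dice_sorensen(["my", "boss"], ["boss"])
```

Here one of the two predicted words matches the single gold word, so the score is 2 × 1 / (2 + 1), about 0.667.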