Jonathan Tuke


Life still goes on: Analysing Australian WW1 Diaries through Distant Reading
Ashley Dennis-Henderson | Matthew Roughan | Lewis Mitchell | Jonathan Tuke
Proceedings of the The 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

An increasing amount of historic data is now available in digital (text) formats. This gives quantitative researchers an opportunity to use distant reading techniques, as opposed to traditional close reading, in order to analyse larger quantities of historic data. Distant reading allows researchers to view overall patterns within the data and reduce researcher bias. One such data set that has recently been transcribed is a collection of over 500 Australian World War I (WW1) diaries held by the State Library of New South Wales. Here we apply distant reading techniques to this corpus to understand what soldiers wrote about and how they felt over the course of the war. Extracting dates accurately is important as it allows us to perform our analysis over time, however, it is very challenging due to the variety of date formats and abbreviations diarists use. But with that data, topic modelling and sentiment analysis can then be applied to show trends, for instance, that despite the horrors of war, Australians in WW1 primarily wrote about their everyday routines and experiences. Our results detail some of the challenges likely to be encountered by quantitative researchers intending to analyse historical texts, and provide some approaches to these issues.


A framework for streamlined statistical prediction using topic models
Vanessa Glenny | Jonathan Tuke | Nigel Bean | Lewis Mitchell
Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

In the Humanities and Social Sciences, there is increasing interest in approaches to information extraction, prediction, intelligent linkage, and dimension reduction applicable to large text corpora. With approaches in these fields being grounded in traditional statistical techniques, the need arises for frameworks whereby advanced NLP techniques such as topic modelling may be incorporated within classical methodologies. This paper provides a classical, supervised, statistical learning framework for prediction from text, using topic models as a data reduction method and the topics themselves as predictors, alongside typical statistical tools for predictive modelling. We apply this framework in a Social Sciences context (applied animal behaviour) as well as a Humanities context (narrative analysis) as examples of this framework. The results show that topic regression models perform comparably to their much less efficient equivalents that use individual words as predictors.

Podlab at SemEval-2019 Task 3: The Importance of Being Shallow
Andrew Nguyen | Tobin South | Nigel Bean | Jonathan Tuke | Lewis Mitchell
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes our linear SVM system for emotion classification from conversational dialogue, entered in SemEval2019 Task 3. We used off-the-shelf tools coupled with feature engineering and parameter tuning to create a simple, interpretable, yet high-performing, classification model. Our system achieves a micro F1 score of 0.7357, which is 92% of the top score for the competition, demonstrating that “shallow” classification approaches can perform well when coupled with detailed fea- ture selection and statistical analysis.