Kevin Heffernan


2022

Problem-solving Recognition in Scientific Text
Kevin Heffernan | Simone Teufel
Proceedings of the Thirteenth Language Resources and Evaluation Conference

As far back as Aristotle, problems and solutions have been recognised as a core pattern of thought, and in particular of the scientific method. In this work, we present the novel task of problem-solving recognition in scientific text. Previous work on problem-solving either is not computational, is not adapted to scientific text, or is narrow in scope. This work provides a new annotation scheme of problem-solving tailored to the scientific domain. We validate the scheme with an annotation study, and model the task using state-of-the-art baselines such as a Neural Relational Topic Model. The agreement study indicates that our annotation is reliable, and results from modelling show that problem-solving expressions in text can be recognised to a high degree of accuracy.

2020

Dialect Clustering with Character-Based Metrics: in Search of the Boundary of Language and Dialect
Yo Sato | Kevin Heffernan
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present in this work a universal, character-based method for representing sentences so that one can calculate the distance between any pair of sentences. With a small alphabet, it can function as a proxy for phonemes, and as one of its main uses, we carry out dialect clustering: clustering a dialect/sub-language mixed corpus into sub-groups and seeing whether they coincide with the conventional boundaries of dialects and sub-languages. Using data from multiple Japanese dialects and multiple Slavic languages, we report how well each group clusters, in partial response to the question of what separates languages from dialects.

Homonym normalisation by word sense clustering: a case in Japanese
Yo Sato | Kevin Heffernan
Proceedings of the 28th International Conference on Computational Linguistics

This work presents a method of word sense clustering that differentiates homonyms and merges homophones, taking as an example Japanese, where orthographic variation causes problems for language processing. It uses contextualised embeddings (BERT) to cluster tokens into distinct sense groups, and we use these groups to normalise synonymous instances to a single representative form. We see the benefit of this normalisation in language modelling, as well as in transliteration.

2018

Creating dialect sub-corpora by clustering: a case in Japanese for an adaptive method
Yo Sato | Kevin Heffernan
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)