Julia Otmakhova

Also published as: Yulia Otmakhova


LED down the rabbit hole: exploring the potential of global attention for biomedical multi-document summarisation
Yulia Otmakhova | Thinh Hung Truong | Timothy Baldwin | Trevor Cohn | Karin Verspoor | Jey Han Lau
Proceedings of the Third Workshop on Scholarly Document Processing

In this paper we report the experiments performed for the submission to the Multidocument summarisation for Literature Review (MSLR) Shared Task. In particular, we adopt Primera model to the biomedical domain by placing global attention on important biomedical entities in several ways. We analyse the outputs of 23 resulting models and report some patterns related to the presence of additional global attention, number of training steps and the inputs configuration.

Cross-linguistic Comparison of Linguistic Feature Encoding in BERT Models for Typologically Different Languages
Yulia Otmakhova | Karin Verspoor | Jey Han Lau
Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP

Though recently there have been an increased interest in how pre-trained language models encode different linguistic features, there is still a lack of systematic comparison between languages with different morphology and syntax. In this paper, using BERT as an example of a pre-trained model, we compare how three typologically different languages (English, Korean, and Russian) encode morphology and syntax features across different layers. In particular, we contrast languages which differ in a particular aspect, such as flexibility of word order, head directionality, morphological type, presence of grammatical gender, and morphological richness, across four different tasks.

The patient is more dead than alive: exploring the current state of the multi-document summarisation of the biomedical literature
Yulia Otmakhova | Karin Verspoor | Timothy Baldwin | Jey Han Lau
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Although multi-document summarisation (MDS) of the biomedical literature is a highly valuable task that has recently attracted substantial interest, evaluation of the quality of biomedical summaries lacks consistency and transparency. In this paper, we examine the summaries generated by two current models in order to understand the deficiencies of existing evaluation approaches in the context of the challenges that arise in the MDS task. Based on this analysis, we propose a new approach to human evaluation and identify several challenges that must be overcome to develop effective biomedical MDS systems.

M3: Multi-level dataset for Multi-document summarisation of Medical studies
Yulia Otmakhova | Karin Verspoor | Timothy Baldwin | Antonio Jimeno Yepes | Jey Han Lau
Findings of the Association for Computational Linguistics: EMNLP 2022

We present M3 (Multi-level dataset for Multi-document summarisation of Medical studies), a benchmark dataset for evaluating the quality of summarisation systems in the biomedical domain. The dataset contains sets of multiple input documents and target summaries of three levels of complexity: documents, sentences, and propositions. The dataset also includes several levels of annotation, including biomedical entities, direction, and strength of relations between them, and the discourse relationships between the input documents (“contradiction” or “agreement”). We showcase usage scenarios of the dataset by testing 10 generic and domain-specific summarisation models in a zero-shot setting, and introduce a probing task based on counterfactuals to test if models are aware of the direction and strength of the conclusions generated from input studies.

Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation
Thinh Hung Truong | Yulia Otmakhova | Timothy Baldwin | Trevor Cohn | Jey Han Lau | Karin Verspoor
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Negation is poorly captured by current language models, although the extent of this problem is not widely understood. We introduce a natural language inference (NLI) test suite to enable probing the capabilities of NLP methods, with the aim of understanding sub-clausal negation. The test suite contains premise–hypothesis pairs where the premise contains sub-clausal negation and the hypothesis is constructed by making minimal modifications to the premise in order to reflect different possible interpretations. Aside from adopting standard NLI labels, our test suite is systematically constructed under a rigorous linguistic framework. It includes annotation of negation types and constructions grounded in linguistic theory, as well as the operations used to construct hypotheses. This facilitates fine-grained analysis of model performance. We conduct experiments using pre-trained language models to demonstrate that our test suite is more challenging than existing benchmarks focused on negation, and show how our annotation supports a deeper understanding of the current NLI capabilities in terms of negation and quantification.


Improved Topic Representations of Medical Documents to Assist COVID-19 Literature Exploration
Yulia Otmakhova | Karin Verspoor | Timothy Baldwin | Simon Šuster
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

Efficient discovery and exploration of biomedical literature has grown in importance in the context of the COVID-19 pandemic, and topic-based methods such as latent Dirichlet allocation (LDA) are a useful tool for this purpose. In this study we compare traditional topic models based on word tokens with topic models based on medical concepts, and propose several ways to improve topic coherence and specificity.


Do We Really Need Lexical Information? Towards a Top-down Approach to Sentiment Analysis of Product Reviews
Yulia Otmakhova | Hyopil Shin
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies


Applying Graph-based Keyword Extraction to Document Retrieval
Youngsam Kim | Munhyong Kim | Andrew Cattle | Julia Otmakhova | Suzi Park | Hyopil Shin
Proceedings of the Sixth International Joint Conference on Natural Language Processing