Beatrice Alex

Also published as: Bea Alex

2021

The Online Pivot: Lessons Learned from Teaching a Text and Data Mining Course in Lockdown, Enhancing online Teaching with Pair Programming and Digital Badges
Beatrice Alex | Clare Llewellyn | Pawel Orzechowski | Maria Boutchkova
Proceedings of the Fifth Workshop on Teaching NLP

In this paper we provide an account of how we ported a text and data mining course online in summer 2020 as a result of the COVID-19 pandemic and how we improved it in a second pilot run. We describe the course, how we adapted it over the two pilot runs and what teaching techniques we used to improve students’ learning and community building online. We also provide information on the relentless feedback collected during the course which helped us to adapt our teaching from one session to the next and one pilot to the next. We discuss the lessons learned and promote the use of innovative teaching techniques applied to the digital such as digital badges and pair programming in break-out rooms for teaching Natural Language Processing courses to beginners and students with different backgrounds.

CoPHE: A Count-Preserving Hierarchical Evaluation Metric in Large-Scale Multi-Label Text Classification
Matúš Falis | Hang Dong | Alexandra Birch | Beatrice Alex
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Large-Scale Multi-Label Text Classification (LMTC) includes tasks with hierarchical label spaces, such as automatic assignment of ICD-9 codes to discharge summaries. Performance of models in prior art is evaluated with standard precision, recall, and F1 measures without regard for the rich hierarchical structure. In this work we argue for hierarchical evaluation of the predictions of neural LMTC models. With the example of the ICD-9 ontology we describe a structural issue in the representation of the structured label space in prior art, and propose an alternative representation based on the depth of the ontology. We propose a set of metrics for hierarchical evaluation using the depth-based representation. We compare the evaluation scores from the proposed metrics with previously used metrics on prior art LMTC models for ICD-9 coding in MIMIC-III. We also propose further avenues of research involving the proposed ontological representation.

2020

Enhanced Labelling in Active Learning for Coreference Resolution
Vebjørn Espeland | Beatrice Alex | Benjamin Bach
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference

In this paper we describe our attempt to increase the amount of information that can be retrieved through active learning sessions compared to previous approaches. We optimise the annotator’s labelling process using active learning in the context of coreference resolution. Using simulated active learning experiments, we suggest three adjustments to ensure the labelling time is spent as efficiently as possible. All three adjustments provide more information to the machine learner than the baseline, though a large impact on the F1 score over time is not observed. Compared to previous models, we report a marginal F1 improvement on the final coreference models trained using two out of the three approaches tested when applied to the English OntoNotes 2012 Coreference Resolution data. Our best-performing model achieves 58.01 F1, an increase of 0.93 F1 over the baseline model.

Not a cute stroke: Analysis of Rule- and Neural Network-based Information Extraction Systems for Brain Radiology Reports
Andreas Grivas | Beatrice Alex | Claire Grover | Richard Tobin | William Whiteley
Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis

We present an in-depth comparison of three clinical information extraction (IE) systems designed to perform entity recognition and negation detection on brain imaging reports: EdIE-R, a bespoke rule-based system, and two neural network models, EdIE-BiLSTM and EdIE-BERT, both multi-task learning models with a BiLSTM and BERT encoder respectively. We compare our models both on an in-sample and an out-of-sample dataset containing mentions of stroke findings and draw on our error analysis to suggest improvements for effective annotation when building clinical NLP models for a new domain. Our analysis finds that our rule-based system outperforms the neural models on both datasets and seems to generalise to the out-of-sample dataset. On the other hand, the neural models do not generalise negation to the out-of-sample dataset, despite metrics on the in-sample dataset suggesting otherwise.

Geoparsing the historical Gazetteers of Scotland: accurately computing location in mass digitised texts
Rosa Filgueira | Claire Grover | Melissa Terras | Beatrice Alex
Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora

This paper describes work in progress on devising automatic and parallel methods for geoparsing large digital historical textual data by combining the strengths of three natural language processing (NLP) tools, the Edinburgh Geoparser, spaCy and defoe, and employing different tokenisation and named entity recognition (NER) techniques. We apply these tools to a large collection of nineteenth century Scottish geographical dictionaries, and describe preliminary results obtained when processing this data.

Situated Data, Situated Systems: A Methodology to Engage with Power Relations in Natural Language Processing Research
Lucy Havens | Melissa Terras | Benjamin Bach | Beatrice Alex
Proceedings of the Second Workshop on Gender Bias in Natural Language Processing

We propose a bias-aware methodology to engage with power relations in natural language processing (NLP) research. NLP research rarely engages with bias in social contexts, limiting its ability to mitigate bias. While researchers have recommended actions, technical methods, and documentation practices, no methodology exists to integrate critical reflections on bias with technical NLP methods. In this paper, after an extensive and interdisciplinary literature review, we contribute a bias-aware methodology for NLP research. We also contribute a definition of biased text, a discussion of the implications of biased NLP systems, and a case study demonstrating how we are executing the bias-aware methodology in research on archival metadata descriptions.

Twitter-related studies often need to geo-locate Tweets or Twitter users, identifying their real-world geographic locations. As tweet-level geotagging remains rare, most prior work exploited tweet content, timezone and network information to inform geolocation, or else relied on off-the-shelf tools to geolocate users from location information in their user profiles. However, such user location metadata is not consistently structured, causing such tools to fail regularly, especially if a string contains multiple locations, or if locations are very fine-grained. We argue that user profile location (UPL) and tweet location need to be treated as distinct types of information from which differing inferences can be drawn. Here, we apply geoparsing to UPLs, and demonstrate how task performance can be improved by adapting our Edinburgh Geoparser, which was originally developed for processing English text. We present a detailed evaluation method and results, including inter-coder agreement. We demonstrate that the optimised geoparser can effectively extract and geo-reference multiple locations at different levels of granularity with an F1-score of around 0.90. We also illustrate how geoparsed UPLs can be exploited for international information trade studies and country-level sentiment analysis.

2015

Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)
Kalliopi Zervanou | Marieke van Erp | Beatrice Alex
Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)

2014

Bootstrapping a historical commodities lexicon with SKOS and DBpedia
Ewan Klein | Beatrice Alex | Jim Clifford
Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)

A Web-based Geo-resolution Annotation and Evaluation Tool
Beatrice Alex | Kate Byrne | Claire Grover | Richard Tobin
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop

2010

Edinburgh-LTG: TempEval-2 System Description
Claire Grover | Richard Tobin | Beatrice Alex | Kate Byrne
Proceedings of the 5th International Workshop on Semantic Evaluation

Labelling and Spatio-Temporal Grounding of News Events
Bea Alex | Claire Grover
Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media

Agile Corpus Annotation in Practice: An Overview of Manual and Automatic Annotation of CVs
Bea Alex | Claire Grover | Rongzhou Shen | Mijail Kabadjov
Proceedings of the Fourth Linguistic Annotation Workshop

2008

Exploiting Multiply Annotated Corpora in Biomedical Information Extraction Tasks
Barry Haddow | Beatrice Alex
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper discusses the problem of utilising multiply annotated data in training biomedical information extraction systems. Two corpora, annotated with entities and relations, and containing a number of multiply annotated documents, are used to train named entity recognition and relation extraction systems. Several methods of automatically combining the multiple annotations to produce a single annotation are compared, but none produces better results than simply picking one of the annotated versions at random. It is also shown that adding extra singly annotated documents produces faster performance gains than adding extra multiply annotated documents.

Comparing Corpus-based to Web-based Lookup Techniques for Automatic English Inclusion Detection
Beatrice Alex
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The influence of English as a global language continues to grow to an extent that its words and expressions permeate the original forms of other languages. This paper evaluates a modular Web-based sub-component of an existing English inclusion classifier and compares it to a corpus-based lookup technique. Both approaches are evaluated on a German gold standard data set. It is demonstrated to what extent the Web-based approach benefits from the amount of data available online and the fact that this data is constantly updated.

2007

Using Foreign Inclusion Detection to Improve Parsing Performance
Beatrice Alex | Amit Dubey | Frank Keller
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

Recognising Nested Named Entities in Biomedical Text
Beatrice Alex | Barry Haddow | Claire Grover
Biological, translational, and clinical language processing

2006

The Impact of Annotation on the Performance of Protein Tagging in Biomedical Text
Beatrice Alex | Malvina Nissim | Claire Grover
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we discuss five different corpora annotated for protein names. We present several within- and cross-dataset protein tagging experiments showing that different annotation schemes severely affect the portability of statistical protein taggers. By means of a detailed error analysis we identify crucial annotation issues that future annotation projects should take into careful consideration.