Sabine Bergler

2024

Analysis of Annotator Demographics in Sexism Detection
Narjes Tahaei | Sabine Bergler
Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

This study explores the effect of annotators’ demographic features on labeling sexist content in social media datasets, specifically focusing on the EXIST dataset, which includes direct sexist messages, reports and descriptions of sexist experiences, and stereotypes. We investigate how various demographic backgrounds influence annotation outcomes and examine methods to incorporate these features into BERT-based model training. Our experiments demonstrate that adding demographic information improves performance in detecting sexism and assessing the intention of the author.

2023

Comparing and combining some popular NER approaches on Biomedical tasks
Harsh Verma | Sabine Bergler | Narjesossadat Tahaei
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

We compare three simple and popular approaches for NER: 1) SEQ (sequence labeling with a linear token classifier), 2) SeqCRF (sequence labeling with Conditional Random Fields), and 3) SpanPred (span prediction with boundary token embeddings). We compare the approaches on 4 biomedical NER tasks: GENIA, NCBI-Disease, LivingNER (Spanish), and SocialDisNER (Spanish). The SpanPred model demonstrates state-of-the-art performance on LivingNER and SocialDisNER, improving F1 by 1.3 and 0.6 respectively. The SeqCRF model also demonstrates state-of-the-art performance on LivingNER and SocialDisNER, improving F1 by 0.2 and 0.7 respectively. The SEQ model is competitive with the state-of-the-art on the LivingNER dataset. We explore some simple ways of combining the three approaches. We find that majority voting consistently gives high precision and high F1 across all 4 datasets. Lastly, we implement a system that learns to combine SEQ’s and SpanPred’s predictions, generating systems that give high recall and high F1 across all 4 datasets. On the GENIA dataset, we find that our learned combiner system significantly boosts F1 (+1.2) and recall (+2.1) over the systems being combined.
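
The majority-voting combination mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how votes are counted, so the sketch assumes exact-match voting over (start, end, label) spans, and the names `Span` and `majority_vote` and the toy predictions are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple

# Assumed span representation: (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]


def majority_vote(predictions: Iterable[Set[Span]],
                  min_votes: Optional[int] = None) -> Set[Span]:
    """Keep the spans predicted by a strict majority of the systems.

    Spans only count as the same vote when boundaries and label match
    exactly (an assumption made for this sketch).
    """
    systems = list(predictions)
    if min_votes is None:
        # Strict majority: 2 of 3 systems, 3 of 4, and so on.
        min_votes = len(systems) // 2 + 1
    votes = Counter(span for system in systems for span in system)
    return {span for span, count in votes.items() if count >= min_votes}


# Toy predictions from three hypothetical systems on one sentence.
seq = {(0, 2, "Disease"), (5, 6, "Gene")}
seq_crf = {(0, 2, "Disease")}
span_pred = {(0, 2, "Disease"), (5, 6, "Gene"), (8, 9, "Gene")}

combined = majority_vote([seq, seq_crf, span_pred])
```

Requiring agreement drops spans proposed by a single system, which fits the abstract's observation that voting favours precision. Raising `min_votes` to the number of systems would keep only unanimous spans.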

CLaC at SemEval-2023 Task 2: Comparing Span-Prediction and Sequence-Labeling Approaches for NER
Harsh Verma | Sabine Bergler
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper summarizes the CLaC submission for the MultiCoNER 2 task, which concerns the recognition of complex, fine-grained named entities. We compare two popular approaches for NER, namely Sequence Labeling and Span Prediction. We find that our best Span Prediction system performs slightly better than our best Sequence Labeling system on test data. Moreover, we find that using the larger version of XLM-RoBERTa significantly improves performance. Post-competition experiments show that Span Prediction and Sequence Labeling approaches improve when they use special input tokens ([s] and [/s]) of XLM-RoBERTa. The code for training all models, preprocessing, and post-processing is available at https://github.com/harshshredding/semeval2023-multiconer-paper.

2022

CLaCLab at SocialDisNER: Using Medical Gazetteers for Named-Entity Recognition of Disease Mentions in Spanish Tweets
Harsh Verma | Parsa Bagherzadeh | Sabine Bergler
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

This paper summarizes the CLaC submission for SMM4H 2022 Task 10, which concerns the recognition of diseases mentioned in Spanish tweets. Before classifying each token, we encode it with a transformer encoder using features from Multilingual RoBERTa Large, the UMLS gazetteer, and the DISTEMIST gazetteer, among others. We obtain a strict F1 score of 0.869, against a competition mean of 0.675, a standard deviation of 0.245, and a median of 0.761.

Integration of Heterogeneous Knowledge Sources for Biomedical Text Processing
Parsa Bagherzadeh | Sabine Bergler
Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI)

Recently, research into bringing outside knowledge sources into current neural NLP models has been increasing. Most approaches that leverage external knowledge sources require laborious and non-trivial designs, as well as tailoring the system through intensive ablation of different knowledge sources, an effort that discourages users from using quality ontological resources. In this paper, we show that multiple large heterogeneous knowledge sources (KSs) can be easily integrated using a decoupled approach, allowing for an automatic ablation of irrelevant KSs, while keeping the overall parameter space tractable. We experiment with BERT and pre-trained graph embeddings, and show that they interoperate well without performance degradation, even when some do not contribute to the task.

2021

Leveraging knowledge sources for detecting self-reports of particular health issues on social media
Parsa Bagherzadeh | Sabine Bergler
Proceedings of the 12th International Workshop on Health Text Mining and Information Analysis

This paper investigates incorporating quality knowledge sources developed by experts for the medical domain, as well as syntactic information, for the classification of tweets into four different health-oriented categories. We claim that resources such as the MeSH hierarchy and currently available parse information are effective extensions of moderately sized training datasets for various fine-grained tweet classification tasks of self-reported health issues.

CLaC-np at SemEval-2021 Task 8: Dependency DGCNN
Nihatha Lathiff | Pavel PK Khloponin | Sabine Bergler
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

MeasEval aims at identifying quantities along with the entities that are measured with additional properties within English scientific documents. The variety of styles used makes measurements, a most crucial aspect of scientific writing, challenging to extract. This paper presents ablation studies making the case for several preprocessing steps such as specialized tokenization rules. For linguistic structure, we encode dependency trees in a Deep Graph Convolution Network (DGCNN) for multi-task classification.

CLaC-BP at SemEval-2021 Task 8: SciBERT Plus Rules for MeasEval
Benjamin Therien | Parsa Bagherzadeh | Sabine Bergler
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper explains the design of a heterogeneous system that ranked eighth in the SemEval-2021 Task 8 competition. We analyze ablation experiments and demonstrate how the system components, namely tokenizer, unit identifier, modifier classifier, and language model, affect the overall score. We compare our results to similar experiments from the literature and introduce a grouping algorithm developed in the post-evaluation phase that increased our system’s overall score, hypothetically elevating our competition rank from eighth to sixth.

Multi-input Recurrent Independent Mechanisms for leveraging knowledge sources: Case studies on sentiment analysis and health text mining
Parsa Bagherzadeh | Sabine Bergler
Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

This paper presents a way to inject and leverage existing knowledge from external sources in a Deep Learning environment, extending the recently proposed Recurrent Independent Mechanisms (RIMs) architecture, which comprises a set of interacting yet independent modules. We show that this extension of the RIMs architecture is an effective framework with lower parameter implications compared to purely fine-tuned systems.

Interacting Knowledge Sources, Inspection and Analysis: Case-studies on Biomedical text processing
Parsa Bagherzadeh | Sabine Bergler
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

In this paper we investigate the recently proposed multi-input RIM for inspectability. This framework follows an encapsulation paradigm, where external knowledge sources are encoded as largely independent modules, enabling transparency for model inspection.

Competing Independent Modules for Knowledge Integration and Optimization
Parsa Bagherzadeh | Sabine Bergler
Findings of the Association for Computational Linguistics: EMNLP 2021

This paper presents a neural framework of untied independent modules, used here for integrating off-the-shelf knowledge sources such as language models, lexica, POS information, and dependency relations. Each knowledge source is implemented as an independent component that can interact and share information with other knowledge sources. We report proof-of-concept experiments for several standard sentiment analysis tasks and show that the knowledge sources interoperate effectively without interference. As a second use case, we show that the proposed framework is suitable for optimizing BERT-like language models even without the help of external knowledge sources. We cast each Transformer layer as a separate module and demonstrate performance improvements from this explicit integration of the different information encoded at the different Transformer layers.

2020

CLaC at SMM4H 2020: Birth Defect Mention Detection
Parsa Bagherzadeh | Sabine Bergler
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

For the detection of personal tweets, where a parent speaks of a child’s birth defect, CLaC combines ELMo word embeddings and gazetteer lists from external resources with a GCNN (for encoding dependencies), in a multi-layer, transformer-inspired architecture. To address the task, we compile several gazetteer lists from resources such as MeSH and GI. The proposed system obtains a μF1 score of .69 in the SMM4H 2020 Task 5, where the competition average is .65.

CLaC at SemEval-2020 Task 5: Multi-task Stacked Bi-LSTMs
MinGyou Sung | Parsa Bagherzadeh | Sabine Bergler
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We consider detection of the span of antecedents and consequents in argumentative prose a structural, grammatical task. Our system comprises a set of stacked Bi-LSTMs trained on two complementary linguistic annotations. We explore the effectiveness of grammatical features (POS and clause type) through ablation. The reported experiments suggest that a multi-task learning approach using this external, grammatical knowledge is useful for detecting the extent of antecedents and consequents and performs nearly as well without the use of word embeddings.

2019

Adverse Drug Effect and Personalized Health Mentions, CLaC at SMM4H 2019, Tasks 1 and 4
Parsa Bagherzadeh | Nadia Sheikh | Sabine Bergler
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

CLaC Labs participated in Tasks 1 and 4 of SMM4H 2019. We pursued two main objectives in our submission. First, we tried to use some textual features in a deep net framework; second, we tested the potential use of more than one word embedding. The results seem positively affected by the proposed architectures.

2018

CLaC at SMM4H Task 1, 2, and 4
Parsa Bagherzadeh | Nadia Sheikh | Sabine Bergler
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

CLaC Labs participated in Tasks 1, 2, and 4 using the same base architecture for all tasks with various parameter variations. This was our first exploration of this data and the SMM4H Tasks, thus a unified system was useful to compare the behavior of our architecture over the different datasets and how they interact with different linguistic features.

Reported speech, both direct and indirect, is an important indicator of evidentiality in traditional newspaper texts, but also increasingly in the new media that rely heavily on citation and quotation of previous postings, as for instance in blogs or newsgroups. This paper details the basic processing steps for reported speech analysis and reports on the performance of an implementation in the form of a GATE resource.

Recognizing Speculative Language in Biomedical Research Articles: A Linguistically Motivated Perspective
Halil Kilicoglu | Sabine Bergler
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

When Specialists and Generalists Work Together: Overcoming Domain Dependence in Sentiment Tagging
Alina Andreevskaia | Sabine Bergler
Proceedings of ACL-08: HLT

2007

CLaC and CLaC-NB: Knowledge-based and corpus-based approaches to sentiment tagging
Alina Andreevskaia | Sabine Bergler
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

Postnominal Prepositional Phrase Attachment in Proteomics
Jonathan Schuman | Sabine Bergler
Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology

BioKI:Enzymes - an adaptable system to locate low-frequency information in full-text proteomics articles
Sabine Bergler | Jonathan Schuman | Julien Dubuc | Alexandr Lebedev
Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology

Mining WordNet for a Fuzzy Sentiment: Sentiment Tag Extraction from WordNet Glosses
Alina Andreevskaia | Sabine Bergler
11th Conference of the European Chapter of the Association for Computational Linguistics

Semantic Tag Extraction from WordNet Glosses
Alina Andreevskaia | Sabine Bergler
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We propose a method that uses information from WordNet glosses to assign semantic tags to individual word meanings, rather than to entire words. The produced lists of annotated words will be used in sentiment annotation of texts and phrases and in other NLP tasks. The method was implemented in the Semantic Tag Extraction Program (STEP) and evaluated on the category of sentiment (positive, negative or neutral) using two human-annotated lists. The lists were first compared to each other and then used to assess the accuracy of the proposed system. We argue that significant disagreement on sentiment tags between the two human-annotated lists reflects a naturally occurring ambiguity of words located on the periphery of the category of sentiment. The category of sentiment, thus, is believed to be structured as a fuzzy set. Finally, we evaluate the generalizability of STEP to other semantic categories on the example of the category of words denoting increase/decrease in magnitude, intensity or quality of some state or process. The implications of this study for both semantic tagging system development and for performance evaluation practices are discussed.