Bettina Bert


2025

SMAFIRA Shared Task at the BioNLP’2025 Workshop: Assessing the Similarity of the Research Goal
Mariana Neves | Iva Sovadinova | Susanne Fieberg | Celine Heinl | Diana Rubel | Gilbert Schönfelder | Bettina Bert
Proceedings of the 24th Workshop on Biomedical Language Processing

We organized the SMAFIRA Shared Task in the scope of the BioNLP’2025 Workshop. Given two articles, our goal was to collect annotations about the similarity of their research goals. The test sets consisted of a list of reference articles and their corresponding top 20 similar articles from PubMed. The task consisted of annotating each similar article with respect to how closely its research goal matched that of the corresponding reference article, using three labels: “similar”, “uncertain”, or “not similar”. We released two batches of test sets: (a) a first batch of 25 reference articles for five diseases; and (b) a second batch of 80 reference articles for 16 diseases. We collected manual annotations from two teams (RCX and Bf3R) and automatic predictions from two large language models (GPT-4o-mini and Llama 3.3). The preliminary evaluation showed rather low agreement between the annotators; however, some pairs could potentially be part of a future dataset.
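
As an illustration of the agreement analysis mentioned in the abstract, here is a minimal sketch of computing Cohen's kappa over the three labels. The toy label lists and variable names are hypothetical, not data from the shared task:

```python
# Minimal sketch (not from the paper): inter-annotator agreement on the
# three-label scheme using Cohen's kappa via scikit-learn.
from sklearn.metrics import cohen_kappa_score

LABELS = ["similar", "uncertain", "not similar"]

# Hypothetical toy annotations: one label per (reference, candidate) pair.
team_rcx = ["similar", "not similar", "uncertain", "similar", "not similar"]
team_bf3r = ["similar", "uncertain", "uncertain", "not similar", "not similar"]

kappa = cohen_kappa_score(team_rcx, team_bf3r, labels=LABELS)
print(f"Cohen's kappa: {kappa:.2f}")  # values near 0 indicate low agreement
```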

2023

Is the ranking of PubMed similar articles good enough? An evaluation of text similarity methods for three datasets
Mariana Neves | Ines Schadock | Beryl Eusemann | Gilbert Schönfelder | Bettina Bert | Daniel Butzke
Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

The use of seed articles in information retrieval provides many advantages, such as a longer context and more details about the topic being searched for. Given a seed article (i.e., a PMID), PubMed provides a pre-compiled list of similar articles to support the user in finding equivalent papers in the biomedical literature. We aimed to perform a quantitative evaluation of the PubMed Similar Articles based on three existing biomedical text similarity datasets, namely, RELISH, TREC-COVID, and SMAFIRA-c. Further, we carried out a survey and an evaluation of various text similarity methods on these three datasets. Our experiments considered the original title and abstract from PubMed as well as automatically detected sections and manually annotated relevant sentences. We provide an overview of which methods perform better for each dataset and compare them to the ranking in PubMed similar articles. While results varied considerably among the datasets, we were able to obtain better performance than PubMed for all of them. Datasets and source code are available at: https://github.com/mariananeves/reranking
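
The pre-compiled similar-articles list evaluated in this paper can be retrieved programmatically through the NCBI E-utilities elink endpoint (linkname pubmed_pubmed). A minimal sketch, assuming the requests library; the seed PMID is a placeholder, and error handling and NCBI rate limiting are omitted:

```python
# Minimal sketch (not the paper's code): fetch PubMed's pre-compiled list of
# similar articles for a seed PMID via the NCBI E-utilities elink endpoint.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

def pubmed_similar_articles(pmid: str, top_k: int = 20) -> list[str]:
    params = {
        "dbfrom": "pubmed",           # seed database
        "db": "pubmed",               # target database
        "id": pmid,
        "linkname": "pubmed_pubmed",  # the "similar articles" link type
        "retmode": "json",
    }
    data = requests.get(EUTILS, params=params, timeout=30).json()
    links = data["linksets"][0]["linksetdbs"][0]["links"]
    return links[:top_k]  # PubMed returns the PMIDs already ranked

print(pubmed_similar_articles("12345678"))  # placeholder PMID
```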