Bernhard Pfahringer


2025

Detection of Human and Machine-Authored Fake News in Urdu
Muhammad Zain Ali | Yuxia Wang | Bernhard Pfahringer | Tony C Smith
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The rise of social media has amplified the spread of fake news, a problem now compounded by large language models (LLMs) such as ChatGPT, which make it easy to generate highly convincing, error-free misinformation and increasingly hard for the public to discern truth from falsehood. Traditional fake news detection methods that rely on linguistic cues have likewise become less effective. Moreover, current detectors focus primarily on binary classification and English texts, often overlooking both the distinction between machine-generated true and fake news and detection in low-resource languages. To this end, we update the detection schema to include machine-generated news, focusing on Urdu. We further propose a conjoint detection strategy to improve accuracy and robustness. Experiments show its effectiveness across four datasets in various settings.
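The abstract does not spell out the conjoint strategy, so the following is only a loose sketch, assuming it couples an authorship decision (human vs. machine) with a veracity decision (real vs. fake) into a four-way label. All names here (Author, Veracity, conjoin, the 0.5 threshold) are hypothetical illustrations, not taken from the paper:

from enum import Enum
from dataclasses import dataclass

class Author(Enum):
    HUMAN = "human"
    MACHINE = "machine"

class Veracity(Enum):
    REAL = "real"
    FAKE = "fake"

@dataclass
class ConjointLabel:
    author: Author
    veracity: Veracity

def conjoin(p_machine: float, p_fake: float, threshold: float = 0.5) -> ConjointLabel:
    # Hypothetical combiner: two binary scores -> one joint label over
    # {human, machine} x {real, fake}.
    return ConjointLabel(
        author=Author.MACHINE if p_machine >= threshold else Author.HUMAN,
        veracity=Veracity.FAKE if p_fake >= threshold else Veracity.REAL,
    )

# A text scored as likely machine-authored (0.9) but likely true (0.2 fake-likelihood):
print(conjoin(0.9, 0.2))  # -> author=MACHINE, veracity=REAL

Under such a schema, machine-generated true news is no longer conflated with machine-generated fake news, which is exactly the distinction the abstract highlights.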

2021

PolyLM: Learning about Polysemy through Language Modeling
Alan Ansell | Felipe Bravo-Marquez | Bernhard Pfahringer
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

To avoid the “meaning conflation deficiency” of word embeddings, a number of models have aimed to embed individual word senses. These methods at one time performed well on tasks such as word sense induction (WSI), but they have since been overtaken by task-specific techniques which exploit contextualized embeddings. However, sense embeddings and contextualization need not be mutually exclusive. We introduce PolyLM, a method which formulates the task of learning sense embeddings as a language modeling problem, allowing contextualization techniques to be applied. PolyLM is based on two underlying assumptions about word senses: firstly, that the probability of a word occurring in a given context is equal to the sum of the probabilities of its individual senses occurring; and secondly, that for a given occurrence of a word, one of its senses tends to be much more plausible in the context than the others. We evaluate PolyLM on WSI, showing that it performs considerably better than previous sense embedding techniques, and matches the current state-of-the-art specialized WSI method despite having six times fewer parameters. Code and pre-trained models are available at https://github.com/AlanAnsell/PolyLM.
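As a reading aid, the two assumptions above can be written compactly. The notation below (word w, context c, sense inventory S(w)) is introduced here for illustration and is not taken from the paper:

% Assumption 1: a word's probability in context decomposes over its senses
P(w \mid c) = \sum_{s \in S(w)} P(s \mid c)

% Assumption 2: for a given occurrence, a single sense dominates the sum
\max_{s \in S(w)} P(s \mid c) \approx P(w \mid c)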

2019

An ELMo-inspired approach to SemDeep-5’s Word-in-Context task
Alan Ansell | Felipe Bravo-Marquez | Bernhard Pfahringer
Proceedings of the 5th Workshop on Semantic Deep Learning (SemDeep-5)