Tijl De Bie

Also published as: Tijl de Bie, Tijl De Bie


2026

Humor is one of the most culturally embedded and socially significant dimensions of human communication, yet it remains largely unexplored as a dimension of Large Language Model (LLM) alignment. In this study, five frontier language models play the same Cards Against Humanity games (CAH) as human players. The models select the funniest response from a slate of ten candidate cards across 9,894 rounds. While all models exceed the random baseline, alignment with human preference remains modest. More striking is that models agree with each other substantially more often than they agree with humans. We show that this preference is partly explained by systematic position biases and content preferences, raising the question whether LLM humor judgment reflects genuine preference or structural artifacts of inference and alignment.

2025

As Large Language Models (LLMs) are deployed in every aspect of our lives, understanding how they reason about moral issues becomes critical for AI safety. We investigate this using a dataset we curated from Reddit’s r/AmItheAsshole, comprising real-world moral dilemmas with crowd-sourced verdicts. Through experiments on five state-of-the-art LLMs across 847 posts, we find a significant and systematic divergence where LLMs are more lenient than humans. Moreover, we find that translating the posts into another language changes LLMs’ verdicts, indicating their judgments lack cross-lingual stability.
Creating robust occupation taxonomies, vital for applications ranging from job recommendation to labor market intelligence, is challenging.Manual curation is slow, while existing automated methods are either not adaptive to dynamic regional markets (top-down) or struggle to build coherent hierarchies from noisy data (bottom-up). We introduce CLIMB (CLusterIng-based Multi-agent taxonomy Builder), a framework that fully automates the creation of high-quality, data-driven taxonomies from raw job postings. CLIMB uses global semantic clustering to distill core occupations, then employs a reflection-based multi-agent system to iteratively build a coherent hierarchy. On three diverse, real-world datasets, we show that CLIMB produces taxonomies that are more coherent and scalable than existing methods and successfully capture unique regional characteristics. We release our code and datasets at https://github.com/aida-ugent/CLIMB.

2024

2009

2008