Ansgar Scherp

2025

HYDRA: A Multi-Head Encoder-only Architecture for Hierarchical Text Classification
Fabian Karl | Ansgar Scherp
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

We introduce HYDRA, a simple yet effective multi-head encoder-only architecture for hierarchical text classification that treats each level in the hierarchy as a separate classification task with its own label space. State-of-the-art approaches rely on complex components like graph encoders, label semantics, and autoregressive decoders. We demonstrate that such complexity is often unnecessary. Through parameter sharing and level-specific parameterization, HYDRA enables flat models to incorporate hierarchical awareness without architectural complexity. Experiments on four benchmarks (NYT, RCV1-V2, BGC, and WOS) demonstrate that HYDRA always improves performance over flat models and matches or exceeds the performance of complex state-of-the-art methods.
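The idea of one shared encoder with a separate classification head per hierarchy level can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: `toy_encoder` is a hypothetical stand-in for the shared encoder (in practice an encoder-only Transformer would fill this role), each level gets its own linear head over its own label space, and the training objective is assumed to be the sum of per-level cross-entropy losses. All names here (`MultiHeadClassifier`, `LevelHead`, `labels_per_level`) are illustrative.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_encoder(text, dim=8):
    """Hypothetical stand-in for a shared encoder: deterministic hashed bag of tokens."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[sum(map(ord, tok)) % dim] += 1.0
    return vec


class LevelHead:
    """Level-specific linear classifier over that level's own label space."""

    def __init__(self, in_dim, n_labels, rng):
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(n_labels)]
        self.b = [0.0] * n_labels

    def __call__(self, h):
        return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(self.W, self.b)]


class MultiHeadClassifier:
    """Shared encoder plus one head per hierarchy level (illustrative sketch only)."""

    def __init__(self, encoder, hidden_dim, labels_per_level, seed=0):
        rng = random.Random(seed)
        self.encoder = encoder  # parameters shared by every level
        self.heads = [LevelHead(hidden_dim, n, rng) for n in labels_per_level]

    def predict(self, text):
        h = self.encoder(text)  # encode once, reuse the representation for all levels
        return [softmax(head(h)) for head in self.heads]

    def loss(self, text, gold_per_level):
        # Assumed objective: sum of per-level cross-entropy terms.
        probs = self.predict(text)
        return sum(-math.log(p[g]) for p, g in zip(probs, gold_per_level))


model = MultiHeadClassifier(toy_encoder, hidden_dim=8, labels_per_level=[3, 5])
probs = model.predict("stock markets rally after rate cut")
print([len(p) for p in probs])  # [3, 5]
print(model.loss("stock markets rally after rate cut", [0, 2]))
```

Each level is an ordinary flat classification problem; the only coupling between levels in this sketch is the shared encoder, which is where the parameter sharing described above would come in.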

Efficient Continual Learning for Small Language Models with a Discrete Key-Value Bottleneck
Andor Diera | Lukas Galke | Fabian Karl | Ansgar Scherp
Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP-2025)

2023

GenCodeSearchNet: A Benchmark Test Suite for Evaluating Generalization in Programming Language Understanding
Andor Diera | Abdelhalim Dahou | Lukas Galke | Fabian Karl | Florian Sihler | Ansgar Scherp
Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP

Language models can serve as a valuable tool for software developers to increase productivity. Large generative models can be used for code generation and code completion, while smaller encoder-only models are capable of performing code search tasks using natural language queries. These capabilities are heavily influenced by the quality and diversity of the available training data. Source code datasets used for training usually focus on the most popular languages, and testing is mostly conducted on the same distributions, often overlooking low-resource programming languages. Motivated by the NLP generalization taxonomy proposed by Hupkes et al., we propose a new benchmark dataset called GenCodeSearchNet (GeCS), which builds upon existing natural language code search datasets to systematically evaluate the programming language understanding generalization capabilities of language models. As part of the full dataset, we introduce a new, manually curated subset, StatCodeSearch, that focuses on R, a popular but so far underrepresented programming language that is often used by researchers outside the field of computer science. For evaluation and comparison, we collect several baseline results using fine-tuned BERT-style models and GPT-style large language models in a zero-shot setting.

2022

Bag-of-Words vs. Graph vs. Sequence in Text Classification: Questioning the Necessity of Text-Graphs and the Surprising Strength of a Wide MLP
Lukas Galke | Ansgar Scherp
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Graph neural networks have triggered a resurgence of graph-based text classification methods, defining today’s state of the art. We show that a wide multi-layer perceptron (MLP) operating on a Bag-of-Words (BoW) representation outperforms the recent graph-based models TextGCN and HeteGCN in an inductive text classification setting and is comparable with HyperGAT. Moreover, we fine-tune a sequence-based BERT and a lightweight DistilBERT model, which both outperform all state-of-the-art models. These results question the importance of synthetic graphs used in modern text classifiers. In terms of efficiency, DistilBERT is still twice as large as our BoW-based wide MLP, while graph-based models like TextGCN require setting up an 𝒪(N²) graph, where N is the vocabulary plus corpus size. Finally, since Transformers need to compute 𝒪(L²) attention weights with sequence length L, the MLP models show higher training and inference speeds on datasets with long sequences.
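A wide MLP over Bag-of-Words features needs no text graph: each document is turned into a count vector over the training vocabulary and passed through one wide hidden layer. The sketch below illustrates only that forward pass and is an assumption-laden toy, not the paper's code. It uses plain whitespace tokenization and raw counts; the abstract does not state the exact input weighting or hidden width, so `hidden=16` is a hypothetical toy value, and training, dropout, and any weighting scheme are omitted. Names such as `WideMLP`, `build_vocab`, and `bag_of_words` are illustrative.

```python
import random


def build_vocab(docs):
    """Map each lower-cased whitespace token to an index, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for tok in doc.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def bag_of_words(doc, vocab):
    """Count vector over the vocabulary; tokens not in the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok in doc.lower().split():
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] += 1.0
    return vec


class WideMLP:
    """One wide ReLU hidden layer followed by a linear output layer (forward pass only)."""

    def __init__(self, vocab_size, hidden, n_classes, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.gauss(0.0, 0.1) for _ in range(vocab_size)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.W2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(n_classes)]
        self.b2 = [0.0] * n_classes

    def forward(self, x):
        # Hidden layer: ReLU(W1 x + b1)
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.W1, self.b1)]
        # Output logits: W2 h + b2
        return [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(self.W2, self.b2)]


train_docs = ["the cat sat", "the dog ran"]
vocab = build_vocab(train_docs)
x = bag_of_words("The cat and the dog", vocab)
print(x)  # [2.0, 1.0, 0.0, 1.0, 0.0]
model = WideMLP(len(vocab), hidden=16, n_classes=2)
print(len(model.forward(x)))  # 2
```

Because an unseen document only needs to be mapped onto the training vocabulary, this setup is naturally inductive: nothing has to be rebuilt at test time, in contrast to constructing a graph over vocabulary plus corpus.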
