Nils Constantin Hellwig
2026
AnnoABSA: A Web-Based Annotation Tool for Aspect-Based Sentiment Analysis with Retrieval-Augmented Suggestions
Nils Constantin Hellwig | Jakob Fehle | Udo Kruschwitz | Christian Wolff
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We introduce AnnoABSA, the first web-based annotation tool to support the full spectrum of Aspect-Based Sentiment Analysis (ABSA) tasks. The tool is highly customizable, enabling flexible configuration of sentiment elements and task-specific requirements. Alongside manual annotation, AnnoABSA provides optional Large Language Model (LLM)-based retrieval-augmented generation (RAG) suggestions that offer context-aware assistance in a human-in-the-loop approach, keeping the human annotator in control. To improve prediction quality over time, the system retrieves the ten already-annotated examples most similar to the current input and adds them as few-shot examples to the prompt, ensuring that suggestions become increasingly accurate as the annotation process progresses. Released as open-source software under the MIT License, AnnoABSA is freely accessible and easily extendable for research and practical applications.
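The retrieval step described above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the toy bag-of-words embedding and all function names are assumptions, and a real system would use sentence embeddings instead.

```python
import math
from collections import Counter


def embed(text):
    # Toy bag-of-words vector; stands in for a proper sentence embedding.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_few_shot(query, annotated, k=10):
    """Return the k annotated (text, labels) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(annotated, key=lambda ex: cosine(q, embed(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(query, annotated, k=10):
    # Prepend the retrieved examples as few-shot demonstrations, then the query.
    shots = retrieve_few_shot(query, annotated, k)
    lines = [f"Text: {t}\nAnnotation: {a}" for t, a in shots]
    lines.append(f"Text: {query}\nAnnotation:")
    return "\n\n".join(lines)
```

Because the pool of annotated examples grows as annotation proceeds, the same retrieval call naturally yields more relevant demonstrations over time.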
Zero-Shot to Full-Resource: Cross-lingual Transfer Strategies for Aspect-Based Sentiment Analysis
Jakob Fehle | Nils Constantin Hellwig | Udo Kruschwitz | Christian Wolff
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Aspect-based Sentiment Analysis (ABSA) extracts fine-grained opinions toward specific aspects within text but remains largely English-focused despite major advances in transformer-based and instruction-tuned models. This work presents a multilingual evaluation of state-of-the-art ABSA approaches across seven languages and four subtasks (ACD, ACSA, TASD, ASQP). We systematically compare different transformer architectures under zero-resource, data-only, and full-resource settings, using cross-lingual transfer, code-switching and machine translation. Fine-tuned Large Language Models (LLMs) achieve the highest overall scores, particularly in complex generative tasks, while few-shot counterparts approach this performance in simpler setups, where smaller encoder models also remain competitive. Cross-lingual training on multiple non-target languages yields the strongest transfer for fine-tuned LLMs, while smaller encoder or sequence-to-sequence models benefit most from code-switching, highlighting architecture-specific strategies for multilingual ABSA. We further contribute two new German datasets, an adapted GERestaurant and the first German ASQP dataset (GERest), to encourage multilingual ABSA research beyond English.
LLM-as-an-Annotator: Training Lightweight Models with LLM-Annotated Examples for Aspect Sentiment Tuple Prediction
Nils Constantin Hellwig | Jakob Fehle | Udo Kruschwitz | Christian Wolff
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Training models for Aspect-Based Sentiment Analysis (ABSA) tasks requires manually annotated data, which is expensive and time-consuming to obtain. This paper introduces LA-ABSA, a novel approach that leverages Large Language Model (LLM)-generated annotations to fine-tune lightweight models for complex ABSA tasks. We evaluate our approach on five datasets for Target Aspect Sentiment Detection (TASD) and Aspect Sentiment Quad Prediction (ASQP). Our approach outperformed previously reported augmentation strategies and achieved competitive performance with LLM-prompting in low-resource scenarios, while providing substantial energy efficiency benefits. For example, using 50 annotated examples for in-context learning (ICL) to guide the annotation of unlabeled data, LA-ABSA achieved an F1 score of 49.85 for ASQP on the SemEval Rest16 dataset, closely matching the performance of ICL prompting with Gemma-3-27B (51.10), while requiring significantly fewer computational resources.
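F1 scores like those reported above are conventionally computed as micro-averaged F1 over exact tuple matches between predictions and gold annotations. A minimal sketch of that metric, assuming tuples are represented as hashable Python tuples (the function name is illustrative, not from the paper):

```python
def tuple_f1(pred, gold):
    """Micro-averaged F1 over exact tuple matches.

    A predicted tuple counts as correct only if every element
    (e.g. aspect term, category, opinion term, polarity) matches
    a gold tuple exactly.
    """
    pred_set, gold_set = set(pred), set(gold)
    tp = len(pred_set & gold_set)  # exact matches
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

In practice the sets are accumulated over the whole test split before computing precision and recall, so a single spurious or missing element in any tuple costs the model the full match.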
2025
German Aspect-based Sentiment Analysis in the Wild: B2B Dataset Creation and Cross-Domain Evaluation
Jakob Fehle | Niklas Donhauser | Udo Kruschwitz | Nils Constantin Hellwig | Christian Wolff
Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025): Long and Short Papers
Do we still need Human Annotators? Prompting Large Language Models for Aspect Sentiment Quad Prediction
Nils Constantin Hellwig | Jakob Fehle | Udo Kruschwitz | Christian Wolff
Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)
Aspect sentiment quad prediction (ASQP) facilitates a detailed understanding of opinions expressed in a text by identifying the opinion term, aspect term, aspect category and sentiment polarity for each opinion. However, annotating a full set of training examples to fine-tune models for ASQP is a resource-intensive process. In this study, we explore the capabilities of large language models (LLMs) for zero- and few-shot learning on the ASQP task across five diverse datasets. We report F1 scores nearly on par with those of state-of-the-art fine-tuned models and exceeding previously reported zero- and few-shot performance. In the 20-shot setting on the Rest16 restaurant domain dataset, LLMs achieved an F1 score of 51.54, compared to 60.39 by the best-performing fine-tuned method MVP. Additionally, we report the performance of LLMs in target aspect sentiment detection (TASD), where the F1 scores were close to fine-tuned models, achieving 68.93 on Rest16 in the 30-shot setting, compared to 72.76 with MVP. While human annotators remain essential for achieving optimal performance, LLMs can reduce the need for extensive manual annotation in ASQP tasks.
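The four sentiment elements named above can be made concrete with a small data structure and a parser for LLM output. This is a hedged sketch: the pipe-separated output format and all names are illustrative assumptions, not the prompt format used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentimentQuad:
    """One opinion as an ASQP quadruple."""
    aspect_term: str
    aspect_category: str
    opinion_term: str
    polarity: str


def parse_quads(output):
    """Parse 'aspect | category | opinion | polarity' lines (hypothetical format)."""
    quads = []
    for line in output.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 4:  # skip malformed lines rather than fail
            quads.append(SentimentQuad(*parts))
    return quads
```

Frozen dataclasses are hashable, so parsed quads can go straight into sets for exact-match evaluation against gold annotations.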
2024
GERestaurant: A German Dataset of Annotated Restaurant Reviews for Aspect-Based Sentiment Analysis
Nils Constantin Hellwig | Jakob Fehle | Markus Bink | Christian Wolff
Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024)
2023
Transformer-Based Analysis of Sentiment Towards German Political Parties on Twitter During the 2021 Election Year
Nils Constantin Hellwig | Markus Bink | Thomas Schmidt | Jakob Fehle | Christian Wolff
Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023)