Roman Teucher
2026
Fact Finder - Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs
Daniel Steinigen | Roman Teucher | Timm Heine Ruland | Max Rudat | Nicolas Flores-Herr | Peter Fischer | Nikola Milosevic | Christopher Schymura | Angelo Ziletti
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Recent advancements in Large Language Models (LLMs) have showcased their proficiency in answering natural language queries. However, their effectiveness is hindered by limited domain-specific knowledge, raising concerns about the reliability of their responses. We introduce a hybrid system that augments LLMs with domain-specific knowledge graphs (KGs), aiming to enhance factual correctness through KG-based retrieval. We focus on a medical KG to demonstrate our methodology, which includes (1) pre-processing, (2) Cypher query generation, (3) Cypher query processing, (4) KG retrieval, and (5) LLM-enhanced response generation. We evaluate our system on a curated dataset of 69 samples, achieving a precision of 78% in retrieving correct KG nodes. Our findings indicate that the hybrid system surpasses a standalone LLM in accuracy and completeness, as verified by an LLM-as-a-Judge evaluation method. This positions the system as a promising tool for applications that demand factual correctness and completeness, such as target identification, a critical process in pinpointing biological entities for disease treatment or crop enhancement. Moreover, its intuitive search interface and ability to provide accurate responses within seconds make it well-suited for time-sensitive, precision-focused research contexts. We publish the source code together with the dataset and the prompt templates used.
DeepICD-R1: Medical Reasoning through Hierarchical Rewards and Unsupervised Distillation
Tom Röhr | Thomas Maximilian Josef Steffek | Roman Teucher | Keno Bressem | Alexei Figueroa | Paul Grundmann | Peter Troeger | Felix Alexander Gers | Alexander Löser
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Large language models (LLMs) show strong reasoning abilities, but full retraining for the medical domain is often infeasible because of a lack of data or compute resources. We present DeepICD-R1, a framework for efficient medical-reasoning fine-tuning that unites hierarchical rewards with distilled supervision. We reformulate ICD-10-CM prediction as a reinforcement learning problem and design a hierarchical, outcome-based reward that reflects the ICD code structure across chapter, category, and full-code levels. In parallel, we publish a large-scale distilled dataset of over 90k reasoning traces derived from MIMIC-IV admission notes, integrating clinical validation and official coding guidelines. Fine-tuning smaller instruction-tuned LLMs on this data with GRPO reinforcement yields consistent gains in diagnostic accuracy and reasoning coherence. Extensive ablations confirm that hierarchical supervision and verifiable outcome rewards enable competitive, domain-specialized reasoning models without additional pretraining, providing a reproducible foundation for clinical NLP research.
Keywords: Clinical NLP, Large Reasoning Model, GRPO, Supervised Fine-Tuning
2023
CarExpert: Leveraging Large Language Models for In-Car Conversational Question Answering
Md Rashad Al Hasan Rony | Christian Suess | Sinchana Ramakanth Bhat | Viju Sudhi | Julia Schneider | Maximilian Vogel | Roman Teucher | Ken Friedl | Soumya Sahoo
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Large language models (LLMs) have demonstrated remarkable performance in following natural language instructions without fine-tuning on domain-specific tasks and data. However, leveraging LLMs for domain-specific question answering suffers from severe limitations. The generated answers tend to hallucinate due to outdated training data (when using off-the-shelf models), complex user utterances, and incorrect retrieval (in retrieval-augmented generation). Furthermore, lacking awareness of the domain and the expected output, such LLMs may generate unexpected and unsafe answers that are not tailored to the target domain. In this paper, we propose CarExpert, an in-car retrieval-augmented conversational question-answering system leveraging LLMs for different tasks. Specifically, CarExpert employs LLMs to control the input, provide domain-specific documents to the extractive and generative answering components, and control the output to ensure safe and domain-specific answers. A comprehensive empirical evaluation shows that CarExpert outperforms state-of-the-art LLMs in generating natural, safe, and car-specific answers.
2022
TextGraphs-16 Natural Language Premise Selection Task: Zero-Shot Premise Selection with Prompting Generative Language Models
Liubov Kovriguina | Roman Teucher | Robert Wardenga
Proceedings of TextGraphs-16: Graph-based Methods for Natural Language Processing
Automated theorem proving can benefit greatly from methods employed in natural language processing, knowledge graphs, and information retrieval: this non-trivial task combines formal language understanding, reasoning, and similarity search. We tackle it by enhancing semantic similarity ranking with prompt engineering, which has become a new paradigm in natural language understanding. None of our approaches requires additional training. Despite the encouraging results reported for prompt engineering approaches across a range of NLP tasks, vanilla re-ranking by prompting GPT-3 does not outperform semantic similarity ranking with SBERT on the premise selection task; however, merging the two rankings yields better results.
Co-authors
- Sinchana Ramakanth Bhat 1
- Keno Bressem 1
- Alexei Figueroa 1
- Peter Fischer 1
- Nicolas Flores-Herr 1
- Ken Friedl 1
- Felix Gers 1
- Paul Grundmann 1
- Liubov Kovriguina 1
- Alexander Löser 1
- Nikola Milosevic 1
- Md Rashad Al Hasan Rony 1
- Max Rudat 1
- Timm Heine Ruland 1
- Tom Röhr 1
- Soumya Sahoo 1
- Julia Schneider 1
- Christopher Schymura 1
- Thomas Maximilian Josef Steffek 1
- Daniel Steinigen 1
- Viju Sudhi 1
- Christian Suess 1
- Peter Troeger 1
- Maximilian Vogel 1
- Robert Wardenga 1
- Angelo Ziletti 1