Bartosz Rudnikowicz
2025
Domain Adapted Text Summarization with Self-Generated Guidelines
Andrianos Michail | Bartosz Rudnikowicz | Pavlos Fragkogiannis | Cristina Kadar
Proceedings of the Natural Legal Language Processing Workshop 2025
Text summarization systems face significant adaptation costs when deployed across diverse domains, requiring expensive few-shot learning or manual prompt engineering. We propose a cost-effective domain adaptation framework that generates reusable summarization guidelines using only two reference summaries and three LLM inferences. In a one-time preparation step, the model compares its own generated summaries against domain-specific reference summaries and derives concise natural-language guidelines that capture the summarization patterns of the target domain. These guidelines are then appended to the summarization prompt to adapt the LLM to the target domain at minimal cost. We evaluate our method across diverse model sizes on three distinct summarization domains: Lawsuits, ArXiv papers, and Patents. Automatic metrics show that guideline-based adaptation achieves performance comparable or superior to in-context learning and zero-shot baselines. An LLM preference evaluation using the latest models shows that summaries generated with such guidelines are superior to those generated with zero-shot or in-context learning summarization prompts. Our method enables efficient domain adaptation of LLM text summarizers with minimal resource overhead, making specialized summarization particularly accessible for agentic systems that need to process heterogeneous texts in enterprise environments.
2024
LLM-Based Robust Product Classification in Commerce and Compliance
Sina Gholamian | Gianfranco Romani | Bartosz Rudnikowicz | Stavroula Skylaki
Proceedings of the 1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U)
Product classification is a crucial task in international trade, as compliance regulations are verified and taxes and duties are applied based on product categories. Manual classification of products is time-consuming and error-prone, and the sheer volume of products imported and exported renders the manual process infeasible. Consequently, e-commerce platforms and enterprises involved in international trade have turned to automatic product classification using machine learning. However, current approaches do not consider the real-world challenges associated with product classification, such as heavily abbreviated and incomplete product descriptions. In addition, recent advancements in generative Large Language Models (LLMs) and their reasoning capabilities remain largely untapped in product classification and e-commerce. In this research, we explore the real-life challenges of industrial classification and propose data perturbations that allow for realistic data simulation. Furthermore, we employ LLM-based product classification to improve the robustness of the prediction in the presence of incomplete data. Our research shows that LLMs with in-context learning outperform the supervised approaches in the clean-data scenario. Additionally, we illustrate that LLMs are significantly more robust than the supervised approaches when data attacks are present.
Co-authors
- Pavlos Fragkogiannis 1
- Sina Gholamian 1
- Cristina Kadar 1
- Andrianos Michail 1
- Gianfranco Romani 1