GAVEL: Generative Attribute-Value Extraction Using LLMs on LLM-Augmented Datasets

Pollawat Hongwimol, Dong Sheng, Li Zhang, Kai Liu, Xiufei Wang


Abstract
In the evolving e-commerce landscape, accurate product attribute-value extraction is crucial for enhancing user experience and increasing sales. This paper introduces GAVEL, a generative approach that uses large language models (LLMs) to augment training data for attribute extraction from diverse textual sources. Our method extracts over 1,000 unique attributes across 2,000 product categories in multiple Southeast Asian languages, including Thai, Vietnamese, and Indonesian. Evaluations show significant improvements in accuracy and coverage over seller-provided attributes, including higher recall and F1 scores. GAVEL also reduces operational costs by minimizing instruction token usage, and it improves inference speed. A/B testing results indicate that our model has a positive impact on Gross Merchandise Value (GMV) per page view (PV) in all three operating countries. This research highlights the potential of generative techniques for optimizing attribute extraction in multilingual e-commerce applications.
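The abstract reports recall and F1 gains over seller-provided attributes but does not state the scoring protocol on this page. As a minimal sketch, assuming exact matching on case-folded (attribute, value) pairs (an assumption, not necessarily the paper's evaluation method), pair-level precision, recall, and F1 could be computed as below. The names `normalize` and `pair_prf` and the toy product data are hypothetical.

```python
from typing import Iterable, Set, Tuple

# An (attribute, value) pair, e.g. ("color", "red").
Pair = Tuple[str, str]


def normalize(pair: Pair) -> Pair:
    # Hypothetical normalization: trim whitespace and case-fold both fields.
    attr, value = pair
    return attr.strip().lower(), value.strip().lower()


def pair_prf(predicted: Iterable[Pair], gold: Iterable[Pair]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over attribute-value pairs (assumed protocol)."""
    pred: Set[Pair] = {normalize(p) for p in predicted}
    ref: Set[Pair] = {normalize(g) for g in gold}
    tp = len(pred & ref)  # pairs that appear in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy listing: annotated gold pairs, sparse seller input, model output.
gold = [("color", "red"), ("material", "cotton"), ("size", "M"), ("brand", "Acme")]
seller = [("color", "red"), ("size", "L")]
model = [("Color", "Red"), ("material", "cotton"), ("size", "M"), ("pattern", "striped")]

print("seller P/R/F1:", pair_prf(seller, gold))
print("model  P/R/F1:", pair_prf(model, gold))
```

On these toy inputs the generated pairs reach higher recall than the sparse seller-provided ones, the kind of comparison the abstract describes. A real multilingual evaluation would need value normalization for Thai, Vietnamese, and Indonesian text, which this sketch does not attempt.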
Anthology ID:
2025.knowledgenlp-1.6
Volume:
Proceedings of the 4th International Workshop on Knowledge-Augmented Methods for Natural Language Processing
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico, USA
Editors:
Weijia Shi, Wenhao Yu, Akari Asai, Meng Jiang, Greg Durrett, Hannaneh Hajishirzi, Luke Zettlemoyer
Venues:
KnowledgeNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
81–90
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.knowledgenlp-1.6/
Cite (ACL):
Pollawat Hongwimol, Dong Sheng, Li Zhang, Kai Liu, and Xiufei Wang. 2025. GAVEL: Generative Attribute-Value Extraction Using LLMs on LLM-Augmented Datasets. In Proceedings of the 4th International Workshop on Knowledge-Augmented Methods for Natural Language Processing, pages 81–90, Albuquerque, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
GAVEL: Generative Attribute-Value Extraction Using LLMs on LLM-Augmented Datasets (Hongwimol et al., KnowledgeNLP 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.knowledgenlp-1.6.pdf