Abstract
This paper describes a novel framework for estimating the data quality of a collection of product descriptions, with the goal of identifying the information required to classify product listings accurately for tax-code assignment. Our Data Quality Estimation (DQE) framework consists of a Question Answering (QA)-based attribute value extraction model to identify missing attributes and a classification model to identify low-quality records. We show that our framework can accurately predict the quality of product descriptions. In addition to identifying low-quality product listings, our framework can also generate a detailed category-level report showing missing product information, which in turn supports a better customer experience.
- Anthology ID:
- 2022.ecnlp-1.4
- Volume:
- Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5)
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Editors:
- Shervin Malmasi, Oleg Rokhlenko, Nicola Ueffing, Ido Guy, Eugene Agichtein, Surya Kallumadi
- Venue:
- ECNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 29–34
- URL:
- https://aclanthology.org/2022.ecnlp-1.4
- DOI:
- 10.18653/v1/2022.ecnlp-1.4
- Cite (ACL):
- Ravi Kondadadi, Allen Williams, and Nicolas Nicolov. 2022. Data Quality Estimation Framework for Faster Tax Code Classification. In Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5), pages 29–34, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- Data Quality Estimation Framework for Faster Tax Code Classification (Kondadadi et al., ECNLP 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2022.ecnlp-1.4.pdf