ConsumerBR: A Large-Scale Corpus of Consumer Complaints in Brazilian Portuguese
Luis A. Duarte, Pedro Giacomin, Vitória Bispo, Mariana O. Silva, Adriano C. M. Pereira, Gisele L. Pappa
Abstract
We present ConsumerBR, a large-scale corpus of consumer complaints and company responses in Brazilian Portuguese, compiled from publicly available data on the Consumidor.gov.br platform. The corpus comprises over 3.1 million consumer–company interactions collected between 2021 and 2025 and combines anonymized textual content with rich structured metadata, including temporal information, complaint outcomes, and consumer satisfaction indicators. We describe a data collection strategy tailored to the platform’s dynamic interface, a preprocessing pipeline that includes response clustering to identify template-based replies, and a hybrid anonymization approach designed to mitigate privacy risks. We also provide a detailed statistical characterization of the corpus, highlighting its scale, coverage, and distributional properties. ConsumerBR is publicly available for research purposes and supports a wide range of applications, including complaint analysis, sentiment modeling, dialogue and response generation, and preference-based evaluation.
- Anthology ID: 2026.propor-1.66
- Volume: Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
- Month: April
- Year: 2026
- Address: Salvador, Brazil
- Editors: Marlo Souza, Iria de-Dios-Flores, Diana Santos, Larissa Freitas, Jackson Wilke da Cruz Souza, Eugénio Ribeiro
- Venue: PROPOR
- Publisher: Association for Computational Linguistics
- Pages: 667–675
- URL: https://preview.aclanthology.org/ingest-dnd/2026.propor-1.66/
- Cite (ACL): Luis A. Duarte, Pedro Giacomin, Vitória Bispo, Mariana O. Silva, Adriano C. M. Pereira, and Gisele L. Pappa. 2026. ConsumerBR: A Large-Scale Corpus of Consumer Complaints in Brazilian Portuguese. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1, pages 667–675, Salvador, Brazil. Association for Computational Linguistics.
- Cite (Informal): ConsumerBR: A Large-Scale Corpus of Consumer Complaints in Brazilian Portuguese (Duarte et al., PROPOR 2026)
- PDF: https://preview.aclanthology.org/ingest-dnd/2026.propor-1.66.pdf