Handling Extreme Class Imbalance in Technical Logbook Datasets
Farhad Akhbardeh, Cecilia Ovesdotter Alm, Marcos Zampieri, Travis Desell
Abstract
Technical logbooks are a challenging and under-explored text type in automated event identification. These texts are typically short and written in non-standard yet technical language, posing challenges to off-the-shelf NLP pipelines. The granularity of issue types described in these datasets additionally leads to class imbalance, making it challenging for models to accurately predict which issue each logbook entry describes. In this paper we focus on the problem of technical issue classification by considering logbook datasets from the automotive, aviation, and facilities maintenance domains. We adapt a feedback strategy from computer vision for handling extreme class imbalance, which resamples the training data based on its error in the prediction process. Our experiments show that with statistical significance this feedback strategy provides the best results for four different neural network models trained across a suite of seven different technical logbook datasets from distinct technical domains. The feedback strategy is also generic and could be applied to any learning problem with substantial class imbalances.
- Anthology ID:
- 2021.acl-long.312
- Volume:
- Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
- Month:
- August
- Year:
- 2021
- Address:
- Online
- Editors:
- Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
- Venues:
- ACL | IJCNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4034–4045
- URL:
- https://aclanthology.org/2021.acl-long.312
- DOI:
- 10.18653/v1/2021.acl-long.312
- Cite (ACL):
- Farhad Akhbardeh, Cecilia Ovesdotter Alm, Marcos Zampieri, and Travis Desell. 2021. Handling Extreme Class Imbalance in Technical Logbook Datasets. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4034–4045, Online. Association for Computational Linguistics.
- Cite (Informal):
- Handling Extreme Class Imbalance in Technical Logbook Datasets (Akhbardeh et al., ACL-IJCNLP 2021)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2021.acl-long.312.pdf