TestAug: A Framework for Augmenting Capability-based NLP Tests
Guanqun Yang, Mirazul Haque, Qiaochu Song, Wei Yang, Xueqing Liu
Abstract
The recently proposed capability-based NLP testing allows model developers to test the functional capabilities of NLP models, revealing functional failures in models with good held-out evaluation scores. However, existing work on capability-based testing requires the developer to compose each individual test template from scratch. Such an approach requires extensive manual effort and is less scalable. In this paper, we investigate a different approach that requires the developer to annotate only a few test templates, while leveraging the GPT-3 engine to generate the majority of test cases. While our approach saves manual effort by design, it guarantees the correctness of the generated suites with a validity checker. Moreover, our experimental results show that the test suites generated by GPT-3 are more diverse than the manually created ones; they can also be used to detect more errors than their manually created counterparts. Our test suites can be downloaded at https://anonymous-researcher-nlp.github.io/testaug/.
- Anthology ID:
- 2022.coling-1.307
- Volume:
- Proceedings of the 29th International Conference on Computational Linguistics
- Month:
- October
- Year:
- 2022
- Address:
- Gyeongju, Republic of Korea
- Editors:
- Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 3480–3495
- URL:
- https://aclanthology.org/2022.coling-1.307
- Cite (ACL):
- Guanqun Yang, Mirazul Haque, Qiaochu Song, Wei Yang, and Xueqing Liu. 2022. TestAug: A Framework for Augmenting Capability-based NLP Tests. In Proceedings of the 29th International Conference on Computational Linguistics, pages 3480–3495, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
- Cite (Informal):
- TestAug: A Framework for Augmenting Capability-based NLP Tests (Yang et al., COLING 2022)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2022.coling-1.307.pdf
- Code:
- guanqun-yang/testaug
- Data:
- HELP, SST