Abstract
This study evaluated ChatGPT’s ability to understand causal language in science papers and news by testing its accuracy on the task of labeling a claim’s strength as causal, conditional causal, correlational, or no relationship. The results show that ChatGPT still lags behind existing fine-tuned BERT models by a large margin. ChatGPT also had difficulty understanding conditional causal claims mitigated by hedges. However, its weakness may be utilized to improve the clarity of human annotation guidelines. Chain-of-Thoughts were faithful and helpful for improving prompt performance, but finding the optimal prompt is difficult, given inconsistent results and the lack of an effective method for establishing cause-effect relationships between prompts and outcomes, suggesting caution when generalizing prompt engineering results across tasks or models.
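The paper's exact prompts are not reproduced here; the snippet below is only a minimal illustrative sketch of how a four-way claim-strength labeling query with a chain-of-thought instruction might be issued through the OpenAI Chat Completions API. The label set comes from the abstract, while the prompt wording, model name, and example sentence are assumptions for illustration.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

LABELS = ["causal", "conditional causal", "correlational", "no relationship"]

def label_claim_strength(claim: str) -> str:
    """Ask a ChatGPT model to label the causal strength of a science claim.

    Hypothetical prompt wording; not the prompt used in the paper.
    """
    prompt = (
        "Classify the strength of the claim in the sentence below as one of: "
        + ", ".join(LABELS) + ".\n"
        "Think step by step, then give the final label on the last line.\n\n"
        f"Sentence: {claim}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for the ChatGPT model evaluated in the paper
        temperature=0,          # deterministic output for evaluation
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(label_claim_strength(
        "Coffee consumption may reduce the risk of heart disease in some adults."
    ))
```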
- Anthology ID:
- 2023.wassa-1.33
- Volume:
- Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Editors:
- Jeremy Barnes, Orphée De Clercq, Roman Klinger
- Venue:
- WASSA
- Publisher:
- Association for Computational Linguistics
- Pages:
- 379–389
- URL:
- https://preview.aclanthology.org/remove-affiliations/2023.wassa-1.33/
- DOI:
- 10.18653/v1/2023.wassa-1.33
- Cite (ACL):
- Yuheun Kim, Lu Guo, Bei Yu, and Yingya Li. 2023. Can ChatGPT Understand Causal Language in Science Claims?. In Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, pages 379–389, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Can ChatGPT Understand Causal Language in Science Claims? (Kim et al., WASSA 2023)
- PDF:
- https://preview.aclanthology.org/remove-affiliations/2023.wassa-1.33.pdf