Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA
Maharshi Gor, Hal Daumé III, Tianyi Zhou, Jordan Lee Boyd-Graber
Abstract
Recent advancements of large language models (LLMs) have led to claims of AI surpassing humans in natural language processing (NLP) tasks such as textual understanding and reasoning. This work investigates these assertions by introducing CAIMIRA, a novel framework rooted in item response theory (IRT) that enables quantitative assessment and comparison of problem-solving abilities in question-answering (QA) agents. Through analysis of over 300,000 responses from ~70 AI systems and 155 humans across thousands of quiz questions, CAIMIRA uncovers distinct proficiency patterns in knowledge domains and reasoning skills. Humans outperform AI systems in knowledge-grounded abductive and conceptual reasoning, while state-of-the-art LLMs like GPT-4 Turbo and Llama-3-70B demonstrate superior performance on targeted information retrieval and fact-based reasoning, particularly when information gaps are well-defined and addressable through pattern matching or data retrieval. These findings identify key areas for future QA tasks and model development, highlighting the critical need for questions that not only challenge higher-order reasoning and scientific thinking, but also demand nuanced linguistic and cross-contextual application.
- Anthology ID:
- 2024.emnlp-main.1201
- Volume:
- Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 21533–21564
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.emnlp-main.1201/
- DOI:
- 10.18653/v1/2024.emnlp-main.1201
- Cite (ACL):
- Maharshi Gor, Hal Daumé III, Tianyi Zhou, and Jordan Lee Boyd-Graber. 2024. Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 21533–21564, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA (Gor et al., EMNLP 2024)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.emnlp-main.1201.pdf