Abstract
Multiple-choice question answering (MCQA) is often used to evaluate large language models (LLMs). To see if MCQA assesses LLMs as intended, we probe whether LLMs can perform MCQA with choices-only prompts, where models must select the correct answer from the choices alone. Across three MCQA datasets and four LLMs, this prompt bests a majority baseline in 11/12 cases, with up to a 0.33 accuracy gain. To help explain this behavior, we conduct an in-depth, black-box analysis of memorization, choice dynamics, and question inference. Our key findings are threefold. First, we find no evidence that choices-only accuracy stems from memorization alone. Second, priors over individual choices do not fully explain choices-only accuracy, hinting that LLMs use the group dynamics of choices. Third, LLMs have some ability to infer a relevant question from the choices, and surprisingly can sometimes even match the original question. We hope to motivate the use of stronger baselines in MCQA benchmarks, the design of robust MCQA datasets, and further efforts to explain LLM decision-making.
- Anthology ID:
- 2024.acl-long.555
- Volume:
- Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- August
- Year:
- 2024
- Address:
- Bangkok, Thailand
- Editors:
- Lun-Wei Ku, Andre Martins, Vivek Srikumar
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 10308–10330
- URL:
- https://preview.aclanthology.org/icon-24-ingestion/2024.acl-long.555/
- DOI:
- 10.18653/v1/2024.acl-long.555
- Cite (ACL):
- Nishant Balepur, Abhilasha Ravichander, and Rachel Rudinger. 2024. Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10308–10330, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal):
- Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question? (Balepur et al., ACL 2024)
- PDF:
- https://preview.aclanthology.org/icon-24-ingestion/2024.acl-long.555.pdf
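For readers new to the setup the abstract describes, here is a minimal sketch of a choices-only probe against a majority baseline. This is not the paper's code: `query_llm` is a hypothetical stand-in for whatever model API is used, and the two toy examples are invented for illustration.

```python
# Minimal sketch of a choices-only MCQA probe (illustrative, not the paper's code).
from collections import Counter

def choices_only_prompt(choices):
    """Format a prompt that shows only the answer choices, with no question."""
    lines = [f"{label}. {text}" for label, text in zip("ABCD", choices)]
    return ("Pick the most likely correct answer from the choices below.\n"
            + "\n".join(lines) + "\nAnswer:")

def query_llm(prompt):
    # Hypothetical model call; always answers "A" here as a placeholder.
    return "A"

def majority_baseline(gold_labels):
    """Accuracy from always guessing the most frequent gold label."""
    _, count = Counter(gold_labels).most_common(1)[0]
    return count / len(gold_labels)

# Toy examples (invented): (choices, gold label).
examples = [
    (["Paris", "Rome", "Lima", "Oslo"], "A"),
    (["3", "5", "7", "9"], "B"),
]
preds = [query_llm(choices_only_prompt(choices)) for choices, _ in examples]
acc = sum(p == g for p, (_, g) in zip(preds, examples)) / len(examples)
gold = [g for _, g in examples]
print(f"choices-only acc={acc:.2f}, majority baseline={majority_baseline(gold):.2f}")
```

In the paper's framing, a model "bests" the baseline when its choices-only accuracy exceeds the majority-guess accuracy computed this way; any gap above the baseline is the signal being probed.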