@inproceedings{roemmele-gordon-2024-test,
    title = "From Test-Taking to Test-Making: Examining {LLM} Authoring of Commonsense Assessment Items",
    author = "Roemmele, Melissa  and
      Gordon, Andrew",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-emnlp.299/",
    doi = "10.18653/v1/2024.findings-emnlp.299",
    pages = "5193--5203",
    abstract = "LLMs can now perform a variety of complex writing tasks. They also excel in answering questions pertaining to natural language inference and commonsense reasoning. Composing these questions is itself a skilled writing task, so in this paper we consider LLMs as authors of commonsense assessment items. We prompt LLMs to generate items in the style of a prominent benchmark for commonsense reasoning, the Choice of Plausible Alternatives (COPA). We examine the outcome according to analyses facilitated by the LLMs and human annotation. We find that LLMs that succeed in answering the original COPA benchmark are also more successful in authoring their own items."
}