Abstract
From document summarization to code generation, chatbots have disrupted various aspects of scientific research and writing. While chatbots are useful research resources for ideation, information retrieval, and editing, the knowledge infrastructure underlying their generative pre-trained transformer (GPT) models is opaque. This has raised questions about the reliability of generative chatbot responses, as GPT models are known to respond with misleading information that appears to be accurate. Prior research has investigated the utility of OpenAI’s public chatbot, ChatGPT, for generating reliable bibliographic information, with a focus on small sets of medical-related scientific facts. We present an expanded study that analyzes GPT-4’s ability to accurately identify 1,326 scientific facts and link them to academic sources. Using both the API and the UI service, we experimented with open-ended and close-ended prompts to establish an understanding of GPT-4’s general ability at this domain-specific task, as well as to study the real-world scenario of an average user interacting with ChatGPT through its UI. GPT-4 accurately identified 96% of the scientific facts and generated relevant, existent academic citations with 78% accuracy. Using the claims that GPT-4 mislabeled and for which it provided incorrect sources via the API, we prompt two public GPTs customized for academic writing to evaluate whether they correctly label the scientific claims and provide accurate sources. We find that these GPTs accurately label 38% of the mislabeled claims, with 95% of the corresponding citations being accurate and relevant.
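The abstract mentions querying GPT-4 through the OpenAI API with open-ended and close-ended prompts. The snippet below is a minimal sketch of what one such close-ended query could look like; the prompt wording, the `label_claim_with_citation` helper, and the example claim are illustrative assumptions, not the paper’s exact protocol.

```python
# Hypothetical sketch of a close-ended prompt asking GPT-4 to label a claim
# and return a supporting academic citation (not the paper's exact setup).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def label_claim_with_citation(claim: str) -> str:
    """Ask GPT-4 whether a claim is a scientific fact and for one citation."""
    prompt = (
        "Is the following statement a scientific fact? Answer 'Yes' or 'No', "
        "then provide one peer-reviewed academic citation that supports your "
        f"answer.\n\nStatement: {claim}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep outputs stable for evaluation
    )
    return response.choices[0].message.content

print(label_claim_with_citation(
    "Regular physical activity reduces the risk of cardiovascular disease."
))
```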
- Anthology ID: 2024.customnlp4u-1.19
- Volume: Proceedings of the 1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U)
- Month: November
- Year: 2024
- Address: Miami, Florida, USA
- Editors: Sachin Kumar, Vidhisha Balachandran, Chan Young Park, Weijia Shi, Shirley Anugrah Hayati, Yulia Tsvetkov, Noah Smith, Hannaneh Hajishirzi, Dongyeop Kang, David Jurgens
- Venue: CustomNLP4U
- Publisher: Association for Computational Linguistics
- Pages: 257–268
- URL: https://aclanthology.org/2024.customnlp4u-1.19
- DOI: 10.18653/v1/2024.customnlp4u-1.19
- Cite (ACL): Autumn Toney. 2024. What Kind of Sourcery is This? Evaluating GPT-4’s Performance on Linking Scientific Fact to Citations. In Proceedings of the 1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U), pages 257–268, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal): What Kind of Sourcery is This? Evaluating GPT-4’s Performance on Linking Scientific Fact to Citations (Toney, CustomNLP4U 2024)
- PDF: https://preview.aclanthology.org/dois-2013-emnlp/2024.customnlp4u-1.19.pdf