Linguistic Properties of Truthful Response

Bruce W. Lee, Benedict Florance Arockiaraj, Helen Jin


Abstract
We investigate the phenomenon of an LLM’s untruthful response using a large set of 220 handcrafted linguistic features. We focus on GPT-3 models and find that the linguistic profiles of responses are similar across model sizes. That is, LLMs of varying sizes respond to given prompts in similar ways at the level of linguistic properties. We expand upon this finding by training support vector machines that rely only upon the stylistic components of model responses to classify the truthfulness of statements. Though the dataset size limits our current findings, we present promising evidence that truthfulness detection is possible without evaluating the content itself. We release our code and raw data.
Anthology ID:
2023.trustnlp-1.12
Volume:
Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anaelia Ovalle, Kai-Wei Chang, Ninareh Mehrabi, Yada Pruksachatkun, Aram Galystan, Jwala Dhamala, Apurv Verma, Trista Cao, Anoop Kumar, Rahul Gupta
Venue:
TrustNLP
Publisher:
Association for Computational Linguistics
Pages:
135–140
URL:
https://aclanthology.org/2023.trustnlp-1.12
DOI:
10.18653/v1/2023.trustnlp-1.12
Cite (ACL):
Bruce W. Lee, Benedict Florance Arockiaraj, and Helen Jin. 2023. Linguistic Properties of Truthful Response. In Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023), pages 135–140, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Linguistic Properties of Truthful Response (Lee et al., TrustNLP 2023)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2023.trustnlp-1.12.pdf
Supplementary material:
 2023.trustnlp-1.12.SupplementaryMaterial.zip