Understanding and Explicitly Measuring Linguistic and Stylistic Properties of Deception via Generation and Translation

Emily Saldanha, Aparna Garimella, Svitlana Volkova


Abstract
Massive digital disinformation is one of the main risks of modern society. Hundreds of models have been built and linguistic analyses performed to compare and contrast misleading and credible content online. However, most models do not remove the confounding factor of topic or narrative during training, so they learn a clear topical separation between misleading and credible content. We study the feasibility of two strategies for disentangling topic bias from the models, in order to understand and explicitly measure the linguistic and stylistic properties of misleading versus credible content. First, we develop conditional generative models to create news content characteristic of different credibility levels. We perform a multi-dimensional evaluation of how well the models mimic both the style and the linguistic differences that distinguish news of different credibility, using machine translation metrics and classification models. We show that even though generative models are able to imitate both the style and language of the original content, additional conditioning on both the news category and the topic leads to reduced performance. In a second approach, we perform deception style “transfer” by translating deceptive content into the style of credible content and vice versa. Extending earlier studies, we demonstrate that, when conditioned on a topic, deceptive content is shorter, less readable, more biased, and more subjective than credible content, and that transferring the style from deceptive to credible content is more challenging than the opposite direction.
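The abstract reports that, conditioned on topic, deceptive content is shorter and less readable than credible content. It does not state which readability measure was used, so the following is only an illustrative sketch, not the authors' procedure: it computes word count, sentence count, and the standard Flesch Reading Ease score (206.835 − 1.015 × words/sentence − 84.6 × syllables/word). The syllable counter is a rough vowel-group heuristic, and the function names `count_syllables` and `text_stats` are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (a heuristic, not a dictionary lookup)."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Assumption: a trailing silent 'e' (as in "make") does not add a syllable.
    if lowered.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> dict:
    """Return word count, sentence count, and Flesch Reading Ease for a text."""
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return {"words": 0, "sentences": 0, "flesch": 0.0}
    syllables = sum(count_syllables(w) for w in words)
    flesch = (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )
    return {"words": len(words), "sentences": len(sentences), "flesch": flesch}


easy = text_stats("The cat sat. The dog ran.")
hard = text_stats(
    "Comprehensive institutional considerations necessitate extraordinary deliberation."
)
print(easy["words"], easy["sentences"], round(easy["flesch"], 2))
print(hard["flesch"] < easy["flesch"])
```

Higher Flesch scores indicate easier text, so a "less readable" article would show a lower score. In a topic-controlled comparison like the one the abstract describes, such statistics would be computed per article and aggregated within each topic before comparing credibility classes.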
Anthology ID:
2020.inlg-1.27
Volume:
Proceedings of the 13th International Conference on Natural Language Generation
Month:
December
Year:
2020
Address:
Dublin, Ireland
Editors:
Brian Davis, Yvette Graham, John Kelleher, Yaji Sripada
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
216–226
URL:
https://aclanthology.org/2020.inlg-1.27
DOI:
10.18653/v1/2020.inlg-1.27
Cite (ACL):
Emily Saldanha, Aparna Garimella, and Svitlana Volkova. 2020. Understanding and Explicitly Measuring Linguistic and Stylistic Properties of Deception via Generation and Translation. In Proceedings of the 13th International Conference on Natural Language Generation, pages 216–226, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Understanding and Explicitly Measuring Linguistic and Stylistic Properties of Deception via Generation and Translation (Saldanha et al., INLG 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2020.inlg-1.27.pdf