Atharvan Dogra
2025
Language Models can Subtly Deceive Without Lying: A Case Study on Strategic Phrasing in Legislation
Atharvan Dogra
|
Krishna Pillutla
|
Ameet Deshpande
|
Ananya B. Sai
|
John J Nay
|
Tanmay Rajpurohit
|
Ashwin Kalyan
|
Balaraman Ravindran
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We explore the ability of large language models (LLMs) to engage in subtle deception through strategically phrasing and intentionally manipulating information. This harmful behavior can be hard to detect, unlike blatant lying or unintentional hallucination. We build a simple testbed mimicking a legislative environment where a corporate lobbyist module is proposing amendments to bills that benefit a specific company while evading identification of this benefactor. We use real-world legislative bills matched with potentially affected companies to ground these interactions. Our results show that LLM lobbyists can draft subtle phrasing to avoid such identification by strong LLM-based detectors. Further optimization of the phrasing using LLM-based re-planning and re-sampling increases deception rates by up to 40 percentage points.Our human evaluations to verify the quality of deceptive generations and their retention of self-serving intent show significant coherence with our automated metrics and also help in identifying certain strategies of deceptive phrasing.This study highlights the risk of LLMs’ capabilities for strategic phrasing through seemingly neutral language to attain self-serving goals. This calls for future research to uncover and protect against such subtle deception.
2022
Raccoons at SemEval-2022 Task 11: Leveraging Concatenated Word Embeddings for Named Entity Recognition
Atharvan Dogra
|
Prabsimran Kaur
|
Guneet Kohli
|
Jatin Bedi
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Named Entity Recognition (NER), an essential subtask in NLP that identifies text belonging to predefined semantics such as a person, location, organization, drug, time, clinical procedure, biological protein, etc. NER plays a vital role in various fields such as informationextraction, question answering, and machine translation. This paper describes our participating system run to the Named entity recognitionand classification shared task SemEval-2022. The task is motivated towards detecting semantically ambiguous and complex entities in shortand low-context settings. Our team focused on improving entity recognition by improving the word embeddings. We concatenated the word representations from State-of-the-art language models and passed them to find the best representation through a reinforcement trainer. Our results highlight the improvements achieved by various embedding concatenations.
Search
Fix author
Co-authors
- Jatin Bedi 1
- Ameet Deshpande 1
- Ashwin Kalyan 1
- Prabsimran Kaur 1
- Guneet Kohli 1
- show all...