Michael Lu


2025

Rosetta-PL: Propositional Logic as a Benchmark for Large Language Model Reasoning
Shaun Lee Baek | Shaun Esua-Mensah | Cyrus Tsui | Sejan Vigneswaralingam | Abdullah Alali | Michael Lu | Vasu Sharma | Kevin Zhu
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)

Large Language Models (LLMs) are primarily trained on high-resource natural languages, limiting their effectiveness in low-resource settings and in tasks requiring deep logical reasoning. This research introduces Rosetta-PL, a benchmark designed to evaluate LLMs’ logical reasoning and generalization capabilities in a controlled environment. We construct Rosetta-PL by translating a dataset of logical propositions from Lean into a custom logical language, which is then used to fine-tune an LLM (e.g., GPT-4o). Our experiments analyze how dataset size and translation methodology affect model performance. Our results indicate that preserving logical relationships during translation significantly boosts precision, with accuracy plateauing beyond roughly 20,000 training samples. These insights offer practical guidelines for optimizing LLM training on formal reasoning tasks and for improving performance in low-resource language applications.
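
As an illustration of the kind of structure-preserving translation the paper studies, here is a minimal Python sketch assuming a simple token-for-token mapping from Lean-style propositional connectives into an invented vocabulary (the ZAP/ZOR/ZNOT/ZIMP tokens are ours for illustration, not the paper's actual language):

# Hypothetical mapping from Lean-style connectives to a custom vocabulary.
# The target tokens are invented for illustration only.
LEAN_TO_CUSTOM = {
    "∧": "ZAP",    # conjunction
    "∨": "ZOR",    # disjunction
    "¬": "ZNOT",   # negation
    "→": "ZIMP",   # implication
    "(": "(",
    ")": ")",
}

def translate(formula: str) -> str:
    """Swap connective tokens while leaving the parenthesized
    structure (the logical relationships) intact."""
    return " ".join(LEAN_TO_CUSTOM.get(tok, f"VAR_{tok}")
                    for tok in formula.split())

print(translate("( p ∧ q ) → ¬ r"))
# prints: ( VAR_p ZAP VAR_q ) ZIMP ZNOT VAR_r

A lossier translation that dropped parentheses or reordered operands would destroy exactly the logical relationships the abstract identifies as the driver of precision.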

DecepBench: Benchmarking Multimodal Deception Detection
Ethan Braverman | Vittesh Maganti | Nysa Lalye | Akhil Ganti | Michael Lu | Kevin Zhu | Vasu Sharma | Sean O’Brien
Proceedings of the Third Workshop on Social Influence in Conversations (SICon 2025)

Deception detection is crucial in domains such as security, forensics, and legal proceedings, as well as for ensuring the reliability of AI systems. However, current approaches are limited by the lack of generalizable and interpretable benchmarks built on large, diverse datasets. To address this gap, we introduce DecepBench, a comprehensive and robust benchmark for multimodal deception detection. DecepBench includes an enhanced version of DOLOS, the largest game-show deception dataset (1,700 labeled video clips with audio). We augment each video clip with a transcript, introducing a third modality (text), and incorporate deception-related features identified in psychological research. We employ explainable methods to evaluate the relevance of key deception cues, providing insight into model limitations and guiding future improvements. Our enhancements to DOLOS, combined with these interpretable analyses, yield improved performance and a deeper understanding of multimodal deception detection.
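
As an illustration of the transcript-augmentation step, here is a minimal Python sketch assuming an off-the-shelf ASR model (Whisper is our stand-in; the paper does not specify its transcription pipeline, and the record fields below are hypothetical):

import json
import whisper  # pip install openai-whisper

model = whisper.load_model("base")

def augment_clip(clip: dict) -> dict:
    """Attach a transcript (the third modality: text) to a clip record
    that already carries an audio track and a truth/deception label."""
    result = model.transcribe(clip["audio_path"])
    clip["transcript"] = result["text"].strip()
    return clip

clips = [{"audio_path": "clip_0001.wav", "label": "deceptive"}]
print(json.dumps([augment_clip(c) for c in clips], indent=2))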