Parsa Hejabi
2026
The Subjectivity of Respect in Police Traffic Stops: Modeling Community Perspectives in Body-Worn Camera Footage
Preni Golazizian | Elnaz Rahmati | Jackson Trager | Zhivar Sourati | Nona Ghazizadeh | Georgios Chochlakis | Jose J. Alcocer | Kerby Bennett | Aarya Vijay Devnani | Parsa Hejabi | Harry G. Muttram | Akshay Kiran Padte | Mehrshad Saadatinia | Chenhao Wu | Alireza Salkhordeh Ziabari | Michael Sierra-Arévalo | Nicholas Weller | Shrikanth Narayanan | Benjamin A.T. Graham | Morteza Dehghani
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Traffic stops are among the most frequent police–civilian interactions, and body-worn cameras (BWCs) provide a unique record of how these encounters unfold. Respect is a central dimension of these interactions, shaping public trust and perceived legitimacy, yet its interpretation is inherently subjective and shaped by lived experience, rendering community-specific perspectives a critical consideration. Leveraging unprecedented access to Los Angeles Police Department BWC footage, we introduce the first large-scale traffic-stop dataset annotated with respect ratings and free-text rationales from multiple perspectives. By sampling annotators from police-affiliated, justice-system-impacted, and non-affiliated Los Angeles residents, we enable the systematic study of perceptual differences across diverse communities. To this end, (i) we develop a domain-specific evaluation rubric grounded in procedural justice theory, LAPD training materials, and extensive fieldwork; (ii) we introduce a criterion-driven preference data construction framework for perspective-consistent alignment; and (iii) we propose a perspective-aware modeling framework that predicts personalized respect ratings and generates annotator-specific rationales for both officers and civilian drivers from traffic-stop transcripts. Across all three annotator groups, our approach improves both rating prediction performance and rationale alignment. Our perspective-aware framework enables law enforcement to better understand diverse community expectations, providing a vital tool for building public trust and procedural legitimacy.
Flip-Flop Consistency: Unsupervised Training for Robustness to Prompt Perturbations in LLMs
Parsa Hejabi | Elnaz Rahmati | Alireza Salkhordeh Ziabari | Morteza Dehghani
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) often produce inconsistent answers when faced with different phrasings of the same prompt. In this paper, we propose Flip-Flop Consistency (F2C), an unsupervised training method that improves robustness to such perturbations. F2C is composed of two key components. The first, Consensus Cross-Entropy (CCE), uses a majority vote across prompt variations to create a hard pseudo-label. The second is a representation alignment loss that pulls lower-confidence and non-majority predictors toward the consensus established by high-confidence, majority-voting variations. We evaluate our method on 11 datasets spanning four NLP tasks, with 4–15 prompt variations per dataset. On average, F2C raises observed agreement by 11.62%, improves mean F1 by 8.94%, and reduces performance variance across formats by 3.29%. In out-of-domain evaluations, F2C generalizes effectively, increasing mean F1 and agreement while decreasing variance across most source-target pairs. Finally, when trained on only a subset of prompt perturbations and evaluated on held-out formats, F2C consistently improves both performance and agreement while reducing variance. These findings highlight F2C as an effective unsupervised method for enhancing LLM consistency, performance, and generalization under prompt perturbations.
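The majority-vote pseudo-labeling idea behind CCE can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy probabilities, the tie-breaking rule, and the choice to average the loss over all variations are assumptions made only for illustration, and the representation alignment loss is not covered.

```python
import math
from collections import Counter


def majority_pseudo_label(predictions):
    """Return the label predicted by the most prompt variations.

    Assumption: ties go to the label seen first (Counter ordering).
    """
    label, _ = Counter(predictions).most_common(1)[0]
    return label


def consensus_cross_entropy(prob_rows, pseudo_label, eps=1e-12):
    """Mean negative log-probability of the pseudo-label over variations.

    Assumption: every variation contributes equally to the loss.
    """
    total = sum(math.log(row[pseudo_label] + eps) for row in prob_rows)
    return -total / len(prob_rows)


# Hypothetical class probabilities for one input under three prompt variations.
probs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
]

# Hard prediction of each variation (argmax over classes).
preds = [max(range(len(row)), key=row.__getitem__) for row in probs]

label = majority_pseudo_label(preds)
loss = consensus_cross_entropy(probs, label)
print(label, round(loss, 4))
```

In this toy case two of the three variations vote for class 0, so class 0 becomes the hard pseudo-label, and the third variation, which disagrees, contributes the largest share of the loss.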
2024
Reinforced Multiple Instance Selection for Speaker Attribute Prediction
Alireza Salkhordeh Ziabari | Ali Omrani | Parsa Hejabi | Preni Golazizian | Brendan Kennedy | Payam Piray | Morteza Dehghani
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Language usage is related to speaker age, gender, moral concerns, political ideology, and other attributes. Current state-of-the-art methods for predicting these attributes take a speaker’s utterances as input and provide a prediction per speaker attribute. Most of these approaches struggle to handle a large number of utterances per speaker. This difficulty is primarily due to the computational constraints of the models. Additionally, only a subset of speaker utterances may be relevant to specific attributes. In this paper, we formulate speaker attribute prediction as a Multiple Instance Learning (MIL) problem and propose RL-MIL, a novel approach based on Reinforcement Learning (RL) that effectively addresses both of these challenges. Our experiments demonstrate that our RL-based methodology consistently outperforms previous approaches across a range of related tasks: predicting speakers’ psychographics and demographics from social media posts, and political ideologies from transcribed speeches. We create synthetic datasets and investigate the behavior of RL-MIL systematically. Our results show the success of RL-MIL in improving speaker attribute prediction by learning to select relevant speaker utterances.
Co-authors
- Morteza Dehghani 3
- Alireza Salkhordeh Ziabari 3
- Preni Golazizian 2
- Elnaz Rahmati 2
- Jose J. Alcocer 1
- Kerby Bennett 1
- Georgios Chochlakis 1
- Aarya Vijay Devnani 1
- Nona Ghazizadeh 1
- Benjamin A.T. Graham 1
- Brendan Kennedy 1
- Harry G. Muttram 1
- Shrikanth Narayanan 1
- Ali Omrani 1
- Akshay Kiran Padte 1
- Payam Piray 1
- Mehrshad Saadatinia 1
- Michael Sierra-Arévalo 1
- Zhivar Sourati 1
- Jackson Trager 1
- Nicholas Weller 1
- Chenhao Wu 1