Mehak Gupta
2026
Facet-Informed Prompting for LLM-Based Personality Assessment: Error-Guided Exemplar Selection and Hierarchical Prediction
Rasiq Hussain | Juhi Shah | Joshua Oltmanns | Mehak Gupta
Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2026)
Large language models (LLMs) are increasingly applied to automatic personality assessment, yet most prior work relies on coarse binary labels and direct domain-level predictions, limiting interpretability and ignoring the hierarchical facet structure of personality. In this study, we implement a structured prompting approach with three complementary objectives: direct domain-level prediction, fine-grained facet-level prediction, and domain-level prediction informed by facet outputs. All predictions use a five-level ordinal label scheme, capturing a continuum from very low to very high trait expression. Across all prompt types, we adopt an error-guided self-refinement procedure using in-context learning (ICL) to guide the model toward more accurate predictions. Zero-shot prompts assess baseline performance, while one-shot prompts incorporate a single demonstration example selected through the refinement procedure. Our framework evaluates both domain- and facet-level predictions, enabling examination of how prediction granularity and targeted exemplar selection influence LLM inference. By combining hierarchical domain-facet relationships with structured prompting and refinement, this work aims to provide a systematic approach for interpretable and principled LLM-based personality assessment from long-form life narratives.
Multilingual Language Models Encode Script Over Linguistic Structure
Aastha A K Verma | Anwoy Chatterjee | Mehak Gupta | Tanmoy Chakraborty
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multilingual language models (LMs) organize representations for typologically and orthographically diverse languages into a shared parameter space, yet the nature of this internal organization remains elusive. In this work, we investigate which linguistic properties — abstract language identity or surface-form cues — shape multilingual representations. To do so, we analyze language-associated units across different model families and scales using the Language Activation Probability Entropy (LAPE) metric, and further decompose activations with Sparse Autoencoders. We find that these units are strongly conditioned on orthography: romanization induces near-disjoint representations that align with neither native-script inputs nor English, while word-order shuffling has limited effect on unit identity. Probing shows that typological structure becomes increasingly accessible in deeper layers, while causal interventions indicate that generation is most sensitive to units that are invariant to surface-form perturbations rather than to units identified by typological alignment alone. Overall, our results suggest that multilingual LMs organize representations around surface form, with linguistic abstraction emerging gradually without collapsing into a unified interlingua.
2025
AI Assistant for Socioeconomic Empowerment Using Federated Learning
Nahed Abdelgaber | Labiba Jahan | Nino Castellano | Joshua Oltmanns | Mehak Gupta | Jia Zhang | Akshay Pednekar | Ashish Basavaraju | Ian Velazquez | Zerui Ma
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
Socioeconomic status (SES) reflects an individual’s standing in society, determined by a holistic set of factors including income, education level, and occupation. Identifying individuals in low-SES groups is crucial to ensuring they receive necessary support. However, many individuals may be hesitant to disclose their SES directly. This study introduces a federated learning-powered framework capable of verifying individuals’ SES levels through the analysis of their natural-language communications. We propose to study language usage patterns among individuals from different SES groups using clustering and topic modeling techniques. An empirical study leveraging life narrative interviews demonstrates the effectiveness of our proposed approach.