Young Min Cho
2026
Supplement Generation Training for Enhancing Agentic Task Performance
Young Min Cho | Daniele Bonadiman | Divya Bhargavi | Tamer Alkhouli | Salvatore Romeo | Dongwei Jiang | Khushbu Pahwa | Yubin Ge | Etsuko Ishii | Monica Sunkara | Yi Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Training large foundation models for agentic tasks is increasingly impractical due to the high computational costs, long iteration cycles, and rapid obsolescence as new models are continuously released. Instead of post-training massive models for every new task or domain, we propose Supplement Generation Training (SGT), a more efficient and sustainable strategy. SGT trains a smaller LLM to generate useful supplemental text that, when appended to the original input, helps the larger LLM solve the task more effectively. These lightweight models can dynamically adapt supplements to task requirements, improving performance without modifying the underlying large models. This approach decouples task-specific optimization from large foundation models and enables more flexible, cost-effective deployment of LLM-powered agents in real-world applications.
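The SGT setup described above (a small model writes a supplement, which is appended to the input of a frozen large model) can be sketched as a two-stage call. This is a minimal sketch that assumes both models are exposed as plain prompt-to-text functions. The `Model` alias, `answer_with_supplement`, and the blank-line separator are hypothetical choices, not details from the paper, and training the supplement generator is not shown.

```python
from typing import Callable

# Hypothetical interface: a "model" is any function from prompt text to completion text.
Model = Callable[[str], str]


def answer_with_supplement(task_input: str, supplement_model: Model, large_model: Model) -> str:
    """Inference with a supplement generator (illustrative sketch, not the paper's code).

    The small model reads the original input and writes supplemental text.
    The supplement is appended to the input, and the large model answers the
    augmented prompt. The large model itself is never modified.
    """
    supplement = supplement_model(task_input)
    # Assumption: the supplement is separated from the input by a blank line.
    augmented_input = f"{task_input}\n\n{supplement}"
    return large_model(augmented_input)


if __name__ == "__main__":
    # Toy stand-ins for real LLM calls, for illustration only.
    def small_model(prompt: str) -> str:
        return "Hint: check the order status before issuing a refund."

    def large_model(prompt: str) -> str:
        return f"[large model received {len(prompt.splitlines())} lines]"

    print(answer_with_supplement("Customer asks for a refund.", small_model, large_model))
```

Keeping both models behind the same callable interface reflects the decoupling the abstract describes: the supplement generator can be retrained or swapped without touching the large model.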
2025
Cross-Cultural Differences in Mental Health Expressions on Social Media
Sunny Rai | Khushi Shelat | Devansh Jain | Ashwin Kishen | Young Min Cho | Maitreyi Redkar | Samindara Hardikar-Sawant | Lyle Ungar | Sharath Chandra Guntuku
Proceedings of the 3rd Workshop on Cross-Cultural Considerations in NLP (C3NLP 2025)
Culture moderates the way individuals perceive and express mental distress. Current understandings of mental health expressions on social media, however, are predominantly derived from WEIRD (Western, Educated, Industrialized, Rich, and Democratic) contexts. To address this gap, we examine mental health posts on Reddit made by individuals geolocated in India to identify variations in social media language specific to the Indian context, compared with posts by users from Western nations. Our experiments reveal significant psychosocial variations in emotions and temporal orientation. This study demonstrates the potential of social media platforms for identifying cross-cultural differences in mental health expressions (e.g., seeking advice in India vs. seeking support among Western users). Significant linguistic variations in online mental health-related language emphasize the importance of developing precision-targeted interventions that are culturally appropriate.
Culturally-Aware Conversations: A Framework & Benchmark for LLMs
Shreya Havaldar | Young Min Cho | Sunny Rai | Lyle Ungar
Proceedings of the Fourth Workshop on Bridging Human-Computer Interaction and Natural Language Processing (HCI+NLP)
Existing benchmarks that measure cultural adaptation in LLMs are misaligned with the actual challenges these models face when interacting with users from diverse cultural backgrounds. In this work, we introduce the first framework and benchmark designed to evaluate LLMs in realistic, multicultural conversational settings. Grounded in sociocultural theory, our framework formalizes how linguistic style — a key element of cultural communication — is shaped by situational, relational, and cultural context. We construct a benchmark dataset based on this framework, annotated by culturally diverse raters, and propose a new set of desiderata for cross-cultural evaluation in NLP: conversational framing, stylistic sensitivity, and subjective correctness. We evaluate today’s top LLMs on our benchmark and show that these models struggle with cultural adaptation in a conversational setting.
Language-based Valence and Arousal Expressions between the United States and China: a Cross-Cultural Examination
Young Min Cho | Dandan Pang | Stuti Thapa | Garrick Sherman | Lyle Ungar | Louis Tay | Sharath Chandra Guntuku
Findings of the Association for Computational Linguistics: NAACL 2025
While affective expressions on social media have been extensively studied, most research has focused on the Western context. This paper explores cultural differences in affective expressions by comparing valence and arousal on Twitter/X (geolocated to the US) and Sina Weibo (in Mainland China). Using the NRC-VAD lexicon to measure valence and arousal, we identify distinct patterns of emotional expression across both platforms. Our analysis reveals a functional representation between valence and arousal, showing a negative offset in contrast to traditional lab-based findings which suggest a positive offset. Furthermore, we uncover significant cross-cultural differences in arousal, with US users displaying higher emotional intensity than Chinese users, regardless of the valence of the content. Finally, we conduct a comprehensive language analysis correlating n-grams and LDA topics with affective dimensions to deepen our understanding of how language and culture shape emotional expression. These findings contribute to a more nuanced understanding of affective communication across cultural and linguistic contexts on social media.
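Lexicon-based measurement of valence and arousal, as mentioned above, is often done by looking up each word in the lexicon and averaging the scores of the words that match. The following is a minimal sketch under that assumption; the paper's exact tokenization and aggregation may differ. The `TOY_VAD` values are made-up placeholders, not entries from the real NRC-VAD lexicon, and `score_post` is a hypothetical helper.

```python
from __future__ import annotations

import re
from statistics import mean

# Made-up (valence, arousal) scores in [0, 1]; NOT real NRC-VAD entries.
TOY_VAD: dict[str, tuple[float, float]] = {
    "happy": (0.95, 0.70),
    "calm": (0.80, 0.10),
    "angry": (0.10, 0.90),
    "tired": (0.25, 0.20),
}


def score_post(text: str, lexicon: dict[str, tuple[float, float]]) -> tuple[float, float] | None:
    """Average valence and arousal over the tokens of a post found in the lexicon.

    Returns None when no token matches, so unscored posts can be filtered out.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return None
    valence = mean(v for v, _ in hits)
    arousal = mean(a for _, a in hits)
    return valence, arousal


if __name__ == "__main__":
    print(score_post("So happy but tired today", TOY_VAD))
```

Post-level scores like these could then be compared across platforms or regressed against each other, which is the kind of valence-arousal relationship the abstract analyzes.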
2024
Using Daily Language to Understand Drinking: Multi-Level Longitudinal Differential Language Analysis
Matthew Matero | Huy Vu | August Nilsson | Syeda Mahwish | Young Min Cho | James McKay | Johannes Eichstaedt | Richard Rosenthal | Lyle Ungar | H. Andrew Schwartz
Proceedings of the 9th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2024)
Analyses linking language with psychological factors or behaviors predominantly treat linguistic features as a static set, working with a single document per person or aggregating multiple posts (e.g., on social media) into a single set of features. This largely limits language analysis to between-person differences rather than within-person changes in behavior. Here, we collected a novel dataset of daily surveys in which participants were asked to describe their experienced well-being and report the number of alcoholic beverages they had consumed within the past 24 hours. Using this data, we first build a multi-level forecasting model that can capture within-person change and leverages both the psychological features of the person and daily well-being responses. We then propose a longitudinal version of differential language analysis that finds patterns associated with drinking more (e.g., social events) and less (e.g., task-oriented), and that distinguishes the patterns of heavy drinkers from those of light drinkers.
2023
An Integrative Survey on Mental Health Conversational Agents to Bridge Computer Science and Medical Perspectives
Young Min Cho | Sunny Rai | Lyle Ungar | João Sedoc | Sharath Guntuku
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Mental health conversational agents (a.k.a. chatbots) are widely studied for their potential to offer accessible support to those experiencing mental health challenges. Previous surveys on the topic primarily consider papers published in either computer science or medicine, leading to a divide in understanding and hindering the sharing of beneficial knowledge between both domains. To bridge this gap, we conduct a comprehensive literature review using the PRISMA framework, reviewing 534 papers published in both computer science and medicine. Our systematic review reveals 136 key papers on building mental health-related conversational agents with diverse characteristics of modeling and experimental design techniques. We find that computer science papers focus on LLM techniques and evaluating response quality using automated metrics with little attention to the application while medical papers use rule-based conversational agents and outcome metrics to measure the health outcomes of participants. Based on our findings on transparency, ethics, and cultural heterogeneity in this review, we provide a few recommendations to help bridge the disciplinary divide and enable the cross-disciplinary development of mental health conversational agents.
2022
Unsupervised Entity Linking with Guided Summarization and Multiple-Choice Selection
Young Min Cho | Li Zhang | Chris Callison-Burch
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Entity linking, the task of linking potentially ambiguous mentions in texts to corresponding knowledge-base entities, is an important component of language understanding. We address two challenges in entity linking: how to leverage wider contexts surrounding a mention, and how to deal with limited training data. We propose a fully unsupervised model called SumMC that first generates a guided summary of the contexts conditioned on the mention, and then casts the task as a multiple-choice problem where the model chooses an entity from a list of candidates. In addition to evaluating our model on existing datasets that focus on named entities, we create a new dataset that links noun phrases from WikiHow to Wikidata. We show that our SumMC model achieves state-of-the-art unsupervised performance on our new dataset and on existing datasets.
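The second stage of SumMC, casting entity linking as a multiple-choice problem, can be illustrated with a prompt builder and an answer parser. This sketch assumes the guided summary has already been produced by some language model and that the model replies with a single option letter. The prompt template, `build_multiple_choice_prompt`, and `parse_choice` are hypothetical and are not the paper's actual prompts or code.

```python
from __future__ import annotations

import string


def build_multiple_choice_prompt(summary: str, mention: str, candidates: list[str]) -> str:
    """Format entity linking as a lettered multiple-choice question (hypothetical template)."""
    letters = string.ascii_uppercase
    if not candidates or len(candidates) > len(letters):
        raise ValueError("need between 1 and 26 candidates")
    lines = [f"Context: {summary}", f'Question: Which entity does "{mention}" refer to?']
    # One lettered option per candidate entity.
    lines += [f"{letter}. {cand}" for letter, cand in zip(letters, candidates)]
    lines.append("Answer:")
    return "\n".join(lines)


def parse_choice(model_output: str, candidates: list[str]) -> str | None:
    """Map the model's answer letter back to a candidate; None if it is not a valid option."""
    answer = model_output.strip().upper()
    if not answer:
        return None
    index = string.ascii_uppercase.find(answer[0])  # -1 when the first character is not A-Z
    if 0 <= index < len(candidates):
        return candidates[index]
    return None


if __name__ == "__main__":
    options = ["Paris (capital of France)", "Paris Hilton"]
    print(build_multiple_choice_prompt("The text describes a trip to the French capital.", "Paris", options))
    print(parse_choice(" a ", options))
```

Framing the choice as lettered options lets an unsupervised model rank candidates without any task-specific training, which matches the limited-training-data motivation stated in the abstract.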
Co-authors
- Lyle Ungar 5
- Sunny Rai 3
- Sharath Chandra Guntuku 2
- Tamer Alkhouli 1
- Divya Bhargavi 1
- Daniele Bonadiman 1
- Chris Callison-Burch 1
- Johannes Eichstaedt 1
- Yubin Ge 1
- Sharath Guntuku 1
- Samindara Hardikar-Sawant 1
- Shreya Havaldar 1
- Etsuko Ishii 1
- Devansh Jain 1
- Dongwei Jiang 1
- Ashwin Kishen 1
- Syeda Mahwish 1
- Matthew Matero 1
- James McKay 1
- August Håkan Nilsson 1
- Khushbu Pahwa 1
- Dandan Pang 1
- Maitreyi Redkar 1
- Salvatore Romeo 1
- Richard Rosenthal 1
- H. Andrew Schwartz 1
- João Sedoc 1
- Khushi Shelat 1
- Garrick Sherman 1
- Monica Sunkara 1
- Louis Tay 1
- Stuti Thapa 1
- Huy Vu 1
- Yi Zhang 1
- Li Zhang 1