Jerry Huang
2026
Tackling Distractor Documents in Multi-Hop QA with Reinforcement and Curriculum Learning
Jerry Huang | Siddarth Madala | Risham Sidhu | Cheng Niu | Hao Peng | Julia Hockenmaier | Tong Zhang
Findings of the Association for Computational Linguistics: EACL 2026
Retrieval-augmented generation (RAG) systems rely on retrieval models for identifying relevant contexts and answer generation models for utilizing those contexts. However, retrievers exhibit imperfect recall and precision, limiting downstream performance. We introduce RAG-RL, an answer generation model trained for multi-hop question answering (MHQA) to not only generate answers but also to identify and cite relevant information from larger sets of retrieved contexts, shifting some of the burden of identifying relevant documents from the retriever to the answer generator. Our approach uses curriculum learning, where models are trained across retrieval settings with varying levels of noise. Our experiments show that training samples with fewer distractor documents enable models to acquire citation and reasoning skills with greater sample efficiency and generalizability, demonstrating strong model performance even as the number of irrelevant passages increases. We benchmark our methods on three open-domain MHQA datasets and report significant gains in answer and citation accuracy. Furthermore, our experiments provide empirical insights into how simpler training samples can give models stronger signals for learning specific skills (e.g., citation generation) and how different components of post-training (e.g., training set construction, rule-based rewards, training sample ordering, etc.) impact final model performance.
MultiFinBen: Benchmarking Large Language Models for Multilingual and Multimodal Financial Application
Xueqing Peng | Lingfei Qian | Yan Wang | Ruoyu Xiang | Yueru He | Yang Ren | Mingyang Jiang | Vincent Jim Zhang | Yuqing Guo | Jeff Zhao | Huan He | Yi Han | Yun Feng | Yuechen Jiang | Yupeng Cao | Haohang Li | Yangyang Yu | Xiaoyu Wang | Penglei Gao | Shengyuan Lin | Keyi Wang | Shanshan Yang | Yilun Zhao | Zhiwei Liu | Peng Lu | Jerry Huang | Suyuchen Wang | Triantafillos Papadopoulos | Polydoros Giannouris | Efstathia Soufleri | Nuo Chen | Zhiyang Deng | Heming Fu | Yijia Zhao | Mingquan Lin | Meikang Qiu | Kaleb E Smith | Arman Cohan | Xiao-Yang Liu | Jimin Huang | Guojun Xiong | Alejandro Lopez-Lira | Xi Chen | Junichi Tsujii | Jian-Yun Nie | Sophia Ananiadou | Qianqian Xie
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Real-world financial analysis involves information across multiple languages and modalities, from reports and news to scanned filings and meeting recordings. Yet most existing evaluations of LLMs in finance remain text-only, monolingual, and largely saturated by current models. To bridge these gaps, we present MultiFinBen, the first expert-annotated multilingual (five languages) and multimodal (text, vision, audio) benchmark for evaluating LLMs in realistic financial contexts. MultiFinBen introduces two new task families: multilingual financial reasoning, which tests cross-lingual evidence integration from filings and news, and financial OCR, which extracts structured text from scanned documents containing tables and charts. Rather than aggregating all available datasets, we apply a structured, difficulty-aware selection based on advanced model performance, ensuring balanced challenge and removing redundant tasks. Evaluating 21 leading LLMs shows that even frontier multimodal models like GPT-4o achieve only 46.01% overall, stronger on vision and audio but dropping sharply in multilingual settings. These findings expose persistent limitations in multilingual, multimodal, and expert-level financial reasoning. All datasets, evaluation scripts, and leaderboards are publicly released.
Contextual Relevance and Adaptive Sampling for LLM-Based Document Reranking
Jerry Huang | Siddarth Madala | Cheng Niu | Julia Hockenmaier | Tong Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reranking algorithms have made progress in improving document retrieval quality by efficiently aggregating relevance judgments generated by large language models (LLMs). However, identifying relevant documents for queries that require in-depth reasoning remains a major challenge. Reasoning-intensive queries often exhibit multifaceted information needs and nuanced interpretations, rendering document relevance inherently context dependent and often noisy. To address this, we propose contextual relevance, which we define as the probability that a document is relevant to a given query, marginalized over the distribution of different reranking contexts it may appear in (i.e., the set of candidate documents it is ranked alongside and the order in which the documents are presented to a reranking model). While prior works have studied methods to mitigate the positional bias LLMs exhibit by accounting for the ordering of documents, we empirically show that batch composition also materially affects relevance judgments. To efficiently estimate contextual relevance, we propose TS-SetRank, a sampling-based, uncertainty-aware reranking algorithm. Empirically, TS-SetRank improves nDCG@10 over retrieval and reranking baselines by 15–25% on BRIGHT and 6–21% on BEIR, highlighting the importance of modeling relevance as context-dependent.
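The definition of contextual relevance in the abstract above can be written compactly. The notation is an assumption made here for illustration and is not taken from the paper: $q$ is the query, $d$ a document, $c$ a reranking context (the candidate set shown alongside $d$ together with its presentation order), and $p(c \mid q, d)$ the distribution over such contexts. The second line is a generic Monte Carlo estimate over $N$ sampled contexts; the abstract does not specify the estimator TS-SetRank actually uses.

```latex
\begin{aligned}
\mathrm{CR}(d \mid q)
  &= \mathbb{E}_{c \sim p(c \mid q, d)}
     \big[\, P(\text{relevant} \mid q, d, c) \,\big] \\
  &\approx \frac{1}{N} \sum_{i=1}^{N} P(\text{relevant} \mid q, d, c_i),
     \qquad c_i \sim p(c \mid q, d)
\end{aligned}
```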
EvoEdit: Evolving Null-space Alignment for Robust and Efficient Knowledge Editing
Sicheng Lyu | Yu Gu | Xinyu Wang | Jerry Huang | Sitao Luan | Yufei Cui | Xiao-Wen Chang | Peng Lu
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs) require continual updates to rectify outdated or erroneous knowledge. Model editing has emerged as a compelling paradigm for introducing targeted modifications without the computational burden of full retraining. Existing approaches are mainly based on a locate-then-edit framework. However, in sequential editing contexts, where multiple updates are applied over time, they exhibit significant limitations and suffer from catastrophic interference, i.e., new edits compromise previously integrated updates and degrade preserved knowledge. To address these challenges, we introduce EvoEdit, a novel editing strategy that mitigates catastrophic interference through sequential null-space alignment, enabling stable and efficient model editing. By performing sequential null-space alignment for each incoming edit, EvoEdit preserves both original and previously modified knowledge representations and maintains output invariance on preserved knowledge even across long edit sequences, effectively mitigating interference. Evaluations on real-world sequential knowledge-editing benchmarks show that EvoEdit achieves performance better than or comparable to prior state-of-the-art locate-then-edit techniques, with up to 3.53× speedup. Overall, these results underscore the necessity of developing more principled approaches for designing LLMs in dynamically evolving information settings, while providing a simple yet effective solution with strong theoretical guarantees.
GUIDE: Towards Scalable Advising for Research Ideas
Yaowenqi Liu | BingXu Meng | Rui Pan | Yuxing Liu | Jerry Huang | Jiaxuan You | Tong Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The field of AI research is advancing at an unprecedented pace, enabling automated hypothesis generation and experimental design across diverse domains such as biology, mathematics, and artificial intelligence. Despite these advancements, there remains a significant gap in the availability of scalable advising systems capable of providing high-quality, well-reasoned feedback to refine proposed hypotheses and experimental designs. To address this challenge, we explore key factors that underlie the development of robust advising systems, including model size, data reweighting, context length, confidence estimation, and structured reasoning processes. Our findings reveal that a relatively small model, when equipped with a well-compressed literature database and a structured reasoning framework, can outperform powerful general-purpose language models such as Deepseek-R1 in terms of acceptance rates for self-ranked top-30% submissions to ICLR 2025. Moreover, when limited to high-confidence predictions, our system achieves an acceptance rate exceeding 90% on the ICLR 2025 test set, underscoring its potential to significantly enhance the quality and efficiency of hypothesis generation and experimental design.
Investigating the Multilingual Calibration Effects of Language Model Instruction Tuning
Jerry Huang | Peng Lu | Qiuhao Zeng | Yusuke Iwasawa | Yutaka Matsuo | Sarath Chandar | Edison Marrese-Taylor | Irene Li
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
Ensuring that deep learning models are well-calibrated in terms of their predictive uncertainty is essential in maintaining their trustworthiness and reliability, yet despite increasing advances in foundation model research, the relationship between such large language models (LLMs) and their calibration remains an open area of research. In this work, we look at a critical gap in the calibration of LLMs within multilingual settings, in an attempt to better understand how data scarcity can potentially lead to different calibration effects and how commonly used techniques can apply in these settings. Our analysis on two multilingual benchmarks, over 29 and 42 languages respectively, reveals that even in low-resource languages, model confidence can increase significantly after instruction-tuning on high-resource language SFT datasets. However, improvements in accuracy are marginal or non-existent, resulting in miscalibration, highlighting a critical shortcoming of standard SFT in multilingual settings. Furthermore, we observe the use of label smoothing to be a reasonable method to alleviate this concern, again without any need for low-resource SFT data, maintaining better calibration across all languages. Overall, this highlights the importance of multilingual considerations for both training and tuning LLMs in order to improve their reliability and fairness in downstream use.
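To make the effect described in this abstract concrete (confidence rising while accuracy stays flat), the sketch below computes expected calibration error (ECE), a commonly used calibration metric. This is an illustration only: the abstract does not state which metric the paper reports, and the function name, the equal-width binning, and the toy numbers are all assumptions introduced here.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width bins.

    confidences: predicted probabilities in [0, 1] for the chosen answers.
    correct: booleans, True where the chosen answer was right.
    (Hypothetical helper; not taken from the paper.)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each prediction to a confidence bin; 1.0 falls in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Toy illustration: accuracy is 50% in both cases, only confidence changes.
outcomes = [True, False, True, False]
ece_before = expected_calibration_error([0.5, 0.5, 0.5, 0.5], outcomes)
ece_after = expected_calibration_error([0.9, 0.9, 0.9, 0.9], outcomes)
print(ece_before, ece_after)
```

In this toy case the second call yields a larger ECE than the first because confidence moved away from the unchanged accuracy, which is the pattern the abstract attributes to instruction tuning in low-resource languages.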
2025
How Well Can a Long Sequence Model Model Long Sequences? Comparing Architectural Inductive Biases on Long-Context Abilities
Jerry Huang
Proceedings of the 31st International Conference on Computational Linguistics
Long sequences occur in abundance within real-world scenarios, hence properly modelling them opens numerous downstream use-cases. Deep neural networks, however, have often struggled with these for a variety of reasons. Recent advances, both in system engineering as well as model design, have enabled the scaling up of models that are purported to support extended context length. In particular, the state-space and linear recurrent neural network families of models hypothetically can extend to infinite sequence length. However, is this too good to be true? We conduct an evaluation to show that while such claims may be sound theoretically, there remain large practical gaps that are empirically observed. In particular, recurrent models still suffer in the same settings as long-context LLMs with attention. We further show that different inductive biases have inconsistent extrapolation capabilities, highlighting the need to further study such paradigms and investigate why long-context models seemingly fail to behave as one might expect.
SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models
Margaret Mitchell | Giuseppe Attanasio | Ioana Baldini | Miruna Clinciu | Jordan Clive | Pieter Delobelle | Manan Dey | Sil Hamilton | Timm Dill | Jad Doughman | Ritam Dutt | Avijit Ghosh | Jessica Zosa Forde | Carolin Holtermann | Lucie-Aimée Kaffee | Tanmay Laud | Anne Lauscher | Roberto L Lopez-Davila | Maraim Masoud | Nikita Nangia | Anaelia Ovalle | Giada Pistilli | Dragomir Radev | Beatrice Savoldi | Vipul Raheja | Jeremy Qin | Esther Ploeger | Arjun Subramonian | Kaustubh Dhole | Kaiser Sun | Amirbek Djanibekov | Jonibek Mansurov | Kayo Yin | Emilio Villa Cueva | Sagnik Mukherjee | Jerry Huang | Xudong Shen | Jay Gala | Hamdan Al-Ali | Tair Djanibekov | Nurdaulet Mukhituly | Shangrui Nie | Shanya Sharma | Karolina Stanczak | Eliza Szczechla | Tiago Timponi Torrent | Deepak Tunuguntla | Marcelo Viridiano | Oskar Van Der Wal | Adina Yakefu | Aurélie Névéol | Mike Zhang | Sydney Zink | Zeerak Talat
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Large Language Models (LLMs) reproduce and exacerbate the social biases present in their training data, and resources to quantify this issue are limited. While research has attempted to identify and mitigate such biases, most efforts have been concentrated around English, lagging the rapid advancement of LLMs in multilingual settings. In this paper, we introduce a new multilingual parallel dataset SHADES to help address this issue, designed for examining culturally-specific stereotypes that may be learned by LLMs. The dataset includes stereotypes from 20 regions around the world and 16 languages, spanning multiple identity categories subject to discrimination worldwide. We demonstrate its utility in a series of exploratory evaluations for both “base” and “instruction-tuned” language models. Our results suggest that stereotypes are consistently reflected across models and languages, with some languages and models indicating much stronger stereotype biases than others.
Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
Jerry Huang | Prasanna Parthasarathi | Mehdi Rezagholizadeh | Boxing Chen | Sarath Chandar
Findings of the Association for Computational Linguistics: ACL 2025
The growth in prominence of large language models (LLMs) in everyday life can be largely attributed to their generative abilities, yet some of this is also owed to the risks and costs associated with their use. On one front is their tendency to hallucinate false or misleading information, limiting their reliability. On another is the increasing focus on the computational limitations associated with traditional self-attention based LLMs, which has brought about new alternatives, in particular recurrent models, meant to overcome them. Yet it remains uncommon to consider these two concerns simultaneously. Do changes in architecture exacerbate/alleviate existing concerns about hallucinations? Do they affect how and where they occur? Through an extensive evaluation, we study how these architecture-based inductive biases affect the propensity to hallucinate. While hallucination remains a general phenomenon not limited to specific architectures, the situations in which hallucinations occur and the ease with which specific types of hallucinations can be induced can significantly differ based on the model architecture. These findings highlight the need to better understand both these problems in conjunction with each other, as well as to consider how to design more universal techniques for handling hallucinations.
2024
Do Large Language Models Know How Much They Know?
Gabriele Prato | Jerry Huang | Prasanna Parthasarathi | Shagun Sodhani | Sarath Chandar
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Large Language Models (LLMs) have emerged as highly capable systems and are increasingly being integrated into various uses. Nevertheless, the rapid advancement in their deployment trails a comprehensive understanding of their internal mechanisms, as well as a delineation of their capabilities and limitations. A desired characteristic of an intelligent system is its ability to recognize the scope of its own knowledge. To investigate whether LLMs embody this attribute, we develop a benchmark that challenges these models to enumerate all information they possess on specific topics. This benchmark assesses whether the models recall excessive, insufficient, or the precise amount of required information, thereby indicating their awareness of how much they know about the given topic. Our findings reveal that the emergence of this property varies across different architectures and manifests at diverse rates. However, with sufficient scaling, all tested models are ultimately capable of performing this task. The insights gained from this research advance our understanding of LLMs, shedding light on their operational capabilities and contributing to the ongoing exploration of their intricate dynamics.
Context-Aware Assistant Selection for Improved Inference Acceleration with Large Language Models
Jerry Huang | Prasanna Parthasarathi | Mehdi Rezagholizadeh | Sarath Chandar
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Despite their widespread adoption, large language models (LLMs) remain prohibitive to use under resource constraints, with their ever growing sizes only increasing the barrier for use. One particular issue stems from the high latency associated with auto-regressive generation in LLMs, rendering the largest LLMs difficult to use without advanced computing infrastructure. Assisted decoding, where a smaller draft model guides a larger expert model’s generation, has helped alleviate this concern, but remains dependent on alignment between the two models. Thus if the draft model is insufficiently capable on some domain of interest relative to the target model, performance can degrade. Alternatively, one can leverage multiple draft models to better cover the expertise of the target, but when multiple black-box draft models are available, selecting an assistant without details about its construction can be difficult. To better understand this decision making problem, we observe it as a contextual bandit, where a policy must choose a draft model based on a context. We show that even without prior knowledge of the draft models, creating an offline dataset from only outputs of independent draft/target models and training a policy over the alignment of these outputs can accelerate performance on multiple domains as long as an individual draft model is effective. We observe these results hold on various settings with multiple assisted decoding candidates, highlighting its flexibility and the advantageous role that such decision making can play.
2023
EpiK-Eval: Evaluation for Language Models as Epistemic Models
Gabriele Prato | Jerry Huang | Prasanna Parthasarathi | Shagun Sodhani | Sarath Chandar
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
In the age of artificial intelligence, the role of large language models (LLMs) is becoming increasingly central. Despite their growing prevalence, their capacity to consolidate knowledge from different training documents—a crucial ability in numerous applications—remains unexplored. This paper presents the first study examining the capability of LLMs to effectively combine such information within their parameter space. We introduce EpiK-Eval, a novel question-answering benchmark tailored to evaluate LLMs’ proficiency in formulating a coherent and consistent knowledge representation from segmented narratives. Evaluations across various LLMs reveal significant weaknesses in this domain. We contend that these shortcomings stem from the intrinsic nature of prevailing training objectives. Consequently, we advocate for refining the approach towards knowledge consolidation, as it harbors the potential to dramatically improve their overall effectiveness and performance. The findings from this study offer insights for developing more robust and reliable LLMs. Our code and benchmark are available at https://github.com/chandar-lab/EpiK-Eval
Co-authors
- Sarath Chandar 5
- Prasanna Parthasarathi 4
- Peng Lu 3
- Tong Zhang 3
- Julia Hockenmaier 2
- Siddarth Madala 2
- Cheng Niu 2
- Gabriele Prato 2
- Mehdi Rezagholizadeh 2
- Shagun Sodhani 2
- Hamdan Al-Ali 1
- Sophia Ananiadou 1
- Giuseppe Attanasio 1
- Ioana Baldini 1
- Yupeng Cao 1
- Xiao-Wen Chang 1
- Nuo Chen 1
- Xi Chen 1
- Boxing Chen 1
- Miruna Clinciu 1
- Jordan Clive 1
- Arman Cohan 1
- Yufei Cui 1
- Pieter Delobelle 1
- Zhiyang Deng 1
- Manan Dey 1
- Kaustubh Dhole 1
- Timm Dill 1
- Amirbek Djanibekov 1
- Jad Doughman 1
- Ritam Dutt 1
- Yun Feng 1
- Jessica Zosa Forde 1
- Heming Fu 1
- Jay Gala 1
- Penglei Gao 1
- Avijit Ghosh 1
- Polydoros Giannouris 1
- Yu Gu (谷峪) 1
- Yuqing Guo 1
- Sil Hamilton 1
- Yi Han 1
- Yueru He 1
- Huan He 1
- Carolin Holtermann 1
- Jimin Huang 1
- Yusuke Iwasawa 1
- Mingyang Jiang 1
- Yuechen Jiang 1
- Lucie-Aimée Kaffee 1
- Tanmay Laud 1
- Anne Lauscher 1
- Haohang Li 1
- Irene Li 1
- Shengyuan Lin 1
- Mingquan Lin 1
- Zhiwei Liu 1
- Xiao-Yang Liu 1
- Yaowenqi Liu 1
- Yuxing Liu 1
- Roberto L Lopez-Davila 1
- Alejandro Lopez-Lira 1
- Sitao Luan 1
- Sicheng Lyu 1
- Jonibek Mansurov 1
- Edison Marrese-Taylor 1
- Maraim Masoud 1
- Yutaka Matsuo 1
- BingXu Meng 1
- Margaret Mitchell 1
- Sagnik Mukherjee 1
- Nurdaulet Mukhituly 1
- Nikita Nangia 1
- Aurelie Neveol 1
- Jian-Yun Nie 1
- Shangrui Nie 1
- Anaelia Ovalle 1
- Rui Pan 1
- Triantafillos Papadopoulos 1
- Hao Peng 1
- Xueqing Peng 1
- Giada Pistilli 1
- Esther Ploeger 1
- Lingfei Qian 1
- Jeremy Qin 1
- Meikang Qiu 1
- Dragomir Radev 1
- Vipul Raheja 1
- Yang Ren 1
- Beatrice Savoldi 1
- Shanya Sharma 1
- Xudong Shen 1
- Risham Sidhu 1
- Kaleb E. Smith 1
- Efstathia Soufleri 1
- Karolina Stanczak 1
- Arjun Subramonian 1
- Kaiser Sun 1
- Eliza Szczechla 1
- Tair Djanibekov 1
- Zeerak Talat 1
- Tiago Timponi Torrent 1
- Jun’ichi Tsujii 1
- Deepak Tunuguntla 1
- Oskar Van Der Wal 1
- Emilio Villa-Cueva 1
- Marcelo Viridiano 1
- Yan Wang 1
- Xiaoyu Wang 1
- Keyi Wang 1
- Suyuchen Wang 1
- Xinyu Wang 1
- Ruoyu Xiang 1
- Qianqian Xie 1
- Guojun Xiong 1
- Adina Yakefu 1
- Shanshan Yang 1
- Kayo Yin 1
- Jiaxuan You 1
- Yangyang Yu 1
- Qiuhao Zeng 1
- Vincent Jim Zhang 1
- Mike Zhang 1
- Jeff Zhao 1
- Yilun Zhao 1
- Yijia Zhao 1
- Sydney Zink 1