Jiahe Liu
2026
Identifying Collective Intelligence Factor in LLM Agent Groups for Generalizable Multi-Agent System Design
Zhilun Zhou | Zihan Liu | Jiahe Liu | Yihan Wang | Qingyu Shao | Fengli Xu | Depeng Jin | Yong Li
Findings of the Association for Computational Linguistics: ACL 2026
Large language model (LLM)-based multi-agent systems (MASs) have shown impressive performance in solving a wide range of complex problems. However, previous studies mainly focus on designing customized MAS for specific tasks, while a critical research problem remains unclear: Do LLM agent groups exhibit a form of “general intelligence” that reflects their general ability across various tasks? Researchers have found a Collective Intelligence (CI) factor in human groups that captures their general capability. Inspired by this, in this study, we aim to investigate whether an analogous CI factor also exists in LLM agent groups, which is crucial for building generalizable MAS. Motivated by human cognitive psychology experiments, we construct 108 LLM agent groups with diverse group sizes, LLM compositions, and communication topologies. We systematically evaluate these groups across a wide range of tasks and analyze their performances. Our results demonstrate that an Artificial Collective Intelligence (ACI) factor can be extracted from LLM agent groups to predict the generalization performance on new tasks. Inspired by this, we train a model to predict the ACI based on the features of MAS, and show that it can be used as a plug-in to enhance the generalization ability of MAS optimization methods.
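To illustrate what "extracting a general factor" from group performance means, the sketch below takes a groups × tasks score matrix and computes its first principal component, a common approximation to the single-factor analysis used in human collective intelligence research. This is not necessarily the authors' procedure. The scores are synthetic: 108 groups (matching the number reported above) and 6 tasks (a hypothetical choice). They are generated from a made-up latent ability plus noise, and the function name `extract_general_factor` is hypothetical.

```python
import numpy as np

def extract_general_factor(scores: np.ndarray):
    """Approximate a general factor with the first principal component of a
    groups x tasks score matrix (a PCA stand-in for single-factor analysis)."""
    # Standardize each task column so tasks with wider score ranges do not dominate.
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    # Task-by-task correlation matrix.
    corr = np.corrcoef(z, rowvar=False)
    # eigh returns eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    loadings = eigvecs[:, -1]
    # Eigenvector sign is arbitrary; flip so a higher factor score means better performance.
    if loadings.sum() < 0:
        loadings = -loadings
    explained = eigvals[-1] / eigvals.sum()  # share of variance on the first factor
    factor_scores = z @ loadings            # one general-ability score per group
    return factor_scores, loadings, explained

# Synthetic data (hypothetical): each group's task scores share a latent ability.
rng = np.random.default_rng(0)
ability = rng.normal(size=108)
scores = 0.8 * ability[:, None] + rng.normal(scale=0.6, size=(108, 6))

factor, loadings, explained = extract_general_factor(scores)
print(f"variance explained by first factor: {explained:.2f}")
print(f"correlation with latent ability: {np.corrcoef(factor, ability)[0, 1]:.2f}")
```

A large first eigenvalue relative to the others is the usual signal that a single general factor underlies performance across tasks. The per-group factor scores are what a predictor of new-task performance would then be built on.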
PsychEthicsBench: Evaluating Large Language Models Against Australian Mental Health Ethics
Yaling Shen | Stephanie Fong | Yiwen Jiang | Zimu Wang | Feilong Tang | Qingyang Xu | Xiangyu Zhao | Zhongxing Xu | Jiahe Liu | Jinpeng Hu | Dominic Dwyer | Zongyuan Ge
Findings of the Association for Computational Linguistics: ACL 2026
The increasing integration of large language models (LLMs) into mental health applications necessitates robust frameworks for evaluating professional safety alignment. Current evaluative approaches primarily rely on refusal-based safety signals, which offer limited insight into the nuanced behaviors required in clinical practice. In mental health, clinically inadequate refusals can be perceived as unempathetic and discourage help-seeking. To address this gap, we move beyond refusal-centric metrics and introduce PsychEthicsBench, the first principle-grounded benchmark based on Australian psychology and psychiatry guidelines, designed to evaluate LLMs’ ethical knowledge and behavioral responses through multiple-choice and open-ended tasks with fine-grained ethicality annotations. Empirical results across 14 models show that refusal rates are poor indicators of ethical behavior, indicating a significant divergence between safety triggers and clinical appropriateness. Notably, we find that domain-specific fine-tuning can degrade ethical robustness, as several specialized models underperform their base backbones in ethical alignment. PsychEthicsBench provides a foundation for systematic, jurisdiction-aware evaluation of LLMs in mental health, encouraging more responsible development in this domain.
CHiRPE: A Step Towards Real-World Clinical NLP with Clinician-Oriented Model Explanations
Stephanie Fong | Zimu Wang | Guilherme C Oliveira | Xiangyu Zhao | Yiwen Jiang | Jiahe Liu | Beau-Luke Colton | Scott W. Woods | Martha Shenton | Barnaby Nelson | Zongyuan Ge | Dominic Dwyer
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
The medical adoption of NLP tools requires interpretability by end users, yet traditional explainable AI (XAI) methods are misaligned with clinical reasoning and lack clinician input. We introduce CHiRPE (Clinical High-Risk Prediction with Explainability), an NLP pipeline that takes transcribed semi-structured clinical interviews to: (i) predict psychosis risk; and (ii) generate novel SHAP explanation formats co-developed with clinicians. Trained on 944 semi-structured interview transcripts across 24 international clinics of the AMP-SCZ study, the CHiRPE pipeline integrates symptom-domain mapping, LLM summarisation, and BERT classification. CHiRPE achieved over 90% accuracy across three BERT variants and outperformed baseline models. Explanation formats were evaluated by 28 clinical experts who indicated a strong preference for our novel concept-guided explanations, especially hybrid graph-and-text summary formats. CHiRPE demonstrates that clinically guided model development produces both accurate and interpretable results. Our next step is real-world testing across our 24 international sites.
2024
Optimizing Multimodal Large Language Models for Detection of Alcohol Advertisements via Adaptive Prompting
Daniel Cabrera Lozoya | Jiahe Liu | Simon D’Alfonso | Mike Conway
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
Adolescents exposed to advertisements promoting addictive substances exhibit a higher likelihood of subsequent substance use. The predominant source of youth exposure to such advertisements is online content accessed via smartphones. Detecting these advertisements is crucial for establishing and maintaining a safer online environment for young people. In our study, we used Multimodal Large Language Models (MLLMs) to identify addictive substance advertisements in digital media. The performance of MLLMs depends on the quality of the prompt used to instruct the model. To optimize our prompts, we implemented an adaptive prompt engineering approach that leverages a genetic algorithm to refine and enhance the prompts. To evaluate the model’s performance, we augmented the RICO dataset, consisting of Android user interface screenshots, by superimposing alcohol ads onto them. Our results indicate that the MLLM can detect advertisements promoting alcohol with an accuracy of 0.94 and an F1 score of 0.94.
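As a rough illustration of genetic-algorithm prompt refinement, the sketch below encodes a prompt as a bit vector saying which instruction fragments to include. It then evolves a population with truncation selection, one-point crossover and bit-flip mutation. Everything specific here is hypothetical and not taken from the paper: the fragment texts, the encoding, the hyperparameters, and especially `fitness`. In practice, `fitness` would be the MLLM's accuracy or F1 on a labelled validation set, but here fixed toy weights stand in for it so the code runs offline.

```python
import random

# Hypothetical instruction fragments a candidate prompt is assembled from.
FRAGMENTS = [
    "Look for brand logos.",
    "Check for alcoholic beverages such as beer, wine or spirits.",
    "Ignore non-advertising UI elements.",
    "Answer only 'yes' or 'no'.",
    "Describe the image in detail.",
    "Consider text overlays promoting drinking.",
]
# Toy per-fragment weights standing in for validation accuracy (hypothetical).
TOY_WEIGHTS = [0.2, 0.4, 0.15, 0.1, -0.3, 0.25]

def fitness(genome):
    # Placeholder score; a real setup would query the MLLM with to_prompt(genome).
    return sum(w for w, bit in zip(TOY_WEIGHTS, genome) if bit)

def to_prompt(genome):
    return " ".join(f for f, bit in zip(FRAGMENTS, genome) if bit)

def crossover(a, b, rng):
    cut = rng.randrange(1, len(a))  # one-point crossover
    return a[:cut] + b[cut:]

def mutate(genome, rng, rate=0.1):
    return [1 - bit if rng.random() < rate else bit for bit in genome]

def evolve(generations=30, pop_size=12, elite=4, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in FRAGMENTS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:elite]  # truncation selection; elites survive unchanged
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), rng)
            for _ in range(pop_size - elite)
        ]
        pop = parents + children
    best = max(pop, key=fitness)
    return best, to_prompt(best)

best, prompt = evolve()
print(round(fitness(best), 2), prompt)
```

Because the top candidates are carried over unchanged each generation, the best fitness in the population never decreases. The cost that matters in a real setting is the number of `fitness` calls, since each one means running the MLLM over a validation set.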
Breaking the Silence: How Online Forums Address Lung Cancer Stigma and Offer Support
Jiahe Liu | Mike Conway | Daniel Cabrera Lozoya
Proceedings of the 22nd Annual Workshop of the Australasian Language Technology Association
Lung cancer remains a leading cause of cancer-related deaths, but public support for individuals living with lung cancer is often constrained by stigma and misconceptions, leading to serious emotional and social consequences for those diagnosed. Understanding how this stigma manifests and affects individuals is vital for developing inclusive interventions. Online discussion forums offer a unique opportunity to examine how lung cancer stigma is expressed and experienced. This study combines qualitative analysis and unsupervised learning (topic modelling) to explore stigma-related content within an online lung cancer forum. Our findings highlight the role of online forums as a key space for addressing anti-discriminatory attitudes and sharing experiences of lung cancer stigma. We found that users both with and without lung cancer engage in discussions pertaining to supportive and welcoming topics, highlighting the online forum’s role in facilitating social and informational support.
Co-authors
- Mike Conway (2)
- Dominic Dwyer (2)
- Stephanie Fong (2)
- Zongyuan Ge (2)
- Yiwen Jiang (2)
- Daniel Cabrera Lozoya (2)
- Zimu Wang (2)
- Xiangyu Zhao (2)
- Beau-Luke Colton (1)
- Simon D’Alfonso (1)
- Jinpeng Hu (1)
- Depeng Jin (1)
- Yong Li (1)
- Zihan Liu (1)
- Barnaby Nelson (1)
- Guilherme C Oliveira (1)
- Qingyu Shao (1)
- Yaling Shen (1)
- Martha Shenton (1)
- Feilong Tang (1)
- Yihan Wang (1)
- Scott W. Woods (1)
- Fengli Xu (1)
- Qingyang Xu (1)
- Zhongxing Xu (1)
- Zhilun Zhou (1)