Ziyi Chen
2026
AdaFuse: Adaptive Ensemble Decoding for Large Language Models
Chengming Cui | Tianxin Wei | Ziyi Chen | Ruizhong Qiu | Zhichen Zeng | Zhining Liu | Xuying Ning | Duo Zhou | Jingrui He
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Chengming Cui | Tianxin Wei | Ziyi Chen | Ruizhong Qiu | Zhichen Zeng | Zhining Liu | Xuying Ning | Duo Zhou | Jingrui He
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) exhibit complementary strengths arising from differences in pretraining data, model architectures, and decoding behaviors. Inference-time ensembling provides a practical way to combine these capabilities without retraining. However, existing ensemble approaches suffer from fundamental limitations. Most rely on fixed fusion granularity, which lacks the flexibility required for mid-generation adaptation and fails to adapt to different generation characteristics across tasks. To address these challenges, we propose AdaFuse, an adaptive ensemble decoding framework that dynamically selects semantically appropriate fusion units during generation. Rather than committing to a fixed granularity, AdaFuse adjusts fusion behavior on the fly based on the decoding context, with words serving as basic building blocks for alignment. To be specific, we introduce an uncertainty-based criterion to decide whether to apply ensembling at each decoding step. Under confident decoding states, the model continues generation directly. In less certain states, AdaFuse invokes a diversity-aware scaling strategy to explore alternative candidate continuations and inform ensemble decisions. This design establishes a synergistic interaction between adaptive ensembling and test-time scaling, where ensemble decisions guide targeted exploration, and the resulting diversity in turn strengthens ensemble quality. Experiments on open-domain QA, arithmetic reasoning, and machine translation demonstrate that AdaFuse consistently outperforms strong ensemble baselines, achieving an average relative improvement of 6.88%.
2024
UF-HOBI at “Discharge Me!”: A Hybrid Solution for Discharge Summary Generation Through Prompt-based Tuning of GatorTronGPT Models
Mengxian Lyu | Cheng Peng | Daniel Paredes | Ziyi Chen | Aokun Chen | Jiang Bian | Yonghui Wu
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
Mengxian Lyu | Cheng Peng | Daniel Paredes | Ziyi Chen | Aokun Chen | Jiang Bian | Yonghui Wu
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
Automatic generation of discharge summaries presents significant challenges due to the length of clinical documentation, the dispersed nature of patient information, and the diverse terminology used in healthcare. This paper presents a hybrid solution for generating discharge summary sections as part of our participation in the “Discharge Me!” Challenge at the BioNLP 2024 Shared Task. We developed a two-stage generation method using both extractive and abstractive techniques, in which we first apply name entity recognition (NER) to extract key clinical concepts, which are then used as input for a prompt-tuning based GatorTronGPT model to generate coherent text for two important sections including “Brief Hospital Course” and “Discharge Instructions”. Our system was ranked 5th in this challenge, achieving an overall score of 0.284. The results demonstrate the effectiveness of our hybrid solution in improving the quality of automated discharge section generation.
2022
Self-supervised Cross-modal Pretraining for Speech Emotion Recognition and Sentiment Analysis
Iek-Heng Chu | Ziyi Chen | Xinlu Yu | Mei Han | Jing Xiao | Peng Chang
Findings of the Association for Computational Linguistics: EMNLP 2022
Iek-Heng Chu | Ziyi Chen | Xinlu Yu | Mei Han | Jing Xiao | Peng Chang
Findings of the Association for Computational Linguistics: EMNLP 2022
Multimodal speech emotion recognition (SER) and sentiment analysis (SA) are important techniques for human-computer interaction. Most existing multimodal approaches utilize either shallow cross-modal fusion of pretrained features, or deep cross-modal fusion with raw features. Recently, attempts have been made to fuse pretrained feature representations in a deep fusion manner during fine-tuning stage. However those approaches have not led to improved results, partially due to their relatively simple fusion mechanisms and lack of proper cross-modal pretraining. In this work, leveraging single-modal pretrained models (RoBERTa and HuBERT), we propose a novel deeply-fused audio-text bi-modal transformer with carefully designed cross-modal fusion mechanism and a stage-wise cross-modal pretraining scheme to fully facilitate the cross-modal learning. Our experiment results show that the proposed method achieves state-of-the-art results on the public IEMOCAP emotion and CMU-MOSEI sentiment datasets, exceeding the previous benchmarks by a large margin.