Zhuoyan Xu


2025

Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering
Sai Prasanna Teja Reddy Bogireddy | Abrar Majeedi | Viswanath Gajjala | Zhuoyan Xu | Siddhant Rai | Vaishnav Potlapalli
Proceedings of the 24th Workshop on Biomedical Language Processing (Shared Tasks)

Automated question answering (QA) over electronic health records (EHRs) can bridge critical information gaps for clinicians and patients, yet it demands both precise evidence retrieval and faithful answer generation under limited supervision. In this work, we present Neural, the runner-up in the BioNLP 2025 ArchEHR-QA shared task on evidence-grounded clinical QA. Our proposed method decouples the task into (1) sentence-level evidence identification and (2) answer synthesis with explicit citations. For each stage, we automatically explore the prompt space with DSPy’s MIPROv2 optimizer, jointly tuning instructions and few-shot demonstrations on the development set. A self-consistency voting scheme further improves evidence recall without sacrificing precision. On the hidden test set, our method attains an overall score of 51.5, placing second while outperforming standard zero-shot and few-shot prompting by over 20 and 10 points, respectively. These results indicate that data-driven prompt optimization is a cost-effective alternative to model fine-tuning for high-stakes clinical QA, advancing the reliability of AI assistants in healthcare.
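A minimal sketch of the two-stage pipeline and self-consistency vote the abstract describes, written with DSPy. This is not the authors’ released code: the signature fields, vote thresholds, metric name (`archehr_metric`), and MIPROv2 settings are illustrative assumptions, exact optimizer arguments vary across DSPy versions, and meaningful voting assumes a sampling temperature above zero.

```python
# Illustrative sketch (not the authors' code) of the two-stage pipeline.
import dspy
from dspy.teleprompt import MIPROv2

class SelectEvidence(dspy.Signature):
    """Stage 1: identify the note sentences that support answering the question."""
    question: str = dspy.InputField()
    note_sentences: str = dspy.InputField(desc="numbered sentences from the EHR note")
    evidence_ids: str = dspy.OutputField(desc="comma-separated sentence numbers")

class SynthesizeAnswer(dspy.Signature):
    """Stage 2: answer the question, citing each supporting sentence number."""
    question: str = dspy.InputField()
    evidence: str = dspy.InputField(desc="selected sentences with their numbers")
    answer: str = dspy.OutputField(desc="answer with explicit sentence-id citations")

class EvidenceGroundedQA(dspy.Module):
    def __init__(self, n_votes: int = 5, min_votes: int = 3):
        super().__init__()
        self.select = dspy.ChainOfThought(SelectEvidence)
        self.answer = dspy.ChainOfThought(SynthesizeAnswer)
        self.n_votes, self.min_votes = n_votes, min_votes

    def forward(self, question: str, note_sentences: str):
        # Self-consistency voting: sample several evidence sets (requires a
        # sampling temperature > 0) and keep sentence ids with enough votes;
        # the union raises recall while the vote threshold protects precision.
        votes: dict[str, int] = {}
        for _ in range(self.n_votes):
            pred = self.select(question=question, note_sentences=note_sentences)
            for sid in {s.strip() for s in pred.evidence_ids.split(",") if s.strip()}:
                votes[sid] = votes.get(sid, 0) + 1
        kept = sorted(sid for sid, v in votes.items() if v >= self.min_votes)
        evidence = "\n".join(f"[{sid}]" for sid in kept)  # map ids back to sentence text here
        return self.answer(question=question, evidence=evidence)

# Prompt optimization on the dev set; `archehr_metric` and `devset` are assumed:
# optimizer = MIPROv2(metric=archehr_metric, auto="light")
# compiled_qa = optimizer.compile(EvidenceGroundedQA(), trainset=devset)
```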

Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers
Yingyu Liang | Heshan Liu | Zhenmei Shi | Zhao Song | Zhuoyan Xu | Jiale Zhao | Zhen Zhuang
Findings of the Association for Computational Linguistics: EMNLP 2025

The self-attention mechanism is key to the success of transformers in recent large language models (LLMs). However, the quadratic computational cost, $O(n^2)$, with respect to the input sequence length $n$ poses a significant obstacle to further improvement and scalability in longer contexts. In this work, we leverage the convolution-like structure of attention matrices to develop an efficient approximation method for attention computation using convolution matrices. We propose a $\mathsf{conv}$ basis system, analogous to the rank basis, and show that any lower triangular matrix can be decomposed as a sum of structured convolution matrices in this basis. We then design a fast algorithm to approximate the attention matrix using a sum of $k$ convolution matrices. This enables us to compute attention during inference via Fast Fourier Transforms (FFT) in $O(knd \log n)$ time, where $d$ is the hidden dimension, achieving nearly linear time complexity, $n^{1+o(1)}$, in practical scenarios where $kd = n^{o(1)}$. Furthermore, both the training forward pass and backward gradient computation can be performed in $n^{1+o(1)}$ time as well. We provide theoretical guarantees on runtime and approximation error and conduct preliminary experiments to evaluate the effectiveness of our approach. We hope this new paradigm for accelerating attention computation in transformer models facilitates their application to longer contexts.
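A minimal numpy sketch of the primitive behind these bounds: once the attention matrix is (approximately) written as a sum of $k$ lower-triangular convolution matrices, multiplying it by the value matrix $V$ reduces to $k$ causal convolutions computed with zero-padded FFTs in $O(knd \log n)$ time. The random kernels below stand in for the paper’s $\mathsf{conv}$ basis, which is more structured, and the basis-recovery algorithm itself is omitted; this only illustrates the FFT multiplication step.

```python
# Sketch: (sum of k lower-triangular convolution matrices) @ V via FFT.
# The random kernels stand in for the paper's recovered conv basis.
import numpy as np

def conv_basis_attention(B: np.ndarray, V: np.ndarray) -> np.ndarray:
    """B: (k, n) first columns of k convolution matrices; V: (n, d) values.
    Returns (sum_i conv(B[i])) @ V in O(knd log n) via zero-padded FFTs."""
    k, n = B.shape
    m = 2 * n  # pad so circular convolution reproduces the causal (linear) one
    FV = np.fft.rfft(V, n=m, axis=0)   # spectrum of each value column, (m//2+1, d)
    FB = np.fft.rfft(B, n=m, axis=1)   # spectrum of each kernel, (k, m//2+1)
    # Pointwise multiply in the frequency domain and sum over the k kernels.
    out_freq = (FB[:, :, None] * FV[None, :, :]).sum(axis=0)
    # Invert and keep the first n rows: the causal part of each convolution.
    return np.fft.irfft(out_freq, n=m, axis=0)[:n]

# Check against the explicit O(n^2 d) construction, conv(b)[i, j] = b[i - j]:
rng = np.random.default_rng(0)
k, n, d = 3, 64, 8
B, V = rng.normal(size=(k, n)), rng.normal(size=(n, d))
A = sum(np.array([[b[i - j] if i >= j else 0.0 for j in range(n)]
                  for i in range(n)]) for b in B)
assert np.allclose(conv_basis_attention(B, V), A @ V)
```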