Edmond S. L. Ho

Also published as: Edmond S.L. Ho

2026

Towards A Scanpath-Conditioned Surprisal Theory: Modeling Reader Information States
Michael Mooney | Edmond S. L. Ho
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Standard surprisal is typically computed from the linear text prefix, but human reading is non-linear and memory-constrained: readers skip words, regress, and do not retain prior context perfectly. We propose a formulation of surprisal conditioned on a reader-specific accessible information state given by the scanpath history and memory dynamics, rather than by the written prefix alone. Prior context is treated as only probabilistically accessible at each fixation, allowing predictability to depend on both non-linear exposure and forgetting. We evaluate the approach on eye-tracking corpora using held-out log-likelihood over standard duration-based reading measures. Across model variants, conditioning on accessible information states improves predictive fit over standard surprisal baselines. These results suggest that predictability in human reading is better characterized relative to the reader’s evolving accessible information state than to the written prefix alone.

2025

Libra: Leveraging Temporal Images for Biomedical Radiology Analysis
Xi Zhang | Zaiqiao Meng | Jake Lever | Edmond S. L. Ho
Findings of the Association for Computational Linguistics: ACL 2025

Radiology report generation (RRG) requires advanced medical image analysis, effective temporal reasoning, and accurate text generation. While multimodal large language models (MLLMs) align with pre-trained vision encoders to enhance visual-language understanding, most existing methods rely on single-image analysis or rule-based heuristics to process multiple images, failing to fully leverage temporal information in multi-modal medical datasets. In this paper, we introduce **Libra**, a temporal-aware MLLM tailored for chest X-ray report generation. Libra combines a radiology-specific image encoder with a novel Temporal Alignment Connector (**TAC**), designed to accurately capture and integrate temporal differences between paired current and prior images. Extensive experiments on the MIMIC-CXR dataset demonstrate that Libra establishes a new state-of-the-art benchmark among similarly scaled MLLMs, setting new standards in both clinical relevance and lexical accuracy. All source code and data are publicly available at: https://github.com/X-iZhang/Libra.

2024

Gla-AI4BioMed at RRG24: Visual Instruction-tuned Adaptation for Radiology Report Generation
Xi Zhang | Zaiqiao Meng | Jake Lever | Edmond S.L. Ho
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing

This paper introduces a radiology-focused visual language model designed to generate radiology reports from chest X-rays. Building on previous findings that large language models can acquire multimodal capabilities when aligned with pretrained vision encoders, we demonstrate similar potential with chest X-ray images. The model combines an image encoder (CLIP) with a fine-tuned large language model (LLM) based on the Vicuna-7B architecture. The training process involves a two-stage approach: initial alignment of chest X-ray features with the LLM, followed by fine-tuning for radiology report generation. The study highlights the importance of generating both FINDINGS and IMPRESSIONS sections in radiology reports and evaluates the model’s performance using various metrics, achieving notable accuracy in generating high-quality medical reports. The research also addresses the need for domain-specific fine-tuning to capture the intricate details necessary for accurate medical interpretations and reports.
