Li Cheng
2026
SAM3-I: Segment Anything with Instructions
Jingjing Li | Yue Feng | Yuchen Guo | Jincai Huang | Wei Ji | Qi Bi | Yongri Piao | Miao Zhang | Xiaoqi Zhao | Qiang Chen | Shihao Zou | Huchuan Lu | Li Cheng
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Segment Anything Model 3 (SAM3) advances open-vocabulary segmentation through promptable concept segmentation, enabling users to segment all instances associated with a given concept using short noun-phrase (NP) prompts. While effective for concept-level grounding, real-world interactions often involve far richer natural-language instructions that combine attributes, relations, actions, states, or implicit reasoning. Currently, SAM3 relies on external multi-modal agents to convert complex instructions into NPs and conducts iterative mask filtering, leading to coarse representations and limited instance specificity. In this work, we present SAM3-I, an instruction-following extension of the SAM family that unifies concept-level grounding and instruction-level reasoning within a single segmentation framework. Built upon SAM3, SAM3-I introduces an instruction-aware cascaded adaptation mechanism with dedicated alignment losses that progressively aligns expressive instruction semantics with SAM3’s vision-language representations, enabling direct interpretation of natural-language instructions while preserving its strong concept recall ability. To enable instruction-following learning, we introduce HMPL-Instruct, a large-scale instruction-centric dataset that systematically covers hierarchical instruction semantics and diverse target granularities. Experiments demonstrate that SAM3-I achieves appealing performance across referring and reasoning-based segmentation, showing that SAM3 can be effectively extended to follow complex natural-language instructions without sacrificing its original concept-driven strengths. Code and dataset are available at https://github.com/debby-0527/SAM3-I.
2021
Automated Generation of Accurate & Fluent Medical X-ray Reports
Hoang Nguyen | Dong Nie | Taivanbat Badamdorj | Yujie Liu | Yingying Zhu | Jason Truong | Li Cheng
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Our paper aims to automate the generation of medical reports from chest X-ray image inputs, a critical yet time-consuming task for radiologists. Existing medical report generation efforts emphasize producing human-readable reports, yet the generated text may not be well aligned to the clinical facts. Our generated medical reports, on the other hand, are fluent and, more importantly, clinically accurate. This is achieved by our fully differentiable and end-to-end paradigm that contains three complementary modules: taking the chest X-ray images and clinical history document of patients as inputs, our classification module produces an internal checklist of disease-related topics, referred to as enriched disease embedding; the embedding representation is then passed to our transformer-based generator, to produce the medical report; meanwhile, our generator also creates a weighted embedding representation, which is fed to our interpreter to ensure consistency with respect to disease-related topics. Empirical evaluations demonstrate very promising results achieved by our approach on commonly-used metrics concerning language fluency and clinical accuracy. Moreover, noticeable performance gains are consistently observed when additional input information is available, such as the clinical document and extra scans from different views.