Yuchen Guo
2026
SAM3-I: Segment Anything with Instructions
Jingjing Li | Yue Feng | Yuchen Guo | Jincai Huang | Wei Ji | Qi Bi | Yongri Piao | Miao Zhang | Xiaoqi Zhao | Qiang Chen | Shihao Zou | Huchuan Lu | Li Cheng
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Jingjing Li | Yue Feng | Yuchen Guo | Jincai Huang | Wei Ji | Qi Bi | Yongri Piao | Miao Zhang | Xiaoqi Zhao | Qiang Chen | Shihao Zou | Huchuan Lu | Li Cheng
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Segment Anything Model 3 (SAM3) advances open-vocabulary segmentation through promptable concept segmentation, enabling users to segment all instances associated with a given concept using short noun-phrase (NP) prompts. While effective for concept-level grounding, real-world interactions often involve far richer natural-language instructions that combine attributes, relations, actions, states, or implicit reasoning. Currently, SAM3 relies on external multi-modal agents to convert complex instructions into NPs and conducts iterative mask filtering, leading to coarse representations and limited instance specificity. In this work, we present SAM3-I, an instruction-following extension of the SAM family that unifies concept-level grounding and instruction-level reasoning within a single segmentation framework. Built upon SAM3, SAM3-I introduces an instruction-aware cascaded adaptation mechanism with dedicated alignment losses that progressively aligns expressive instruction semantics with SAM3’s vision-language representations, enabling direct interpretation of natural-language instructions while preserving its strong concept recall ability. To enable instruction-following learning, we introduce HMPL-Instruct, a large-scale instruction-centric dataset that systematically covers hierarchical instruction semantics and diverse target granularities. Experiments demonstrate that SAM3-I achieves appealing performance across referring and reasoning-based segmentation, showing that SAM3 can be effectively extended to follow complex natural-language instructions without sacrificing its original concept-driven strengths. Code and dataset are available at https://github.com/debby-0527/SAM3-I.
2020
Graph-based Aspect Representation Learning for Entity Resolution
Zhenqi Zhao | Yuchen Guo | Dingxian Wang | Yufan Huang | Xiangnan He | Bin Gu
Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs)
Zhenqi Zhao | Yuchen Guo | Dingxian Wang | Yufan Huang | Xiangnan He | Bin Gu
Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs)
Entity Resolution (ER) identifies records that refer to the same real-world entity. Deep learning approaches improved the generalization ability of entity matching models, but hardly overcame the impact of noisy or incomplete data sources. In real scenes, an entity usually consists of multiple semantic facets, called aspects. In this paper, we focus on entity augmentation, namely retrieving the values of missing aspects. The relationship between aspects is naturally suitable to be represented by a knowledge graph, where entity augmentation can be modeled as a link prediction problem. Our paper proposes a novel graph-based approach to solve entity augmentation. Specifically, we apply a dedicated random walk algorithm, which uses node types to limit the traversal length, and encodes graph structure into low-dimensional embeddings. Thus, the missing aspects could be retrieved by a link prediction model. Furthermore, the augmented aspects with fixed orders are served as the input of a deep Siamese BiLSTM network for entity matching. We compared our method with state-of-the-art methods through extensive experiments on downstream ER tasks. According to the experiment results, our model outperforms other methods on evaluation metrics (accuracy, precision, recall, and f1-score) to a large extent, which demonstrates the effectiveness of our method.