Michael Ng
2026
AlphaEdit+: Model Editing in the Presence of Conflicting and Inconsistent Knowledge
Qing Liu | Jianhao Zhang | Ou Wu | Michael Ng | Yi Du
Findings of the Association for Computational Linguistics: ACL 2026
Knowledge editing is a crucial technique for daily updates in LLMs, requiring a balance between accurately modifying incorrect knowledge and preserving existing information. The recently proposed AlphaEdit method achieves competitive editing performance by updating parameters under null-space constraints. However, our theoretical analysis reveals that AlphaEdit struggles when the knowledge being edited is highly conflicting or inconsistent. To address this, we propose a new editing method, AlphaEdit+, featuring three key improvements: 1) relaxing null-space constraints by adding a matrix perturbation through optimization to resolve conflicts between new and preserved knowledge; 2) introducing a weighting scheme on previously updated knowledge constraints to mitigate conflicts between new and historical edits; 3) developing a value smoothing algorithm to resolve high knowledge inconsistencies. These enhancements collectively ensure robust editing while maintaining model coherence. Comprehensive experiments show that AlphaEdit+ not only resolves the brittleness of the original method on carefully constructed challenging datasets but also outperforms AlphaEdit on existing benchmark datasets.
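As a rough illustration of the null-space constraint that the abstract attributes to AlphaEdit (and that AlphaEdit+ relaxes), the sketch below projects a weight update so that it leaves the outputs for a set of preserved key vectors unchanged. It is not the authors' implementation: the function names, the toy vectors, and the use of Gram-Schmidt are all assumptions made for illustration, and the perturbation, weighting, and value-smoothing steps of AlphaEdit+ are not shown.

```python
# Illustrative sketch only: a null-space-projected weight update in pure Python.
# All names and numbers here are hypothetical, not taken from the paper.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(keys, tol=1e-10):
    """Gram-Schmidt over the preserved key vectors (drops dependent ones)."""
    basis = []
    for k in keys:
        r = list(k)
        for b in basis:
            c = dot(r, b)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        norm = dot(r, r) ** 0.5
        if norm > tol:
            basis.append([ri / norm for ri in r])
    return basis

def project_update(delta, keys):
    """Remove from each row of delta its component inside span(keys)."""
    basis = orthonormal_basis(keys)
    projected = []
    for row in delta:
        r = list(row)
        for b in basis:
            c = dot(r, b)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        projected.append(r)
    return projected

# Toy data: two preserved keys in a 3-dimensional key space.
preserved_keys = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
delta = [[0.5, -0.2, 0.3], [0.1, 0.4, -0.7]]

delta_p = project_update(delta, preserved_keys)

# After projection, (W + delta_p) k equals W k for every preserved key k,
# because delta_p k is zero up to floating-point error.
residual = max(abs(dot(row, k)) for row in delta_p for k in preserved_keys)
```

In this toy case the projected update only retains the third coordinate, which is the one direction orthogonal to both preserved keys. When new knowledge needs a direction that lies inside the preserved span, a strict projection of this kind discards it, which is one way to picture the conflict the abstract describes.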
TVWorld: Foundations for Remote-Control TV Agents
Zhantao Ma | Quanfeng Lu | Shuai Zhong | Dahai Yu | Ping Luo | Michael Ng
Findings of the Association for Computational Linguistics: ACL 2026
Recent large vision–language models (LVLMs) have demonstrated strong potential for device control. However, existing research has primarily focused on point-and-click (PnC) interaction, while remote-control (RC) interaction commonly encountered in everyday TV usage remains largely underexplored. To fill this gap, we introduce TVWorld, an offline graph-based abstraction of real-world TV navigation that enables reproducible and deployment-free evaluation. On this basis, we derive two complementary benchmarks that comprehensively assess TV-use capabilities: TVWorld-N for topology-aware navigation and TVWorld-G for focus-aware grounding. These benchmarks expose a key limitation of existing agents: insufficient topology awareness for focus-based, long-horizon TV navigation. Motivated by this finding, we propose a Topology-Aware Training framework that injects topology awareness into LVLMs. Using this framework, we develop TVTheseus, a foundation model specialized for TV navigation. TVTheseus achieves a success rate of 68.3 on TVWorld-N, surpassing strong closed-source baselines such as Gemini 3 Flash and establishing state-of-the-art (SOTA) performance. Additional analyses further provide valuable insights into the development of effective TV-use agents.
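To make the idea of a graph-based abstraction of remote-control navigation concrete, here is a minimal sketch in which UI focus states are nodes and remote keys label the edges. The graph, the state names, and the helper `shortest_key_sequence` are hypothetical and are not taken from TVWorld; the sketch only shows why reaching a target requires knowing the topology rather than pointing at a location.

```python
from collections import deque

# Hypothetical toy graph: focus state -> {remote key: next focus state}.
TV_GRAPH = {
    "home:apps": {"RIGHT": "home:settings", "OK": "apps:grid"},
    "home:settings": {"LEFT": "home:apps", "OK": "settings:menu"},
    "apps:grid": {"BACK": "home:apps"},
    "settings:menu": {"BACK": "home:settings", "DOWN": "settings:network"},
    "settings:network": {"UP": "settings:menu", "BACK": "home:settings"},
}

def shortest_key_sequence(graph, start, goal):
    """Breadth-first search for the shortest list of key presses."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, keys = queue.popleft()
        if state == goal:
            return keys
        for key, nxt in graph.get(state, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, keys + [key]))
    return None  # goal is unreachable from start

path = shortest_key_sequence(TV_GRAPH, "home:apps", "settings:network")
# path is ["RIGHT", "OK", "DOWN"]
```

Because the graph is stored offline, an agent's key presses can be replayed against it and scored without a physical device, which is in the spirit of the reproducible, deployment-free evaluation the abstract mentions.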
2025
DAPE V2: Process Attention Score as Feature Map for Length Extrapolation
Chuanyang Zheng | Yihang Gao | Han Shi | Jing Xiong | Jiankai Sun | Jingyao Li | Minbin Huang | Xiaozhe Ren | Michael Ng | Xin Jiang | Zhenguo Li | Yu Li
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The attention mechanism is a fundamental component of the Transformer model, contributing to interactions among distinct tokens. In general, the attention scores are determined simply by the key-query products. However, an incidental trial in this work (combining DAPE and NoPE), which adds MLPs on attention scores without position encoding, indicates that the classical key-query multiplication may limit the performance of Transformers. In this work, we conceptualize attention as a feature map and apply the convolution operator (for neighboring attention scores across different heads) to mimic the processing methods in computer vision. Specifically, **the main contribution of this paper is identifying and interpreting the Transformer length extrapolation problem as a result of the limited expressiveness of the naive query and key dot product, and we successfully translate the length extrapolation issue into a well-understood feature map processing problem**; we call the resulting method Convolutional Data-Adaptive Position Encoding (CDAPE). This insight, which can be adapted to various attention-related models, reveals that the current Transformer architecture has the potential for further evolution. Extensive experiments demonstrate that treating attention as a feature map and applying convolution as a processing method significantly enhances Transformer performance.
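The idea of treating attention scores as a feature map can be sketched as follows: the pre-softmax scores of all heads are treated as channels, and a small kernel mixes neighboring scores across heads before the softmax. This is only a toy under stated assumptions. The kernel shape, the residual connection, the left-only padding along the key axis, and the function names are illustrative choices, not the configuration used in the paper.

```python
import math

def conv_over_scores(scores, weight):
    """Mix pre-softmax attention scores across heads and nearby keys.

    scores[h][i][j]       : logit of head h, query i, key j
    weight[h_out][h_in][d]: kernel tap for key offset d (looks back d keys)
    The result is added to the original scores (residual form, an assumption).
    """
    n_heads = len(scores)
    n_q = len(scores[0])
    n_k = len(scores[0][0])
    width = len(weight[0][0])
    out = [[[scores[h][i][j] for j in range(n_k)]
            for i in range(n_q)] for h in range(n_heads)]
    for h_out in range(n_heads):
        for i in range(n_q):
            for j in range(n_k):
                acc = 0.0
                for h_in in range(n_heads):
                    for d in range(width):
                        if j - d >= 0:  # left padding: never reads later keys
                            acc += weight[h_out][h_in][d] * scores[h_in][i][j - d]
                out[h_out][i][j] += acc
    return out

def causal_softmax(row, i):
    """Softmax over keys 0..i for query i; later keys get probability 0."""
    vals = row[: i + 1]
    m = max(vals)
    exps = [math.exp(v - m) for v in vals]
    z = sum(exps)
    return [e / z for e in exps] + [0.0] * (len(row) - i - 1)

# Toy input: 2 heads, 2 queries, 2 keys.
scores = [[[1.0, 0.0], [0.5, 2.0]],
          [[0.0, 0.0], [1.0, 1.0]]]

# Kernel that adds head 1's score (same key position) into head 0.
weight = [[[0.0, 0.0], [1.0, 0.0]],
          [[0.0, 0.0], [0.0, 0.0]]]

mixed = conv_over_scores(scores, weight)
probs = causal_softmax(mixed[0][1], 1)  # head 0, query 1
```

With an all-zero kernel the function returns the original scores, so plain key-query attention is a special case of this sketch; a learned kernel would let each head's scores depend on neighboring scores from other heads.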