Ruilin Yang
2026
E-ViC: Reasoning Beyond Text via Embodied Visual Chain for Spatial Intelligence
Junbo Qi | Yi Zhang | Hanchu Ni | Che Liu | Zhimin Yao | Ruilin Yang | Xiancong Ren | Liangjian Wen | Wei Ge | Yuya Ieiri | Osamu Yoshie | Yong Dai | Xiaozhu Ju
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Precise spatial reasoning is fundamental to embodied intelligence, yet current Vision-Language Models (VLMs) remain bottlenecked by text-based Chain-of-Thought (CoT) that relies solely on textual reasoning trajectories, often bypassing active engagement with fine-grained visual details. To address this, we present E-ViC (Embodied Visual Chain), a framework that moves reasoning beyond text and directly into the visual domain. By formulating visual operations (e.g., zooming, marking) as executable primitives, E-ViC transforms perception from static prediction into an active verification process. Distinct from approaches relying on supervised step-wise trajectories, E-ViC is trained via an agentic reinforcement learning paradigm. This enables the model to autonomously discover optimal policies, leading to the emergence of human-like “look-and-confirm” strategies driven solely by task-level rewards. To facilitate this, we curate a comprehensive 24.4K-sample dataset covering diverse embodied tasks. By grounding reasoning in pixel-level interactions, E-ViC reframes spatial intelligence as a verifiable, tool-using capability. Extensive evaluations on external benchmarks demonstrate that our approach consistently outperforms strong VLM baselines with an average gain of 10.1%.
The Price of Thought: A Multilingual Analysis of Reasoning, Performance, and Cost of Negotiation in Large Language Models
Sherzod Hakimov | Roland Bernard | Tim Leiber | Karl Osswald | Kristina Richert | Ruilin Yang | Raffaella Bernardi | David Schlangen
Findings of the Association for Computational Linguistics: EACL 2026
Negotiation is a fundamental challenge for AI agents, as it requires an ability to reason strategically, model opponents, and balance cooperation with competition. We present the first comprehensive study that systematically evaluates how explicit reasoning training affects the negotiation abilities of both commercial and open-weight large language models, comparing these models to their vanilla counterparts across three languages. Using a self-play setup across three diverse dialogue games, we analyse trade-offs between performance and cost, the language consistency of reasoning processes, and the nature of strategic adaptation exhibited by models. Our findings show that enabling reasoning (that is, scaling test-time compute) significantly improves negotiation outcomes by enhancing collaboration and helping models overcome task complexities, but comes at a substantial computational cost: reasoning improves GPT-5’s performance by 31.4% while increasing its cost by nearly 400%. Most critically, we uncover a significant multilingual reasoning distinction: open-weight models consistently switch to English for their internal reasoning steps, even when negotiating in German or Italian (and thus possibly impacting potential explainability gains through the disclosure of reasoning traces), while a leading commercial model maintains language consistency between reasoning and final output.