Yilong Dai
2026
GoViG: Goal-Conditioned Visual Navigation Instruction Generation via Multimodal Reasoning
Fengyi Wu | Yifei Dong | Yilong Dai | Guangyu Chen | Qifeng Wu | Huiting Huang | Hang Wang | Qi Dai | Alexander G Hauptmann | Zhi-Qi Cheng
Findings of the Association for Computational Linguistics: ACL 2026
We introduce Goal-Conditioned Visual Navigation Instruction Generation (GoViG), a new task that aims to generate contextually coherent navigation instructions solely from egocentric visual observations of initial and goal states. Unlike prior work relying on structured inputs, such as semantic annotations or environmental maps, GoViG exclusively leverages raw egocentric visual data, improving adaptability to unseen and unstructured environments. Our method addresses this task by decomposing it into two interconnected subtasks: (1) navigation visualization, predicting intermediate visual states bridging the initial and goal views; and (2) instruction generation, synthesizing coherent instructions grounded in observed and anticipated visuals. Both subtasks are integrated within an autoregressive multimodal LLM trained with tailored objectives to ensure spatial accuracy and linguistic clarity. Furthermore, we introduce two multimodal reasoning strategies, one-pass and interleaved reasoning, to mimic incremental human navigation cognition. To comprehensively evaluate our method, we propose the R2R-Goal dataset, combining diverse synthetic and real-world trajectories. Empirical results demonstrate significant performance improvements over state-of-the-art methods in BLEU-4 and CIDEr scores along with robust cross-domain generalization. Our project is available at https://github.com/F1y1113/GoViG.
2025
Large Language Model Agents in Finance: A Survey Bridging Research, Practice, and Real-World Deployment
Yifei Dong | Fengyi Wu | Kunlin Zhang | Yilong Dai | Sanjian Zhang | Wanghao Ye | Sihan Chen | Zhi-Qi Cheng
Findings of the Association for Computational Linguistics: EMNLP 2025
Large language models (LLMs) are increasingly applied to finance, yet challenges remain in aligning their capabilities with real-world institutional demands. In this survey, we provide a systematic, dual-perspective review bridging financial practice and LLM research. From a practitioner-centric standpoint, we introduce a functional taxonomy covering five core financial domains—Data Analysis, Investment Research, Trading, Investment Management, and Risk Management—mapping each to representative tasks, datasets, and institutional constraints. From a research-focused perspective, we analyze key modeling challenges, including numerical reasoning limitations, prompt sensitivity, and lack of real-time adaptability. We comprehensively catalog over 30 financial benchmarks and 20 representative models, and compare them across modalities, tasks, and deployment limitations. Finally, we identify open challenges and outline emerging directions such as continual adaptation, coordination-aware multi-agent systems, and privacy-compliant deployment. We emphasize deeper researcher–practitioner collaboration and transparent model architectures as critical pathways to safer and more scalable AI adoption in finance.