Zixian Ma
2025
LATTE: Learning to Think with Vision Specialists
Zixian Ma | Jianguo Zhang | Zhiwei Liu | Jieyu Zhang | Juntao Tan | Manli Shu | Juan Carlos Niebles | Shelby Heinecke | Huan Wang | Caiming Xiong | Ranjay Krishna | Silvio Savarese
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
While open-source vision-language models perform well on simple question-answering, they still struggle with complex questions that require both perceptual and reasoning capabilities. We propose LATTE, a family of vision-language models that have LeArned to Think wiTh vision spEcialists. By offloading perception to state-of-the-art vision models, our approach enables vision-language models to focus solely on reasoning over high-quality perceptual information. To train LATTE, we synthesize and filter a large dataset of 293K multi-modal reasoning traces over the perceptual outputs of vision specialists. Trained on this data, LATTE achieves significant gains of 4-5% over baselines across six benchmarks covering both perception and reasoning abilities. Ablation studies reveal that the effectiveness of multi-modal reasoning traces depends on the data sources, formats, and quality of thoughts.
2021
OpenAttack: An Open-source Textual Adversarial Attack Toolkit
Guoyang Zeng | Fanchao Qi | Qianrui Zhou | Tingji Zhang | Zixian Ma | Bairu Hou | Yuan Zang | Zhiyuan Liu | Maosong Sun
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations
Textual adversarial attacking has received wide and increasing attention in recent years. Various attack models have been proposed, but they differ substantially and are implemented with different programming frameworks and settings, which hinders quick utilization and fair comparison of attack models. In this paper, we present an open-source textual adversarial attack toolkit named OpenAttack to solve these issues. Compared with existing textual adversarial attack toolkits, OpenAttack has unique strengths in its support for all attack types, multilinguality, and parallel processing. Currently, OpenAttack includes 15 typical attack models that cover all attack types. Its highly inclusive modular design not only supports quick utilization of existing attack models but also enables great flexibility and extensibility. OpenAttack has broad uses, including comparing and evaluating attack models, measuring the robustness of a model, assisting in the development of new attack models, and adversarial training. Source code and documentation can be obtained at https://github.com/thunlp/OpenAttack.