Han Ding
2025
A Systematic Survey of Automatic Prompt Optimization Techniques
Kiran Ramnath | Kang Zhou | Sheng Guan | Soumya Smruti Mishra | Xuan Qi | Zhengyuan Shen | Shuai Wang | Sangmin Woo | Sullam Jeoung | Yawei Wang | Haozhu Wang | Han Ding | Yuzhe Lu | Zhichao Xu | Yun Zhou | Balasubramaniam Srinivasan | Qiaojing Yan | Yueyan Chen | Haibo Ding | Panpan Xu | Lin Lee Cheong
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Since the advent of large language models (LLMs), prompt engineering has been a crucial step in eliciting desired responses for various Natural Language Processing (NLP) tasks. However, prompt engineering remains an impediment for end users because models, tasks, and associated best practices evolve rapidly. To mitigate this, Automatic Prompt Optimization (APO) techniques have recently emerged that automatically refine prompts to improve LLM performance across tasks. In this paper, we present a comprehensive survey summarizing the current progress and remaining challenges in this field. We provide a formal definition of APO and a 5-part unifying framework, and then rigorously categorize all relevant works based on their salient features within it. We hope to spur further research guided by our framework.
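To make the APO idea concrete, below is a minimal, illustrative sketch (not drawn from the survey itself) of a generic prompt-optimization loop: candidate prompts are produced by a mutation step, scored on a small evaluation set, and the best candidate is retained. The names `llm`, `mutate`, and `score` are hypothetical placeholders for a model client, a prompt-rewriting routine, and a task metric.

```python
def optimize_prompt(seed_prompt, eval_set, llm, mutate, score,
                    iterations=5, pool_size=8):
    """Hill-climbing sketch over prompt candidates: mutate, evaluate, keep the best.

    llm    : callable model client used by `mutate` and `score` (hypothetical).
    mutate : callable(llm, prompt) -> rewritten candidate prompt.
    score  : callable(llm, prompt, eval_set) -> float, higher is better.
    """
    best_prompt = seed_prompt
    best_score = score(llm, best_prompt, eval_set)
    for _ in range(iterations):
        # Generate a pool of candidate rewrites of the current best prompt.
        candidates = [mutate(llm, best_prompt) for _ in range(pool_size)]
        for candidate in candidates:
            candidate_score = score(llm, candidate, eval_set)
            # Greedily keep the highest-scoring prompt seen so far.
            if candidate_score > best_score:
                best_prompt, best_score = candidate, candidate_score
    return best_prompt, best_score
```

Real APO systems vary in how each step is instantiated (e.g., gradient-free search, LLM-written critiques, or evolutionary operators); this loop only shows the shared generate-evaluate-select skeleton.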
2021
Semantic Aligned Multi-modal Transformer for Vision-Language Understanding: A Preliminary Study on Visual QA
Han Ding | Li Erran Li | Zhiting Hu | Yi Xu | Dilek Hakkani-Tur | Zheng Du | Belinda Zeng
Proceedings of the Third Workshop on Multimodal Artificial Intelligence
Recent vision-language understanding approaches adopt a multi-modal transformer pre-training and fine-tuning paradigm. Prior work learns representations of text tokens and visual features with cross-attention mechanisms and captures their alignment solely from indirect signals. In this work, we propose to enhance the alignment mechanism by incorporating image scene graph structures as a bridge between the two modalities and learning with new contrastive objectives. In our preliminary study on the challenging compositional visual question answering task, we show that the proposed approach achieves improved results, demonstrating its potential to enhance vision-language understanding.
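As an illustration of the kind of contrastive objective the abstract alludes to, the sketch below shows a generic symmetric InfoNCE-style loss between text-side and scene-graph-side embeddings. It is an assumption-laden stand-in, not the paper's actual objective; the function and tensor names are hypothetical.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(text_emb, graph_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss pulling matched text/scene-graph pairs together.

    text_emb, graph_emb: (batch, dim) tensors where row i of each is a matched
    pair; the other rows in the batch act as in-batch negatives.
    """
    text_emb = F.normalize(text_emb, dim=-1)
    graph_emb = F.normalize(graph_emb, dim=-1)
    # Pairwise cosine similarities scaled by temperature.
    logits = text_emb @ graph_emb.t() / temperature
    targets = torch.arange(text_emb.size(0), device=text_emb.device)
    # Cross-entropy in both matching directions (text->graph and graph->text).
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```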