Haofu Liao
2025
Turbocharging Web Automation: The Impact of Compressed History States
Xiyue Zhu | Peng Tang | Haofu Liao | Srikar Appalaraju
Findings of the Association for Computational Linguistics: ACL 2025
Language models have led to a leap forward in web automation. Current web automation approaches take the current web state, history actions, and a language instruction as inputs to predict the next action, overlooking the importance of history states. However, the highly verbose nature of web page states can result in long input sequences and sparse information, hampering the effective utilization of history states. In this paper, we propose a novel web history compressor approach to turbocharge web automation using history states. Our approach employs a history compressor module that distills the most task-relevant information from each history state into a fixed-length, short representation, mitigating the challenges posed by the highly verbose history states. Experiments are conducted on the Mind2Web and WebLINX datasets to evaluate the effectiveness of our approach. Results show that our approach obtains 1.2-5.4% absolute accuracy improvements over the baseline approach without history inputs.
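The abstract does not specify the compressor architecture. One common way to distill a variable-length token sequence into a fixed-length representation is cross-attention over a small set of learned queries (Perceiver-style pooling); a minimal PyTorch sketch under that assumption follows. The module name, dimensions, and query count are illustrative, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class HistoryCompressor(nn.Module):
    """Compress a verbose history-state token sequence into a fixed number
    of summary vectors via learned-query cross-attention.
    Illustrative sketch only; the paper's module may differ."""

    def __init__(self, d_model=768, n_queries=8, n_heads=8):
        super().__init__()
        # Learned queries: the fixed-length "slots" that summarize one state.
        self.queries = nn.Parameter(torch.randn(n_queries, d_model) * 0.02)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, state_tokens, pad_mask=None):
        # state_tokens: (batch, seq_len, d_model) encoded history web state
        # pad_mask:     (batch, seq_len), True at padding positions
        b = state_tokens.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        out, _ = self.attn(q, state_tokens, state_tokens,
                           key_padding_mask=pad_mask)
        return self.norm(out)  # (batch, n_queries, d_model), fixed length
```

Each history state would be compressed independently, and the resulting n_queries vectors per state concatenated into the action-prediction input in place of the full verbose state.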
2024
DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models
Sungnyun Kim | Haofu Liao | Srikar Appalaraju | Peng Tang | Zhuowen Tu | Ravi Kumar Satzoda | R. Manmatha | Vijay Mahadevan | Stefano Soatto
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Visual document understanding (VDU) is a challenging task that involves understanding documents across various modalities (text and image) and layouts (forms, tables, etc.). This study aims to enhance the generalizability of small VDU models by distilling knowledge from LLMs. We identify that directly prompting LLMs often fails to generate informative and useful data. In response, we present a new framework (called DocKD) that enriches the data generation process by integrating external document knowledge. Specifically, we provide an LLM with various document elements, such as key-value pairs, layouts, and descriptions, to elicit open-ended answers. Our experiments show that DocKD produces high-quality document annotations and surpasses the direct knowledge distillation approach that does not leverage external document knowledge. Moreover, student VDU models trained solely with DocKD-generated data are not only comparable to those trained with human-annotated data on in-domain tasks but also significantly surpass them on out-of-domain tasks.
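The abstract describes prompting an LLM with external document elements (key-value pairs, layout, description) to elicit open-ended annotations. A hedged sketch of what such a data-generation step might look like is below; the prompt wording, field names, and the call_llm stand-in are all assumptions, since DocKD's actual prompt format is not given in the abstract.

```python
import json

def build_dockd_style_prompt(kv_pairs, layout, description):
    """Assemble external document knowledge into one data-generation prompt.
    Illustrative only; not DocKD's actual prompt template."""
    return (
        "You are generating training data for a document-understanding model.\n"
        f"Document description: {description}\n"
        f"Key-value pairs: {json.dumps(kv_pairs)}\n"
        f"Layout (reading-order regions): {json.dumps(layout)}\n"
        "Write an open-ended question a user might ask about this document, "
        "then answer it using only the information above.\n"
        "Format: Q: <question>\nA: <answer>"
    )

def generate_annotation(document, call_llm):
    # call_llm is a hypothetical stand-in for any LLM client that maps
    # a prompt string to a completion string.
    prompt = build_dockd_style_prompt(
        document["kv_pairs"], document["layout"], document["description"]
    )
    return call_llm(prompt)  # e.g. "Q: ...\nA: ..."
```

The resulting Q/A pairs would then serve as synthetic supervision for the small student VDU model, in place of (or alongside) human annotations.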