Jaeyoung Do
2023
Large-scale Lifelong Learning of In-context Instructions and How to Tackle It
Jisoo Mok | Jaeyoung Do | Sungjin Lee | Tara Taghavi | Seunghak Yu | Sungroh Yoon
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Jointly fine-tuning a Pre-trained Language Model (PLM) on a pre-defined set of tasks with in-context instructions has been proven to improve its generalization performance, allowing us to build a universal language model that can be deployed across task boundaries. In this work, we explore for the first time whether this attractive property of in-context instruction learning can be extended to a scenario in which tasks are fed to the target PLM in a sequential manner. The primary objective of so-called lifelong in-context instruction learning is to improve the target PLM’s instance- and task-level generalization performance as it observes more tasks. DynaInst, the proposed method for lifelong in-context instruction learning, achieves noticeable improvements in both types of generalization, nearly reaching the upper-bound performance obtained through joint training.
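The abstract does not give implementation details of DynaInst, but the general setting it addresses, sequential (lifelong) instruction tuning of a PLM, can be sketched as follows. This is a minimal illustrative sketch, not the paper's algorithm; the model choice (t5-small), task format, example data, and hyperparameters are all assumptions.

```python
# Illustrative sketch of sequential (lifelong) instruction tuning, NOT the
# DynaInst method from the paper: tasks arrive one at a time and the PLM is
# fine-tuned on each task's instruction-prefixed examples in order.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "t5-small"  # assumed stand-in for the target PLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# A "task" here is an in-context instruction plus (input, output) pairs;
# the data below is hypothetical.
task_stream = [
    {
        "instruction": "Classify the sentiment of the sentence as positive or negative.",
        "examples": [("I loved this movie.", "positive"),
                     ("The plot was a mess.", "negative")],
    },
    {
        "instruction": "Translate the sentence from English to German.",
        "examples": [("Good morning.", "Guten Morgen.")],
    },
]

model.train()
for task in task_stream:  # tasks are observed sequentially, not jointly
    for source, target in task["examples"]:
        prompt = f"{task['instruction']}\nInput: {source}\nOutput:"
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
        labels = tokenizer(target, return_tensors="pt", truncation=True).input_ids
        loss = model(**inputs, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    # After each task, instance- and task-level generalization would be
    # evaluated on held-out instances and on unseen tasks.
```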
Scalable and Safe Remediation of Defective Actions in Self-Learning Conversational Systems
Sarthak Ahuja | Mohammad Kachuee | Fatemeh Sheikholeslami | Weiqing Liu | Jaeyoung Do
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
Off-policy reinforcement learning has been the driving force behind state-of-the-art conversational AIs, leading to more natural human-agent interactions and improving user satisfaction for goal-oriented agents. However, in large-scale commercial settings, it is often challenging to balance policy improvements and experience continuity across the broad spectrum of applications handled by such a system. In the literature, off-policy evaluation and guard-railing on aggregate statistics have commonly been used to address this problem. In this paper, we propose a method for curating and leveraging high-precision samples sourced from historical regression incident reports to validate, safe-guard, and improve policies prior to online deployment. We conducted extensive experiments using data from a real-world conversational system and actual regression incidents. The proposed method is currently deployed in our production system to protect customers against broken experiences and enable long-term policy improvements.
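The abstract describes guard-railing a candidate policy with high-precision samples curated from regression incidents. The sketch below is only a hedged illustration of that general idea, not the paper's actual pipeline; the data structures, policy interface, threshold, and example samples are hypothetical.

```python
# Illustrative sketch (not the paper's method): before deploying a candidate
# policy, check it against high-precision samples curated from past regression
# incidents, i.e. contexts paired with actions known to be defective.
from typing import Callable, Dict, List, Tuple

# Each regression sample: (context features, defective action from the incident report).
RegressionSample = Tuple[Dict[str, str], str]


def violation_rate(policy: Callable[[Dict[str, str]], str],
                   samples: List[RegressionSample]) -> float:
    """Fraction of known-bad contexts where the policy repeats the defective action."""
    violations = sum(1 for context, bad_action in samples if policy(context) == bad_action)
    return violations / max(len(samples), 1)


def guard_rail(policy: Callable[[Dict[str, str]], str],
               samples: List[RegressionSample],
               max_violation_rate: float = 0.0) -> bool:
    """Return True only if the candidate policy is safe to deploy under the threshold."""
    return violation_rate(policy, samples) <= max_violation_rate


# Hypothetical usage: a trivial candidate policy and two curated incident samples.
incident_samples: List[RegressionSample] = [
    ({"utterance": "play relaxing music"}, "stop_playback"),
    ({"utterance": "set an alarm for 7am"}, "cancel_all_alarms"),
]
candidate = lambda ctx: "play_music" if "music" in ctx["utterance"] else "set_alarm"
print("safe to deploy:", guard_rail(candidate, incident_samples))
```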
Co-authors
- Jisoo Mok 1
- Sungjin Lee 1
- Tara Taghavi 1
- Seunghak Yu 1
- Sungroh Yoon 1
- show all...