Yaru Hao
2023
Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
Damai Dai | Yutao Sun | Li Dong | Yaru Hao | Shuming Ma | Zhifang Sui | Furu Wei
Findings of the Association for Computational Linguistics: ACL 2023
2022
Knowledge Neurons in Pretrained Transformers
Damai Dai | Li Dong | Yaru Hao | Zhifang Sui | Baobao Chang | Furu Wei
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2021
Learning to Sample Replacements for ELECTRA Pre-Training
Yaru Hao | Li Dong | Hangbo Bao | Ke Xu | Furu Wei
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
2020
Investigating Learning Dynamics of BERT Fine-Tuning
Yaru Hao | Li Dong | Furu Wei | Ke Xu
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing
2019
Visualizing and Understanding the Effectiveness of BERT
Yaru Hao | Li Dong | Furu Wei | Ke Xu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)