Zhifeng Chen
2026
Incomplete In-context Learning
Wenqiang Wang | Wen Yujia | Yan Xiao | Zhifeng Chen | Yangshijie Zhang | Peng Chen | Mingbo Yang | Xiaochun Cao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Existing In-context Learning (ICL) typically assumes the retrieval dataset contains demonstrations for all output label spaces. However, in real-world scenarios, delays in dataset updates or incomplete data annotation may result in the retrieval dataset containing labeled demonstrations for only a subset of the output space. We refer to this phenomenon as an incomplete retrieval dataset and define the in-context learning under this condition as Incomplete In-context Learning (IICL). To address IICL, we propose Iterative Judgments and Integrated Prediction (IJIP), a framework with train-free and train-based variants. For classification, the iterative judgments stage of IJIP reformulates an m-class problem into m binary tasks, converting IICL into standard ICL. The integrated prediction stage of IJIP then refines results using both the input and initial predictions. We further extend IJIP to text regression and generation, and introduce lightweight variants that reduce computation and token costs. Across six LLMs, seven tasks, and eight datasets, IJIP achieves state-of-the-art results under two incompleteness settings and even outperforms standard ICL with complete labels. IJIP also supports a semi-supervised variant and can serve as a plug-and-play enhancement for existing ICL and zero-shot methods.
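The reformulation of an m-class problem into m binary tasks can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.5 threshold, and the keyword-based `toy_judge` (a stand-in for an LLM call) are all assumptions made for illustration, and the integrated prediction stage is not modeled.

```python
from typing import Callable, Dict, List


def iterative_judgments(
    text: str,
    labels: List[str],
    judge: Callable[[str, str], float],
) -> Dict[str, float]:
    """Ask one binary question per label: 'does `text` belong to `label`?'

    `judge` is a hypothetical scorer returning a value in [0, 1].
    """
    return {label: judge(text, label) for label in labels}


def toy_judge(text: str, label: str) -> float:
    # Hypothetical stand-in for a model call: simple keyword matching.
    keywords = {
        "positive": ["great", "good"],
        "negative": ["bad", "awful"],
        "neutral": ["okay"],
    }
    lowered = text.lower()
    return 1.0 if any(word in lowered for word in keywords[label]) else 0.0


labels = ["positive", "negative", "neutral"]
scores = iterative_judgments("The film was great", labels, toy_judge)

# Labels whose binary judgment passes an assumed threshold of 0.5.
candidates = [label for label, score in scores.items() if score > 0.5]
print(candidates)
```

Each binary question only needs demonstrations for one label at a time, which is the sense in which the decomposition sidesteps missing labels in this sketch.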
2018
The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation
Mia Xu Chen | Orhan Firat | Ankur Bapna | Melvin Johnson | Wolfgang Macherey | George Foster | Llion Jones | Mike Schuster | Noam Shazeer | Niki Parmar | Ashish Vaswani | Jakob Uszkoreit | Lukasz Kaiser | Zhifeng Chen | Yonghui Wu | Macduff Hughes
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The past year has witnessed rapid advances in sequence-to-sequence (seq2seq) modeling for Machine Translation (MT). The classic RNN-based approaches to MT were first out-performed by the convolutional seq2seq model, which was then out-performed by the more recent Transformer model. Each of these new approaches consists of a fundamental architecture accompanied by a set of modeling and training techniques that are in principle applicable to other seq2seq architectures. In this paper, we tease apart the new architectures and their accompanying techniques in two ways. First, we identify several key modeling and training techniques, and apply them to the RNN architecture, yielding a new RNMT+ model that outperforms all of the three fundamental architectures on the benchmark WMT’14 English to French and English to German tasks. Second, we analyze the properties of each fundamental seq2seq architecture and devise new hybrid architectures intended to combine their strengths. Our hybrid models obtain further improvements, outperforming the RNMT+ model on both benchmark datasets.
2017
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
Melvin Johnson | Mike Schuster | Quoc V. Le | Maxim Krikun | Yonghui Wu | Zhifeng Chen | Nikhil Thorat | Fernanda Viégas | Martin Wattenberg | Greg Corrado | Macduff Hughes | Jeffrey Dean
Transactions of the Association for Computational Linguistics, Volume 5
We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no changes to the model architecture from a standard NMT system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT systems using a single model. On the WMT’14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-the-art results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on WMT’14 and WMT’15 benchmarks, respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. Our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation. Finally, we show analyses that hint at a universal interlingua representation in our models and also show some interesting examples when mixing languages.
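The target-language token described in this abstract can be sketched as a small preprocessing step. The function name `add_target_token` and the `<2xx>` token spelling are assumptions for illustration; the abstract only states that an artificial token is placed at the beginning of the input sentence.

```python
def add_target_token(source: str, target_lang: str) -> str:
    """Prepend an artificial token naming the desired target language.

    The "<2xx>" spelling is an assumed convention for this sketch.
    """
    return f"<2{target_lang}> {source}"


# The same source sentence is routed to different target languages
# purely by changing the prefix; the model itself is unchanged.
examples = [
    ("Hello, how are you?", "es"),
    ("Hello, how are you?", "de"),
]
tagged = [add_target_token(sentence, lang) for sentence, lang in examples]
for line in tagged:
    print(line)
```

Because the target language is carried in the input text, a single model and a shared vocabulary can serve every language pair in this setup.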
Co-authors
- Macduff Hughes 2
- Melvin Johnson 2
- Mike Schuster 2
- Yonghui Wu 2
- Ankur Bapna 1
- Xiaochun Cao 1
- Peng Chen 1
- Mia Xu Chen 1
- Greg Corrado 1
- Jeffrey Dean 1
- Orhan Firat 1
- George Foster 1
- Llion Jones 1
- Łukasz Kaiser 1
- Maxim Krikun 1
- Quoc Le 1
- Wolfgang Macherey 1
- Niki Parmar 1
- Noam Shazeer 1
- Nikhil Thorat 1
- Jakob Uszkoreit 1
- Ashish Vaswani 1
- Fernanda Viégas 1
- Wenqiang Wang 1
- Martin Wattenberg 1
- Yan Xiao 1
- Mingbo Yang 1
- Wen Yujia 1
- Yangshijie Zhang 1