Ethan C. Chau
2024
Automatic Pair Construction for Contrastive Post-training
Canwen Xu | Corby Rosset | Ethan C. Chau | Luciano Del Corro | Shweti Mahajan | Julian McAuley | Jennifer Neville | Ahmed Hassan Awadallah | Nikhil Rao
Findings of the Association for Computational Linguistics: NAACL 2024
Alignment serves as an important step to steer large language models (LLMs) towards human preferences. In this paper, we propose an automatic way to construct contrastive data for LLMs, using preference pairs from multiple models of varying strengths (e.g., InstructGPT, ChatGPT and GPT-4). We compare the contrastive techniques of SLiC and DPO to SFT baselines and find that DPO provides a step-function improvement even after continuing SFT saturates. We also explore a data curriculum learning scheme for contrastive post-training that starts by learning from “easier” pairs and transitions to “harder” ones, further improving alignment. Finally, we scale up our experiments to train with more data and larger models like Orca. Remarkably, our automatic contrastive post-training further improves the performance of Orca, already a state-of-the-art instruction learning model tuned with GPT-4 outputs, to outperform ChatGPT.
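The pair construction and curriculum described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The numeric strength ranking, the use of the rank gap as a proxy for how "easy" a pair is, and the names `STRENGTH`, `build_pairs`, `curriculum_order`, and `dpo_loss` are all assumptions made for illustration. The loss function is the standard per-example DPO objective, written for scalar log-probabilities.

```python
import math
from itertools import combinations

# Assumed strength ranking (higher = stronger). Illustrative only.
STRENGTH = {"InstructGPT": 0, "ChatGPT": 1, "GPT-4": 2}


def build_pairs(prompt, responses):
    """Build preference pairs from responses keyed by model name.

    The response from the stronger model is treated as "chosen" and the
    response from the weaker model as "rejected" (no human labels).
    """
    pairs = []
    for a, b in combinations(responses, 2):
        strong, weak = (a, b) if STRENGTH[a] > STRENGTH[b] else (b, a)
        pairs.append({
            "prompt": prompt,
            "chosen": responses[strong],
            "rejected": responses[weak],
            "gap": STRENGTH[strong] - STRENGTH[weak],
        })
    return pairs


def curriculum_order(pairs):
    """Order pairs from "easier" to "harder".

    Assumption: a larger strength gap makes a pair easier to tell apart,
    so large-gap pairs come first.
    """
    return sorted(pairs, key=lambda p: -p["gap"])


def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    The margin is the policy-vs-reference log-ratio of the chosen response
    minus that of the rejected response.
    """
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))
```

Under these assumptions, a GPT-4 versus InstructGPT pair is scheduled before a GPT-4 versus ChatGPT pair, and the loss falls below log 2 once the policy favors the chosen response more than the reference model does.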
Dodo: Dynamic Contextual Compression for Decoder-only LMs
Guanghui Qin | Corby Rosset | Ethan C. Chau | Nikhil Rao | Benjamin Van Durme
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Transformer-based language models (LMs) are inefficient in long contexts. We propose Dodo, a solution for context compression. Instead of one vector per token in a standard transformer model, Dodo represents text with a dynamic number of hidden states at each layer, reducing the cost of self-attention to a fraction of typical time and space. Moreover, off-the-shelf models such as LLaMA can be adapted to Dodo by efficient parameter tuning methods such as LoRA. In use, Dodo can act as either an autoregressive LM or a context compressor for downstream tasks. We demonstrate through experiments in language modeling, question answering, and summarization that Dodo retains capabilities in these tasks, while drastically reducing the overhead during decoding. For example, in the autoencoding task, Dodo shrinks context at a 20x compression ratio with a BLEU score of 98% for reconstruction, achieving nearly lossless encoding.
2021
Specializing Multilingual Language Models: An Empirical Study
Ethan C. Chau | Noah A. Smith
Proceedings of the 1st Workshop on Multilingual Representation Learning
Pretrained multilingual language models have become a common tool in transferring NLP capabilities to low-resource languages, often with adaptations. In this work, we study the performance, extensibility, and interaction of two such adaptations: vocabulary augmentation and script transliteration. Our evaluations on part-of-speech tagging, universal dependency parsing, and named entity recognition in nine diverse low-resource languages uphold the viability of these approaches while raising new questions around how to optimally adapt multilingual models to low-resource settings.
2020
Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank
Ethan C. Chau | Lucy H. Lin | Noah A. Smith
Findings of the Association for Computational Linguistics: EMNLP 2020
Pretrained multilingual contextual representations have shown great success, but due to the limits of their pretraining data, their benefits do not apply equally to all language varieties. This presents a challenge for language varieties unfamiliar to these models, whose labeled and unlabeled data is too limited to train a monolingual model effectively. We propose the use of additional language-specific pretraining and vocabulary augmentation to adapt multilingual models to low-resource settings. Using dependency parsing of four diverse low-resource language varieties as a case study, we show that these methods significantly improve performance over baselines, especially in the lowest-resource cases, and demonstrate the importance of the relationship between such models’ pretraining data and target language varieties.