Siamak Shakeri


2023

pdf
Transcending Scaling Laws with 0.1% Extra Compute
Yi Tay | Jason Wei | Hyung Chung | Vinh Tran | David So | Siamak Shakeri | Xavier Garcia | Steven Zheng | Jinfeng Rao | Aakanksha Chowdhery | Denny Zhou | Donald Metzler | Slav Petrov | Neil Houlsby | Quoc Le | Mostafa Dehghani
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Scaling language models improves performance but comes with significant computational costs. This paper proposes UL2R, a method that substantially improves existing language models and their scaling curves with a relatively tiny amount of extra compute. The key idea is to continue training a state-of-the-art large language model on a few more steps with UL2’s mixture-of-denoiser objective. We show that, with almost negligible extra computational costs and no new sources of data, we are able to substantially improve the scaling properties of large language models on downstream metrics. In this paper, we continue training a baseline language model, PaLM, with ULR2, introducing a new set of models at 8B, 62B, and 540B scale which we call U-PaLM. Impressively, at 540B scale, we show an approximately 2x computational savings rate where U-PaLM achieves the same performance as the final PaLM 540B model at around half its computational budget (i.e., saving ~4.4 million TPUv4 hours). We further show that this improved scaling curve leads to “emergent abilities” on challenging BIG-Bench tasks—for instance, U-PaLM does much better on some tasks or demonstrates better quality at much smaller scale (62B as opposed to 540B). Overall, we show that U-PaLM outperforms PaLM on many few-shot setups, including reasoning tasks with chain-of-thought (e.g., GSM8K), multilingual tasks (MGSM, TydiQA), MMLU and challenging BIG-Bench tasks.

pdf
Triggering Multi-Hop Reasoning for Question Answering in Language Models using Soft Prompts and Random Walks
Kanishka Misra | Cicero Nogueira dos Santos | Siamak Shakeri
Findings of the Association for Computational Linguistics: ACL 2023

Despite readily memorizing world knowledge about entities, pre-trained language models (LMs) struggle to compose together two or more facts to perform multi-hop reasoning in question-answering tasks. In this work, we propose techniques that improve upon this limitation by relying on random-walks over structured knowledge graphs. Specifically, we use soft-prompts to guide LMs to chain together their encoded knowledge by learning to map multi-hop questions to random-walk paths that lead to the answer. Applying our methods on two T5 LMs shows substantial improvements over standard tuning approaches in answering questions that require multi-hop reasoning.

2021

pdf
Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training
Oshin Agarwal | Heming Ge | Siamak Shakeri | Rami Al-Rfou
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Prior work on Data-To-Text Generation, the task of converting knowledge graph (KG) triples into natural text, focused on domain-specific benchmark datasets. In this paper, however, we verbalize the entire English Wikidata KG, and discuss the unique challenges associated with a broad, open-domain, large-scale verbalization. We further show that verbalizing a comprehensive, encyclopedic KG like Wikidata can be used to integrate structured KGs and natural language corpora. In contrast to the many architectures that have been developed to integrate these two sources, our approach converts the KG into natural text, allowing it to be seamlessly integrated into existing language models. It carries the further advantages of improved factual accuracy and reduced toxicity in the resulting language model. We evaluate this approach by augmenting the retrieval corpus in a retrieval language model and showing significant improvements on the knowledge intensive tasks of open domain QA and the LAMA knowledge probe.

pdf
Towards Zero-Shot Multilingual Synthetic Question and Answer Generation for Cross-Lingual Reading Comprehension
Siamak Shakeri | Noah Constant | Mihir Kale | Linting Xue
Proceedings of the 14th International Conference on Natural Language Generation

We propose a simple method to generate multilingual question and answer pairs on a large scale through the use of a single generative model. These synthetic samples can be used to improve the zero-shot performance of multilingual QA models on target languages. Our proposed multi-task training of the generative model only requires labeled training samples in English, thus removing the need for such samples in the target languages, making it applicable to far more languages than those with labeled data. Human evaluations indicate the majority of such samples are grammatically correct and sensible. Experimental results show our proposed approach can achieve large gains on the XQuAD dataset, reducing the gap between zero-shot and supervised performance of smaller QA models on various languages.

pdf
ParsiNLU: A Suite of Language Understanding Challenges for Persian
Daniel Khashabi | Arman Cohan | Siamak Shakeri | Pedram Hosseini | Pouya Pezeshkpour | Malihe Alikhani | Moin Aminnaseri | Marzieh Bitaab | Faeze Brahman | Sarik Ghazarian | Mozhdeh Gheini | Arman Kabiri | Rabeeh Karimi Mahabagdi | Omid Memarrast | Ahmadreza Mosallanezhad | Erfan Noury | Shahab Raji | Mohammad Sadegh Rasooli | Sepideh Sadeghi | Erfan Sadeqi Azer | Niloofar Safi Samghabadi | Mahsa Shafaei | Saber Sheybani | Ali Tazarv | Yadollah Yaghoobzadeh
Transactions of the Association for Computational Linguistics, Volume 9

Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains to be concentrated on resource-rich languages like English. This work focuses on Persian language, one of the widely spoken languages in the world, and yet there are few NLU datasets available for this language. The availability of high-quality evaluation datasets is a necessity for reliable assessment of the progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark in Persian language that includes a range of language understanding tasks—reading comprehension, textual entailment, and so on. These datasets are collected in a multitude of ways, often involving manual annotations by native speakers. This results in over 14.5k new instances across 6 distinct NLU tasks. Additionally, we present the first results on state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.1

2020

pdf
Machine Translation Aided Bilingual Data-to-Text Generation and Semantic Parsing
Oshin Agarwal | Mihir Kale | Heming Ge | Siamak Shakeri | Rami Al-Rfou
Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)

We present a system for bilingual Data-ToText Generation and Semantic Parsing. We use a text-to-text generator to learn a single model that works for both languages on each of the tasks. The model is aided by machine translation during both pre-training and fine-tuning. We evaluate the system on WebNLG 2020 data 1 , which consists of RDF triples in English and natural language sentences in English and Russian for both the tasks. We achieve considerable gains over monolingual models, especially on unseen relations and Russian.

pdf
End-to-End Synthetic Data Generation for Domain Adaptation of Question Answering Systems
Siamak Shakeri | Cicero Nogueira dos Santos | Henghui Zhu | Patrick Ng | Feng Nan | Zhiguo Wang | Ramesh Nallapati | Bing Xiang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We propose an end-to-end approach for synthetic QA data generation. Our model comprises a single transformer-based encoder-decoder network that is trained end-to-end to generate both answers and questions. In a nutshell, we feed a passage to the encoder and ask the decoder to generate a question and an answer token-by-token. The likelihood produced in the generation process is used as a filtering score, which avoids the need for a separate filtering model. Our generator is trained by fine-tuning a pretrained LM using maximum likelihood estimation. The experimental results indicate significant improvements in the domain adaptation of QA models outperforming current state-of-the-art methods.