As Natural Language Processing (NLP) algorithms continually achieve new milestones, out-of-distribution generalization remains a significant challenge. This paper addresses multi-source adaptation to unfamiliar domains: we leverage labeled data from multiple source domains to generalize to target domains that are unknown at training time. Our framework employs example-based Hypernetwork adaptation: a T5 encoder-decoder first generates a unique signature from an input example, embedding it within the source domains’ semantic space. A Hypernetwork then uses this signature to generate the task classifier’s weights. In an advanced version, the signature also enriches the input example’s representation. We evaluated our method on two tasks, sentiment classification and natural language inference, in 29 adaptation scenarios, where it outperformed established algorithms. We also compare our fine-tuned architecture to few-shot GPT-3, demonstrating its effectiveness in essential use cases. To the best of our knowledge, this is the first application of Hypernetworks to adaptation to unknown domains.
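The core idea of a Hypernetwork producing classifier weights from a per-example signature can be illustrated with a minimal sketch. This is not the paper's implementation: the signature here is a plain vector rather than a T5-generated one, the Hypernetwork is a single random linear map, and all function names and dimensions are hypothetical.

```python
import random
from typing import List

def make_hypernetwork(sig_dim: int, feat_dim: int, n_classes: int, seed: int = 0) -> List[List[float]]:
    """Hypothetical hypernetwork: one linear map from a signature vector
    to a flattened (n_classes x feat_dim) classifier weight matrix."""
    rng = random.Random(seed)
    n_out = n_classes * feat_dim
    return [[rng.uniform(-0.1, 0.1) for _ in range(sig_dim)] for _ in range(n_out)]

def generate_classifier(hyper: List[List[float]], signature: List[float],
                        feat_dim: int, n_classes: int) -> List[List[float]]:
    """Generate example-specific classifier weights from the signature."""
    flat = [sum(w * s for w, s in zip(row, signature)) for row in hyper]
    return [flat[c * feat_dim:(c + 1) * feat_dim] for c in range(n_classes)]

def classify(weights: List[List[float]], features: List[float]) -> int:
    """Apply the generated linear classifier and return the argmax class."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return max(range(len(logits)), key=logits.__getitem__)

# Two different signatures yield two different classifiers for the same task.
hyper = make_hypernetwork(sig_dim=2, feat_dim=3, n_classes=2)
weights_a = generate_classifier(hyper, [1.0, 0.0], feat_dim=3, n_classes=2)
weights_b = generate_classifier(hyper, [0.0, 1.0], feat_dim=3, n_classes=2)
prediction = classify(weights_a, [0.2, -0.4, 0.9])
```

In this toy setup only the hypernetwork's parameters would be trained; the classifier itself has no fixed weights and is regenerated for every input.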
Natural language processing (NLP) algorithms have become very successful, but they still struggle when applied to out-of-distribution examples. In this paper we propose a controllable generation approach to deal with this domain adaptation (DA) challenge. Given an input text example, our DoCoGen algorithm generates a domain-counterfactual textual example (D-con) that is similar to the original in all aspects, including the task label, but whose domain is changed to a desired one. Importantly, DoCoGen is trained using only unlabeled examples from multiple domains: no NLP task labels or parallel pairs of textual examples and their domain-counterfactuals are required. We show that DoCoGen can generate coherent counterfactuals consisting of multiple sentences. We use the D-cons generated by DoCoGen to augment a sentiment classifier and a multi-label intent classifier in 20 and 78 DA setups, respectively, where source-domain labeled data is scarce. Our model outperforms strong baselines and improves the accuracy of a state-of-the-art unsupervised DA algorithm.
Large pre-trained models are usually fine-tuned on downstream task data, and tested on unseen data. When the train and test data come from different domains, the model is likely to struggle, as it is not adapted to the test domain. We propose a new approach for domain adaptation (DA), using neuron-level interventions: We modify the representation of each test example in specific neurons, resulting in a counterfactual example from the source domain, which the model is more familiar with. The modified example is then fed back into the model. While most other DA methods are applied during training time, ours is applied during inference only, making it more efficient and applicable. Our experiments show that our method improves performance on unseen domains.
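One simple way to picture a neuron-level intervention at inference time is sketched below. It is an assumption for illustration, not necessarily the paper's procedure: the chosen neurons are overwritten with hypothetical source-domain mean activations, and the function name `intervene` and the dictionary format are invented here.

```python
from typing import Dict, List

def intervene(representation: List[float], source_means: Dict[int, float]) -> List[float]:
    """Return a copy of the representation in which the selected neurons
    (dictionary keys) are set to assumed source-domain mean activations."""
    patched = list(representation)  # leave the original representation untouched
    for neuron_index, mean_value in source_means.items():
        patched[neuron_index] = mean_value
    return patched

# Hypothetical test-example representation; neuron 1 is treated as domain-specific.
test_representation = [0.5, -1.0, 2.0]
patched_representation = intervene(test_representation, {1: 0.25})
```

The patched representation would then be passed through the remaining layers of the model in place of the original one.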
Natural Language Processing algorithms have made incredible progress, but they still struggle when applied to out-of-distribution examples. We address a challenging and underexplored version of this domain adaptation problem, where an algorithm is trained on several source domains, and then applied to examples from unseen domains that are unknown at training time. Particularly, no examples, labeled or unlabeled, or any other knowledge about the target domain are available to the algorithm at training time. We present PADA: An example-based autoregressive Prompt learning algorithm for on-the-fly Any-Domain Adaptation, based on the T5 language model. Given a test example, PADA first generates a unique prompt for it and then, conditioned on this prompt, labels the example with respect to the NLP prediction task. PADA is trained to generate a prompt that is a token sequence of unrestricted length, consisting of Domain Related Features (DRFs) that characterize each of the source domains. Intuitively, the generated prompt is a unique signature that maps the test example to a semantic space spanned by the source domains. In experiments with 3 tasks (text classification and sequence tagging), for a total of 14 multi-source adaptation scenarios, PADA substantially outperforms strong baselines.
Pivot-based neural representation models have led to significant progress in domain adaptation for NLP. However, previous research following this approach utilizes only labeled data from the source domain and unlabeled data from the source and target domains, and neglects to incorporate massive unlabeled corpora that are not necessarily drawn from these domains. To alleviate this, we propose PERL: A representation learning model that extends contextualized word embedding models such as BERT (Devlin et al., 2019) with pivot-based fine-tuning. PERL outperforms strong baselines across 22 sentiment classification domain adaptation setups, improves in-domain model performance, yields effective reduced-size models, and increases model stability.
Sentence fusion is the task of joining related sentences into coherent text. Current training and evaluation schemes for this task are based on single reference ground-truths and do not account for valid fusion variants. We show that this hinders models from robustly capturing the semantic relationship between input sentences. To alleviate this, we present an approach in which ground-truth solutions are automatically expanded into multiple references via curated equivalence classes of connective phrases. We apply this method to a large-scale dataset and use the augmented dataset for both model training and evaluation. To improve the learning of semantic representation using multiple references, we enrich the model with auxiliary discourse classification tasks under a multi-tasking framework. Our experiments highlight the improvements of our approach over state-of-the-art models.
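The expansion of a single reference into multiple references via equivalence classes of connectives can be sketched as follows. The classes listed are hypothetical placeholders (the curated classes are not given here), and the function handles only the first connective it finds, so this is a simplified illustration rather than the authors' procedure.

```python
import re
from typing import List

# Hypothetical equivalence classes of interchangeable connective phrases.
EQUIVALENCE_CLASSES = [
    ["however", "nevertheless", "nonetheless"],
    ["therefore", "thus", "hence"],
]

def expand_references(reference: str) -> List[str]:
    """Return the original reference plus variants in which the first
    connective found is swapped for the other members of its class."""
    variants = [reference]
    for connective_class in EQUIVALENCE_CLASSES:
        for connective in connective_class:
            pattern = re.compile(rf"\b{re.escape(connective)}\b", re.IGNORECASE)
            match = pattern.search(reference)
            if match is None:
                continue
            capitalized = match.group(0)[0].isupper()
            for alternative in connective_class:
                if alternative == connective:
                    continue
                # Preserve sentence-initial capitalization of the connective.
                replacement = alternative.capitalize() if capitalized else alternative
                variants.append(reference[:match.start()] + replacement + reference[match.end():])
            return variants
    return variants

references = expand_references("It rained. However, we went out.")
```

A model's output can then be scored against every variant in `references` instead of a single ground truth, and the same variants can serve as additional training targets.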