Xinyu Zhao


2021

Effective Distant Supervision for Temporal Relation Extraction
Xinyu Zhao | Shih-Ting Lin | Greg Durrett
Proceedings of the Second Workshop on Domain Adaptation for NLP

A principal barrier to training temporal relation extraction models in new domains is the lack of varied, high-quality examples and the challenge of collecting more. We present a method of automatically collecting distantly-supervised examples of temporal relations. We scrape and automatically label event pairs where the temporal relations are made explicit in text, then mask out those explicit cues, forcing a model trained on this data to learn other signals. We demonstrate that a pre-trained Transformer model is able to transfer from the weakly labeled examples to human-annotated benchmarks in both zero-shot and few-shot settings, and that the masking scheme is important in improving generalization.

Flexible Generation of Natural Language Deductions
Kaj Bostrom | Xinyu Zhao | Swarat Chaudhuri | Greg Durrett
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

An interpretable system for open-domain reasoning needs to express its reasoning process in a transparent form. Natural language is an attractive representation for this purpose — it is both highly expressive and easy for humans to understand. However, manipulating natural language statements in logically consistent ways is hard: models must cope with variation in how meaning is expressed while remaining precise. In this paper, we describe ParaPattern, a method for building models to generate deductive inferences from diverse natural language inputs without direct human supervision. We train BART-based models (Lewis et al., 2020) to generate the result of applying a particular logical operation to one or more premise statements. Crucially, we develop a largely automated pipeline for constructing suitable training examples from Wikipedia. We evaluate our models using out-of-domain sentence compositions from the QASC (Khot et al., 2020) and EntailmentBank (Dalvi et al., 2021) datasets as well as targeted perturbation sets. Our results show that our models are substantially more accurate and flexible than baseline systems. ParaPattern achieves 85% validity on examples of the ‘substitution’ operation from EntailmentBank without the use of any in-domain training data, matching the performance of a model fine-tuned for EntailmentBank. The full source code for our method is publicly available.

2018

Domain Adaptation Using a Combination of Multiple Embeddings for Sentiment Analysis
Hiroyuki Shinnou | Xinyu Zhao | Kanako Komiya
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation