Yuqing Xie


Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval
Wei Zhong | Jheng-Hong Yang | Yuqing Xie | Jimmy Lin
Findings of the Association for Computational Linguistics: EMNLP 2022

With the recent success of dense retrieval methods based on bi-encoders, studies have applied this approach to various interesting downstream retrieval tasks with good efficiency and in-domain effectiveness.Recently, we have also seen the presence of dense retrieval models in Math Information Retrieval (MIR) tasks,but the most effective systems remain classic retrieval methods that consider hand-crafted structure features.In this work, we try to combine the best of both worlds: a well-defined structure search method for effective formula search and efficient bi-encoder dense retrieval models to capture contextual similarities.Specifically, we have evaluated two representative bi-encoder models for token-level and passage-level dense retrieval on recent MIR tasks.Our results show that bi-encoder models are highly complementary to existing structure search methods, and we are able to advance the state-of-the-art on MIR datasets.


Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates
Yuqing Xie | Yi-An Lai | Yuanjun Xiong | Yi Zhang | Stefano Soatto
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Behavior of deep neural networks can be inconsistent between different versions. Regressions during model update are a common cause of concern that often over-weigh the benefits in accuracy or efficiency gain. This work focuses on quantifying, reducing and analyzing regression errors in the NLP model updates. Using negative flip rate as regression measure, we show that regression has a prevalent presence across tasks in the GLUE benchmark. We formulate the regression-free model updates into a constrained optimization problem, and further reduce it into a relaxed form which can be approximately optimized through knowledge distillation training method. We empirically analyze how model ensemble reduces regression. Finally, we conduct CheckList behavioral testing to understand the distribution of regressions across linguistic phenomena, and the efficacy of ensemble and distillation methods.


End-to-End Open-Domain Question Answering with BERTserini
Wei Yang | Yuqing Xie | Aileen Lin | Xingyu Li | Luchen Tan | Kun Xiong | Ming Li | Jimmy Lin
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

We demonstrate an end-to-end question answering system that integrates BERT with the open-source Anserini information retrieval toolkit. In contrast to most question answering and reading comprehension models today, which operate over small amounts of input text, our system integrates best practices from IR with a BERT-based reader to identify answers from a large corpus of Wikipedia articles in an end-to-end fashion. We report large improvements over previous results on a standard benchmark test collection, showing that fine-tuning pretrained BERT with SQuAD is sufficient to achieve high accuracy in identifying answer spans.