Martin Franz


2023

pdf
PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development
Avi Sil | Jaydeep Sen | Bhavani Iyer | Martin Franz | Kshitij Fadnis | Mihaela Bornea | Sara Rosenthal | Scott McCarley | Rong Zhang | Vishwajeet Kumar | Yulong Li | Md Arafat Sultan | Riyaz Bhat | Juergen Bross | Radu Florian | Salim Roukos
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

The field of Question Answering (QA) has made remarkable progress in recent years, thanks to the advent of large pre-trained language models, newer realistic benchmark datasets with leaderboards, and novel algorithms for key components such as retrievers and readers. In this paper, we introduce PrimeQA: a one-stop and open-source QA repository with an aim to democratize QA research and facilitate easy replication of state-of-the-art (SOTA) QA methods. PrimeQA supports core QA functionalities like retrieval and reading comprehension as well as auxiliary capabilities such as question generation. It has been designed as an end-to-end toolkit for various use cases: building front-end applications, replicating SOTA methods on public benchmarks, and expanding pre-existing methods. PrimeQA is available at: https://github.com/primeqa.

pdf
Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking
Keshav Santhanam | Jon Saad-Falcon | Martin Franz | Omar Khattab | Avi Sil | Radu Florian | Md Arafat Sultan | Salim Roukos | Matei Zaharia | Christopher Potts
Findings of the Association for Computational Linguistics: ACL 2023

Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks. Unfortunately, some dimensions of this progress are illusory: the majority of the popular IR benchmarks today focus exclusively on downstream task accuracy and thus conceal the costs incurred by systems that trade away efficiency for quality. Latency, hardware cost, and other efficiency considerations are paramount to the deployment of IR systems in user-facing settings. We propose that IR benchmarks structure their evaluation methodology to include not only metrics of accuracy, but also efficiency considerations such as a query latency and the corresponding cost budget for a reproducible hardware setting. For the popular IR benchmarks MS MARCO and XOR-TyDi, we show how the best choice of IR system varies according to how these efficiency considerations are chosen and weighed. We hope that future benchmarks will adopt these guidelines toward more holistic IR evaluation.

pdf
UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers
Jon Saad-Falcon | Omar Khattab | Keshav Santhanam | Radu Florian | Martin Franz | Salim Roukos | Avirup Sil | Md Sultan | Christopher Potts
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Many information retrieval tasks require large labeled datasets for fine-tuning. However, such datasets are often unavailable, and their utility for real-world applications can diminish quickly due to domain shifts. To address this challenge, we develop and motivate a method for using large language models (LLMs) to generate large numbers of synthetic queries cheaply. The method begins by generating a small number of synthetic queries using an expensive LLM. After that, a much less expensive one is used to create large numbers of synthetic queries, which are used to fine-tune a family of reranker models. These rerankers are then distilled into a single efficient retriever for use in the target domain. We show that this technique boosts zero-shot accuracy in long-tail domains and achieves substantially lower latency than standard reranking methods.

2022

pdf
Learning Cross-Lingual IR from an English Retriever
Yulong Li | Martin Franz | Md Arafat Sultan | Bhavani Iyer | Young-Suk Lee | Avirup Sil
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We present DR.DECR (Dense Retrieval with Distillation-Enhanced Cross-Lingual Representation), a new cross-lingual information retrieval (CLIR) system trained using multi-stage knowledge distillation (KD). The teacher of DR.DECR relies on a highly effective but computationally expensive two-stage inference process consisting of query translation and monolingual IR, while the student, DR.DECR, executes a single CLIR step. We teach DR.DECR powerful multilingual representations as well as CLIR by optimizing two corresponding KD objectives. Learning useful representations of non-English text from an English-only retriever is accomplished through a cross-lingual token alignment algorithm that relies on the representation capabilities of the underlying multilingual encoders. In both in-domain and zero-shot out-of-domain evaluation, DR.DECR demonstrates far superior accuracy over direct fine-tuning with labeled CLIR data. It is also the best single-model retriever on the XOR-TyDi benchmark at the time of this writing.

pdf
Towards Robust Neural Retrieval with Source Domain Synthetic Pre-Finetuning
Revanth Gangi Reddy | Vikas Yadav | Md Arafat Sultan | Martin Franz | Vittorio Castelli | Heng Ji | Avirup Sil
Proceedings of the 29th International Conference on Computational Linguistics

Research on neural IR has so far been focused primarily on standard supervised learning settings, where it outperforms traditional term matching baselines. Many practical use cases of such models, however, may involve previously unseen target domains. In this paper, we propose to improve the out-of-domain generalization of Dense Passage Retrieval (DPR) - a popular choice for neural IR - through synthetic data augmentation only in the source domain. We empirically show that pre-finetuning DPR with additional synthetic data in its source domain (Wikipedia), which we generate using a fine-tuned sequence-to-sequence generator, can be a low-cost yet effective first step towards its generalization. Across five different test sets, our augmented model shows more robust performance than DPR in both in-domain and zero-shot out-of-domain evaluation.

2020

pdf
The TechQA Dataset
Vittorio Castelli | Rishav Chakravarti | Saswati Dana | Anthony Ferritto | Radu Florian | Martin Franz | Dinesh Garg | Dinesh Khandelwal | Scott McCarley | Michael McCawley | Mohamed Nasr | Lin Pan | Cezar Pendus | John Pitrelli | Saurabh Pujar | Salim Roukos | Andrzej Sakrajda | Avi Sil | Rosario Uceda-Sosa | Todd Ward | Rong Zhang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We introduce TECHQA, a domain-adaptation question answering dataset for the technical support domain. The TECHQA corpus highlights two real-world issues from the automated customer support domain. First, it contains actual questions posed by users on a technical forum, rather than questions generated specifically for a competition or a task. Second, it has a real-world size – 600 training, 310 dev, and 490 evaluation question/answer pairs – thus reflecting the cost of creating large labeled datasets with actual data. Hence, TECHQA is meant to stimulate research in domain adaptation rather than as a resource to build QA systems from scratch. TECHQA was obtained by crawling the IBMDeveloper and DeveloperWorks forums for questions with accepted answers provided in an IBM Technote—a technical document that addresses a specific technical issue. We also release a collection of the 801,998 Technotes available on the web as of April 4, 2019 as a companion resource that can be used to learn representations of the IT domain language.

2010

pdf
Learning to Predict Readability using Diverse Linguistic Features
Rohit Kate | Xiaoqiang Luo | Siddharth Patwardhan | Martin Franz | Radu Florian | Raymond Mooney | Salim Roukos | Chris Welty
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2003

pdf bib
TIPS: A Translingual Information Processing System
Yaser Al-Onaizan | Radu Florian | Martin Franz | Hany Hassan | Young-Suk Lee | J. Scott McCarley | Kishore Papineni | Salim Roukos | Jeffrey Sorensen | Christoph Tillmann | Todd Ward | Fei Xia
Companion Volume of the Proceedings of HLT-NAACL 2003 - Demonstrations

2001

pdf
Question Answering Using Maximum-Entropy Components
Abraham Ittycheriah | Martin Franz | Wei-Jing Zhu | Adwait Ratnaparkhi
Second Meeting of the North American Chapter of the Association for Computational Linguistics