Ruty Rinott

2020

MLQA: Evaluating Cross-lingual Extractive Question Answering
Patrick Lewis | Barlas Oguz | Ruty Rinott | Sebastian Riedel | Holger Schwenk
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Question answering (QA) models have shown rapid progress enabled by the availability of large, high-quality benchmark datasets. Such annotated datasets are difficult and costly to collect, and rarely exist in languages other than English, making building QA systems that work well in other languages challenging. In order to develop such systems, it is crucial to invest in high quality multilingual evaluation benchmarks to measure progress. We present MLQA, a multi-way aligned extractive QA evaluation benchmark intended to spur research in this area. MLQA contains QA instances in 7 languages, English, Arabic, German, Spanish, Hindi, Vietnamese and Simplified Chinese. MLQA has over 12K instances in English and 5K in each other language, with each instance parallel between 4 languages on average. We evaluate state-of-the-art cross-lingual models and machine-translation-based baselines on MLQA. In all cases, transfer results are shown to be significantly behind training-language performance.

2018

State-of-the-art natural language processing systems rely on supervision in the form of annotated data to learn competent models. These models are generally trained on data in a single language (usually English), and cannot be directly used beyond that language. Since collecting data in every language is not realistic, there has been a growing interest in cross-lingual language understanding (XLU) and low-resource cross-language transfer. In this work, we construct an evaluation set for XLU by extending the development and test sets of the Multi-Genre Natural Language Inference Corpus (MultiNLI) to 15 languages, including low-resource languages such as Swahili and Urdu. We hope that our dataset, dubbed XNLI, will catalyze research in cross-lingual sentence understanding by providing an informative standard evaluation task. In addition, we provide several baselines for multilingual sentence understanding, including two based on machine translation systems, and two that use parallel data to train aligned multilingual bag-of-words and LSTM encoders. We find that XNLI represents a practical and challenging evaluation suite, and that directly translating the test data yields the best performance among available baselines.

2015

Show Me Your Evidence - an Automatic Method for Context Dependent Evidence Detection
Ruty Rinott | Lena Dankin | Carlos Alzate Perez | Mitesh M. Khapra | Ehud Aharoni | Noam Slonim
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

TR9856: A Multi-word Term Relatedness Benchmark
Ran Levy | Liat Ein-Dor | Shay Hummel | Ruty Rinott | Noam Slonim
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

Co-authors

Venues

emnlp: 2
acl: 2
coling: 1
lrec: 1
ws: 1

ijcnlp: 1