Joachim Daiber

2025

DispatchQA: A Benchmark for Small Function Calling Language Models in E-Commerce Applications
Joachim Daiber | Victor Maricato | Ayan Sinha | Andrew Rabinovich
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

We introduce DispatchQA, a benchmark to evaluate how well small language models (SLMs) translate open-ended search queries into executable API calls via explicit function calling. Our benchmark focuses on the latency-sensitive e-commerce setting and measures SLMs’ impact on both search relevance and search latency. We provide strong, replicable baselines based on Llama 3.1 8B Instruct fine-tuned on synthetically generated data and find that fine-tuned SLMs produce search quality comparable to or better than that of large language models such as GPT-4o while achieving up to 3× faster inference. All data, code, and training checkpoints are publicly released to spur further research on resource-efficient query understanding.

2021

MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering
Shayne Longpre | Yi Lu | Joachim Daiber
Transactions of the Association for Computational Linguistics, Volume 9

Progress in cross-lingual modeling depends on challenging, realistic, and diverse evaluation sets. We introduce Multilingual Knowledge Questions and Answers (MKQA), an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). Answers are based on heavily curated, language-independent data representation, making results comparable across languages and independent of language-specific passages. With 26 languages, this dataset supplies the widest range of languages to-date for evaluating question answering. We benchmark a variety of state-of-the-art methods and baselines for generative and extractive question answering, trained on Natural Questions, in zero shot and translation settings. Results indicate this dataset is challenging even in English, but especially in low-resource languages.

2016

Universal Reordering via Linguistic Typology
Joachim Daiber | Miloš Stanojević | Khalil Sima’an
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In this paper we explore the novel idea of building a single universal reordering model from English to a large number of target languages. To build this model we exploit typological features of word order for a large number of target languages together with source (English) syntactic features and we train this model on a single combined parallel corpus representing all (22) involved language pairs. We contribute experimental evidence for the usefulness of linguistically defined typological features for building such a model. When the universal reordering model is used for preordering followed by monotone translation (no reordering inside the decoder), our experiments show that this pipeline gives translation performance comparable to or better than a phrase-based baseline for a large number of language pairs (12 out of 22) from diverse language families.

The Denoised Web Treebank: Evaluating Dependency Parsing under Noisy Input Conditions
Joachim Daiber | Rob van der Goot
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We introduce the Denoised Web Treebank: a treebank including a normalization layer and a corresponding evaluation metric for dependency parsing of noisy text, such as Tweets. This benchmark enables the evaluation of parser robustness as well as text normalization methods, including normalization as machine translation and unsupervised lexical normalization, directly on syntactic trees. Experiments show that text normalization together with a combination of domain-specific and generic part-of-speech taggers can lead to a significant improvement in parsing accuracy on this test set.

Examining the Relationship between Preordering and Word Order Freedom in Machine Translation
Joachim Daiber | Miloš Stanojević | Wilker Aziz | Khalil Sima’an
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers

2015

Machine translation with source-predicted target morphology
Joachim Daiber | Khalil Sima’an
Proceedings of Machine Translation Summit XV: Papers

Splitting Compounds by Semantic Analogy
Joachim Daiber | Lautaro Quiroz | Roger Wechsler | Stella Frank
Proceedings of the 1st Deep Machine Translation Workshop

Delimiting Morphosyntactic Search Space with Source-Side Reordering Models
Joachim Daiber | Khalil Sima’an
Proceedings of the 1st Deep Machine Translation Workshop

2012

Evaluating the Impact of Phrase Recognition on Concept Tagging
Pablo Mendes | Joachim Daiber | Rohana Rajapakse | Felix Sasaki | Christian Bizer
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We have developed DBpedia Spotlight, a flexible concept tagging system that is able to annotate entities, topics and other terms in natural language text. The system starts by recognizing phrases to annotate in the input text, and subsequently disambiguates them to a reference knowledge base extracted from Wikipedia. In this paper we evaluate the impact of the phrase recognition step on the ability of the system to correctly reproduce the annotations of a gold standard in an unsupervised setting. We argue that a combination of techniques is needed, and we evaluate a number of alternatives according to an existing evaluation set.