João Vitor Robiatti Amorim

2025

Analysis of Automated Document Relevance Annotation for Information Retrieval in Oil and Gas Industry
João Vitor Mariano Correia | Murilo Missano Bell | João Vitor Robiatti Amorim | Jonas Queiroz | Daniel Pedronette | Ivan Rizzo Guilherme | Felipe Lima de Oliveira
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

The lack of high-quality test collections challenges Information Retrieval (IR) in specialized domains. This work addresses this issue by comparing supervised classifiers against zero-shot Large Language Models (LLMs) for automated relevance annotation in the oil and gas industry, using human expert judgments as a benchmark. A supervised classifier, trained on limited expert data, outperforms LLMs, achieving an F1-score that surpasses even a second human annotator. The study also empirically confirms that LLMs are prone to unfairly favoring technologically similar retrieval systems. While LLMs lack precision in this context, a well-engineered classifier offers an accurate and practical path to scaling evaluation datasets within a human-in-the-loop framework that empowers, not replaces, human expertise.
