Ivo Pascal De Jong
Also published as: Ivo Pascal de Jong
2026
Why Large Language Models can Secretly Outperform Embedding Similarity in Information Retrieval
Matei Benescu | Ivo Pascal de Jong
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
With the emergence of Large Language Models (LLMs), new methods in Information Retrieval are available in which relevance is estimated directly through language understanding and reasoning, instead of embedding similarity. We argue that similarity is a short-sighted interpretation of relevance, and that LLM-Based Relevance Judgment Systems (LLM-RJS) with reasoning have the potential to outperform Neural Embedding Retrieval Systems (NERS) by overcoming this limitation. Using the TREC-DL 2019 passage retrieval dataset, we compare various LLM-RJS with NERS, but observe no noticeable improvement. Subsequently, we analyze the impact of reasoning by comparing LLM-RJS with and without reasoning. We find that human annotations also suffer from short-sightedness, and that false positives in the reasoning LLM-RJS are primarily mistakes in annotations due to short-sightedness. We conclude that LLM-RJS do have the ability to address the short-sightedness limitation in NERS, but that this cannot be evaluated with standard annotated relevance datasets.
2025
Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models
Mirko Borszukovszki | Ivo Pascal De Jong | Matias Valdenegro-Toro
Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)
To leverage the full potential of Large Language Models (LLMs), it is crucial to have some information on their answers’ uncertainty. This means that the model has to be able to quantify how certain it is of the correctness of a given response. Bad uncertainty estimates can lead to overconfident wrong answers, undermining trust in these models. Quite a lot of research has been done on language models that work with text inputs and provide text outputs. Still, since visual capabilities were added to these models only recently, there has not been much progress on the uncertainty of Visual Language Models (VLMs). We tested three state-of-the-art VLMs on corrupted image data. We found that the severity of the corruption negatively impacted the models’ ability to estimate their uncertainty, and the models also showed overconfidence in most of the experiments.