Pavle Marković




2024

Dial BeInfo for Faithfulness: Improving Factuality of Information-Seeking Dialogue via Behavioural Fine-Tuning
Evgeniia Razumovskaia | Ivan Vulić | Pavle Marković | Tomasz Cichy | Qian Zheng | Tsung-Hsien Wen | Paweł Budzianowski
Findings of the Association for Computational Linguistics: EMNLP 2024

Factual faithfulness is a crucial requirement in information-seeking dialogue: the system should respond to the user queries so that the responses are meaningful and aligned with the knowledge provided to the system. However, most modern large language models (LLMs) suffer from hallucinations, that is, they generate responses not supported by or even contradicting the knowledge source. To mitigate the issue and increase faithfulness of information-seeking dialogue systems supported by the LLMs, we introduce BeInfo, a simple yet effective method that applies ‘behavioural tuning’ on the LLMs to aid information-seeking dialogue. Relying on three standard information-seeking dialogue datasets, we show that models tuned with BeInfo become considerably more faithful to the knowledge source both for datasets and domains seen during BeInfo-tuning, as well as on unseen domains, when applied in a zero-shot manner. In addition, we present a ‘real-life’ case study on conversations with real users, showcasing that the models with 3B parameters (e.g., Flan-T5) tuned with BeInfo demonstrate strong performance on data from real ‘production’ conversations: when tuned on a limited amount of such realistic in-domain dialogues, they surpass much larger LLMs used ‘off-the-shelf’, both on automatic and human evaluation metrics.