FORTIFY: Generative Model Fine-tuning with ORPO for ReTrieval Expansion of InFormal NoisY Text
Dan DeGenaro, Eugene Yang, David Etter, Cameron Carpenter, Kate Sanders, Alexander Martin, Kenton Murray, Reno Kriz
Abstract
Despite recent advancements in neural retrieval, representing text fragments or phrases with proper contextualized embeddings remains challenging. This is particularly true in video retrieval, where documents consist of text extracted from the frames through OCR or from the audio tracks through ASR, so the textual content rarely forms complete sentences and is often only a bag of phrases. In this work, we propose FORTIFY, a generative model fine-tuning approach for noisy document rewriting and summarization that improves downstream retrieval effectiveness. Experiments on MultiVENT 2.0, an informational video retrieval benchmark, show that Llama fine-tuned with FORTIFY provides effective document expansion, yielding a 30% improvement in nDCG@10 over prompting an out-of-the-box Llama model. When the model tailored for MultiVENT 2.0 is transferred zero-shot to two out-of-distribution datasets, it still achieves retrieval effectiveness competitive with other document preprocessing alternatives.
- Anthology ID:
- 2025.magmar-1.13
- Volume:
- Proceedings of the 1st Workshop on Multimodal Augmented Generation via Multimodal Retrieval (MAGMaR 2025)
- Month:
- August
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Reno Kriz, Kenton Murray
- Venues:
- MAGMaR | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 100–115
- URL:
- https://preview.aclanthology.org/landing_page/2025.magmar-1.13/
- Cite (ACL):
- Dan DeGenaro, Eugene Yang, David Etter, Cameron Carpenter, Kate Sanders, Alexander Martin, Kenton Murray, and Reno Kriz. 2025. FORTIFY: Generative Model Fine-tuning with ORPO for ReTrieval Expansion of InFormal NoisY Text. In Proceedings of the 1st Workshop on Multimodal Augmented Generation via Multimodal Retrieval (MAGMaR 2025), pages 100–115, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- FORTIFY: Generative Model Fine-tuning with ORPO for ReTrieval Expansion of InFormal NoisY Text (DeGenaro et al., MAGMaR 2025)
- PDF:
- https://preview.aclanthology.org/landing_page/2025.magmar-1.13.pdf