Andreas Herzinger
2026
Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation
Abdelrahman Abdallah | Bhawna Piryani | Jamshid Mozafari | Andreas Herzinger | Jamie Holdcroft | Adam Jatowt
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Building retrieval-augmented generation (RAG) systems often requires combining separate tools for retrieval, re-ranking, and generation, with incompatible data formats, evaluation pipelines, and deployment workflows. We present Rankify, an open-source Python toolkit that unifies these stages in a single modular framework ([PyPI: <https://pypi.org/project/rankify/>], [GitHub: <https://github.com/DataScienceUIBK/Rankify>], [Docs: <https://rankify.readthedocs.io>], [Video: <https://youtu.be/kkLzomrM2ec>]). Rankify provides 42 benchmark datasets with pre-retrieved documents and pre-built indices, 15 retrievers (sparse, dense, and reasoning-augmented), and 24 re-ranking models spanning 41 pointwise, pairwise, and listwise variants. It also supports 6 RAG strategies across four inference backends (Hugging Face, vLLM, LiteLLM, and OpenAI), enabling consistent experimentation from local models to hosted APIs. A unified pipeline interface allows users to compose retrieve–rerank–generate workflows in a few lines of code, while an agentic assistant (RankifyAgent), a REST server (RankifyServer), and an interactive web playground support deployment and non-programmatic exploration. Across 200+ configurations on QA and BEIR/TREC benchmarks with six generator LLMs, re-ranking consistently improves downstream performance, yielding gains of 5–15 points in Exact Match and up to 8.5 points in RAGAS context precision across diverse retriever–generator combinations.