Sedigheh Eslami
2025
jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval
Michael Günther | Saba Sturua | Mohammad Kalim Akram | Isabelle Mohr | Andrei Ungureanu | Bo Wang | Sedigheh Eslami | Scott Martens | Maximilian Werk | Nan Wang | Han Xiao
Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)
We introduce jina-embeddings-v4, a 3.8 billion parameter embedding model that unifies text and image representations, with a novel architecture supporting both single-vector and multi-vector embeddings. It achieves high performance on both single-modal and cross-modal retrieval tasks, and is particularly strong in processing visually rich content such as tables, charts, diagrams, and mixed-media formats that incorporate both image and textual information. We also introduce JVDR, a novel benchmark for visually rich document retrieval that includes more diverse materials and query types than previous efforts. We use JVDR to show that jina-embeddings-v4 greatly improves on state-of-the-art performance for these kinds of tasks.
2023
PubMedCLIP: How Much Does CLIP Benefit Visual Question Answering in the Medical Domain?
Sedigheh Eslami | Christoph Meinel | Gerard de Melo
Findings of the Association for Computational Linguistics: EACL 2023
Contrastive Language–Image Pre-training (CLIP) has shown remarkable success in learning with cross-modal supervision from extensive amounts of image–text pairs collected online. Thus far, the effectiveness of CLIP has been investigated primarily in general-domain multimodal problems. In this work, we evaluate the effectiveness of CLIP for the task of Medical Visual Question Answering (MedVQA). We present PubMedCLIP, a fine-tuned version of CLIP for the medical domain based on PubMed articles. Our experiments on two MedVQA benchmark datasets show that PubMedCLIP achieves superior results, improving overall accuracy by up to 3% compared to state-of-the-art Model-Agnostic Meta-Learning (MAML) networks pre-trained only on visual data. The PubMedCLIP models with different back-ends, along with the source code for pre-training them and for reproducing our MedVQA pipeline, are publicly available at https://github.com/sarahESL/PubMedCLIP.