@inproceedings{hsu-etal-2023-visually,
    title = "Visually-Enhanced Phrase Understanding",
    author = "Hsu, Tsu-Yuan  and
      Li, Chen-An  and
      Huang, Chao-Wei  and
      Chen, Yun-Nung",
    editor = "Rogers, Anna  and
      Boyd-Graber, Jordan  and
      Okazaki, Naoaki",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2023.findings-acl.363/",
    doi = "10.18653/v1/2023.findings-acl.363",
    pages = "5879--5888",
    abstract = "Large-scale vision-language pre-training has exhibited strong performance in various visual and textual understanding tasks. Recently, the textual encoders of multi-modal pre-trained models have been shown to generate high-quality textual representations, which often outperform models that are purely text-based, such as BERT. In this study, our objective is to utilize both textual and visual encoders of multi-modal pre-trained models to enhance language understanding tasks. We achieve this by generating an image associated with a textual prompt, thus enriching the representation of a phrase for downstream tasks. Results from experiments conducted on four benchmark datasets demonstrate that our proposed method, which leverages visually-enhanced text representations, significantly improves performance in the entity clustering task."
}

Markdown (Informal)
[Visually-Enhanced Phrase Understanding](https://preview.aclanthology.org/ingest-emnlp/2023.findings-acl.363/) (Hsu et al., Findings 2023)

ACL

Tsu-Yuan Hsu, Chen-An Li, Chao-Wei Huang, and Yun-Nung Chen. 2023. Visually-Enhanced Phrase Understanding. In Findings of the Association for Computational Linguistics: ACL 2023, pages 5879–5888, Toronto, Canada. Association for Computational Linguistics.
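
As a rough illustration of the idea described in the abstract (not the authors' released code), the sketch below generates an image for a phrase with an off-the-shelf text-to-image model and fuses the text and image embeddings of a multi-modal encoder into a single phrase representation. The model choices (CLIP, Stable Diffusion), the prompt template, and the averaging fusion are all assumptions for illustration; the paper's actual models and fusion may differ.

```python
# Minimal sketch of visually-enhanced phrase representation, assuming
# CLIP as the multi-modal encoder and Stable Diffusion as the image
# generator; these choices and the prompt template are illustrative.
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
generator = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
).to(device)

def phrase_embedding(phrase: str) -> torch.Tensor:
    prompt = f"A photo of {phrase}"        # assumed prompt template
    image = generator(prompt).images[0]    # synthesize an image for the phrase

    inputs = processor(
        text=[prompt], images=image, return_tensors="pt", padding=True
    ).to(device)
    with torch.no_grad():
        text_emb = clip.get_text_features(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
        )
        image_emb = clip.get_image_features(pixel_values=inputs["pixel_values"])

    # Simple fusion (an assumption): average the L2-normalized embeddings.
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    return (text_emb + image_emb) / 2      # visually-enhanced phrase vector
```

Vectors produced this way could then be fed to any standard clustering algorithm (e.g. k-means) for the entity clustering evaluation the abstract mentions.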