Abstract
Humans constantly deal with multimodal information, that is, data from different modalities such as text and images. For machines to process information as humans do, they must be able to process multimodal data and understand the joint relationship between these modalities. This paper describes work on the VTLM (Visual Translation Language Modelling) framework of Caglayan et al. (2021) to test its ability to generalize to other language pairs and corpora. We use the multimodal and multilingual corpus How2 (Sanabria et al., 2018), in three parallel streams with aligned English-Portuguese-Visual information, to investigate the effectiveness of the model for this new language pair and in more complex scenarios, where the sentence associated with each image is not a simple description of it. Our experiments on the Portuguese-English multimodal translation task using the How2 dataset demonstrate the efficacy of cross-lingual visual pretraining. We achieved a BLEU score of 51.8 and a METEOR score of 78.0 on the test set, outperforming the MMT baseline by about 14 BLEU and 14 METEOR points. The good BLEU and METEOR values obtained for this new language pair, relative to the original English-German VTLM, establish the suitability of the model for other languages.
- Anthology ID:
- 2022.lrec-1.97
- Volume:
- Proceedings of the Thirteenth Language Resources and Evaluation Conference
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 919–927
- URL:
- https://aclanthology.org/2022.lrec-1.97
- Cite (ACL):
- Júlia Sato, Helena Caseli, and Lucia Specia. 2022. Multilingual and Multimodal Learning for Brazilian Portuguese. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 919–927, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Multilingual and Multimodal Learning for Brazilian Portuguese (Sato et al., LREC 2022)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2022.lrec-1.97.pdf
- Data
- How2