Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach
- Anthology ID:
- D16-1044
- Volume:
- Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2016
- Address:
- Austin, Texas
- Editors:
- Jian Su, Kevin Duh, Xavier Carreras
- Venue:
- EMNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 457–468
- URL:
- https://aclanthology.org/D16-1044
- DOI:
- 10.18653/v1/D16-1044
- Cite (ACL):
- Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, and Marcus Rohrbach. 2016. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 457–468, Austin, Texas. Association for Computational Linguistics.
- Cite (Informal):
- Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding (Fukui et al., EMNLP 2016)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/D16-1044.pdf
- Code:
- akirafukui/vqa-mcb
- Data:
- Flickr30K Entities, Flickr30k, MS COCO, Visual Genome, Visual Question Answering, Visual Question Answering v2.0, Visual7W