Brandon Birmingham


Adding the Third Dimension to Spatial Relation Detection in 2D Images
Brandon Birmingham | Adrian Muscat | Anja Belz
Proceedings of the 11th International Conference on Natural Language Generation

Detection of spatial relations between objects in images is currently a popular subject in image description research. A range of language and geometric object features have been used in this context, but methods have so far not used explicit information about the third dimension (depth), except where it was manually added to annotations. The lack of such information hampers detection of spatial relations that are inherently 3D. In this paper, we use a fully automatic method to create a depth map of an image and derive several object-level depth features from it, which we add to an existing feature set to test their effect on spatial relation detection. We show that adding depth features improves performance in all scenarios tested.
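The abstract describes deriving object-level depth features from an automatically estimated depth map. A minimal sketch of what such features might look like is given below; the feature names and definitions here are illustrative assumptions, not the actual features used in the paper.

```python
import numpy as np

def object_depth_features(depth_map, box_a, box_b):
    """Derive simple object-level depth features from a dense depth map.

    depth_map: 2D numpy array of per-pixel depth estimates.
    box_a, box_b: (x0, y0, x1, y1) bounding boxes of the two objects.
    The feature set below is a hypothetical example, not the paper's.
    """
    def region(box):
        x0, y0, x1, y1 = box
        return depth_map[y0:y1, x0:x1]

    da, db = region(box_a), region(box_b)
    return {
        "mean_depth_a": float(da.mean()),
        "mean_depth_b": float(db.mean()),
        # Sign of the difference indicates which object is nearer the camera
        "depth_diff": float(da.mean() - db.mean()),
        # Overlap of the two objects' depth ranges (0 if fully separated)
        "depth_overlap": float(
            max(0.0, min(da.max(), db.max()) - max(da.min(), db.min()))
        ),
    }
```

Features of this kind could simply be concatenated with existing 2D geometric features (relative position, bounding-box overlap) before training a relation classifier.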


The Use of Object Labels and Spatial Prepositions as Keywords in a Web-Retrieval-Based Image Caption Generation System
Brandon Birmingham | Adrian Muscat
Proceedings of the Sixth Workshop on Vision and Language

In this paper, a retrieval-based caption generation system that searches the web for suitable image descriptions is studied. Google’s reverse image search is used to find potentially relevant web multimedia content for query images. Sentences are extracted from the retrieved web pages, and the likelihood of each candidate description is computed to select a single sentence. The search mechanism is modified to replace the caption generated by Google with a caption composed of object labels and spatial prepositions, used as the query text alongside the image. The object labels are obtained using an off-the-shelf R-CNN, and a machine learning model is developed to predict the prepositions. The effect of this generated text on caption generation performance is investigated. Both human evaluations and automatic metrics are used to assess the retrieved descriptions. Results show that the web-retrieval-based approach performed better when describing single-object images with sentences extracted from stock photography websites, whereas images containing two objects were better described by template-generated sentences composed of object labels and prepositions.
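The abstract mentions computing the likelihood of retrieved sentences to select one as the caption. A minimal sketch of one plausible scoring scheme is shown below, using a Laplace-smoothed, length-normalised unigram model; the actual likelihood model used in the paper is not specified here, so this is an assumption.

```python
import math
from collections import Counter

def select_caption(candidates, corpus_tokens):
    """Pick the candidate sentence with the highest average unigram
    log-likelihood under a background corpus.

    candidates: list of candidate caption strings extracted from web pages.
    corpus_tokens: list of tokens from a background text corpus.
    This is an illustrative stand-in for the paper's likelihood scoring.
    """
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab = len(counts)

    def score(sentence):
        tokens = sentence.lower().split()
        # Laplace-smoothed unigram log-probability, normalised by length
        # so that longer sentences are not penalised for having more terms
        logp = sum(math.log((counts[t] + 1) / (total + vocab)) for t in tokens)
        return logp / max(len(tokens), 1)

    return max(candidates, key=score)
```

A sentence whose words are frequent in the background corpus scores higher than one full of out-of-vocabulary terms, which filters out boilerplate and off-topic text from the retrieved pages.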


Exploring Different Preposition Sets, Models and Feature Sets in Automatic Generation of Spatial Image Descriptions
Anja Belz | Adrian Muscat | Brandon Birmingham
Proceedings of the 5th Workshop on Vision and Language

Effect of Data Annotation, Feature Selection and Model Choice on Spatial Description Generation in French
Anja Belz | Adrian Muscat | Brandon Birmingham | Jessie Levacher | Julie Pain | Adam Quinquenel
Proceedings of the 9th International Natural Language Generation conference