Nazli Ikizler-Cinbis

2018

RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes
Semih Yagcioglu | Aykut Erdem | Erkut Erdem | Nazli Ikizler-Cinbis
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Understanding and reasoning about cooking recipes is a fruitful research direction towards enabling machines to interpret procedural text. In this work, we introduce RecipeQA, a dataset for multimodal comprehension of cooking recipes. It comprises approximately 20K instructional recipes with multiple modalities such as titles, descriptions and aligned sets of images. With over 36K automatically generated question-answer pairs, we design a set of comprehension and reasoning tasks that require joint understanding of images and text, capturing the temporal flow of events and making sense of procedural knowledge. Our preliminary results indicate that RecipeQA will serve as a challenging test bed and an ideal benchmark for evaluating machine comprehension systems. The data and leaderboard are available at http://hucvl.github.io/recipeqa.

2017

Re-evaluating Automatic Metrics for Image Captioning
Mert Kilickaya | Aykut Erdem | Nazli Ikizler-Cinbis | Erkut Erdem
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

The task of generating natural language descriptions from images has received a lot of attention in recent years. Consequently, it is becoming increasingly important to evaluate such image captioning approaches automatically. In this paper, we provide an in-depth evaluation of the existing image captioning metrics through a series of carefully designed experiments. Moreover, we explore the utilization of the recently proposed Word Mover’s Distance (WMD) document metric for the purpose of image captioning. Our findings outline the differences and similarities between metrics and their relative robustness by means of extensive correlation-, accuracy- and distraction-based evaluations. Our results also demonstrate that WMD provides strong advantages over other metrics.

2016

Leveraging Captions in the Wild to Improve Object Detection
Mert Kilickaya | Nazli Ikizler-Cinbis | Erkut Erdem | Aykut Erdem
Proceedings of the 5th Workshop on Vision and Language