Mar Domínguez Orfila
2022
CAT ManyNames: A New Dataset for Object Naming in Catalan
Mar Domínguez Orfila | Maite Melero Nogués | Gemma Boleda Torrent
Proceedings of the Workshop on Cognitive Aspects of the Lexicon
Object naming is an important task within the field of Language and Vision that consists of generating a correct and appropriate name for an object given an image. The ManyNames dataset uses real-world, human-annotated images with multiple labels instead of just one. In this work, we describe the adaptation of this dataset (originally in English) to Catalan by (i) machine-translating the English labels and (ii) collecting human annotations for a subset of the original corpus, and then comparing the two resources. Analyses reveal divergences in the lexical variation of the two sets, showing potential problems with directly translated resources, particularly when there is no recourse to a proper context, which in this case is conveyed by the image. The analysis also points to the impact of cultural factors on the naming task, which should be accounted for in future cross-lingual naming tasks.