Abstract
People can refer to quantities in a visual scene by using either exact cardinals (e.g. one, two, three) or natural language quantifiers (e.g. few, most, all). In humans, these two processes rely on fairly different cognitive and neural mechanisms. Inspired by this evidence, the present study proposes two models for learning the objective meaning of cardinals and quantifiers from visual scenes containing multiple objects. We show that a model capitalizing on a ‘fuzzy’ measure of similarity is effective for learning quantifiers, whereas the learning of exact cardinals is better accomplished when information about number is provided.
- Anthology ID: E17-2054
- Volume: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
- Month: April
- Year: 2017
- Address: Valencia, Spain
- Editors: Mirella Lapata, Phil Blunsom, Alexander Koller
- Venue: EACL
- Publisher: Association for Computational Linguistics
- Pages: 337–342
- URL: https://aclanthology.org/E17-2054
- Cite (ACL): Sandro Pezzelle, Marco Marelli, and Raffaella Bernardi. 2017. Be Precise or Fuzzy: Learning the Meaning of Cardinals and Quantifiers from Vision. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 337–342, Valencia, Spain. Association for Computational Linguistics.
- Cite (Informal): Be Precise or Fuzzy: Learning the Meaning of Cardinals and Quantifiers from Vision (Pezzelle et al., EACL 2017)
- PDF: https://preview.aclanthology.org/add_acl24_videos/E17-2054.pdf
- Data: ImageNet