David Bayani
2024
Testing the Depth of ChatGPT’s Comprehension via Cross-Modal Tasks Based on ASCII-Art: GPT3.5’s Abilities in Regard to Recognizing and Generating ASCII-Art Are Not Totally Lacking
David Bayani
Findings of the Association for Computational Linguistics: EACL 2024
In the months since its release, ChatGPT and its underlying model, GPT3.5, have garnered massive attention, due to their potent mix of capability and accessibility. While a niche industry of papers have emerged examining the scope of capabilities these models possess, language — whether natural or stylized like code — has been the vehicle to exchange information with the network. Drawing inspiration from the multi-modal knowledge we’d expect an agent with true understanding to possess, we examine GPT3.5’s aptitude for visual tasks, where the inputs feature ASCII-art without overt distillation into a lingual summary. In particular, we scrutinize its performance on carefully designed image recognition and generation tasks. An extended version of this write-up is available at: https://arxiv.org/abs/2307.16806 .