2024
Cross-Modal Projection in Multimodal LLMs Doesn’t Really Project Visual Attributes to Textual Space
Gaurav Verma | Minje Choi | Kartik Sharma | Jamelle Watson-Daniels | Sejoon Oh | Srijan Kumar
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Multimodal large language models (MLLMs) like LLaVA and GPT-4(V) enable general-purpose conversations about images using the language modality. As off-the-shelf MLLMs may have limited capabilities on images from domains like dermatology and agriculture, they must be fine-tuned to unlock domain-specific applications. The prevalent architecture of current open-source MLLMs comprises two major modules: an image-language (cross-modal) projection network and a large language model. It is desirable to understand the roles of these two modules in modeling domain-specific visual attributes, both to inform the design of future models and to streamline interpretability efforts on current models. To this end, via experiments on 4 datasets and under 2 fine-tuning settings, we find that as the MLLM is fine-tuned, it indeed gains domain-specific visual capabilities, but the updates do not lead to the projection extracting relevant domain-specific visual attributes. Our results indicate that the domain-specific visual attributes are modeled by the LLM, even when only the projection is fine-tuned. Through this study, we offer a potential reinterpretation of the role of cross-modal projections in MLLM architectures.
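The cross-modal projection the abstract refers to can be pictured with a minimal sketch. The snippet below assumes a LLaVA-1.5-style two-layer MLP projector with illustrative dimensions (a 1024-d vision-encoder feature space and a 4096-d LLM embedding space); it is not the paper's code.

import torch
import torch.nn as nn

class CrossModalProjection(nn.Module):
    # Maps frozen vision-encoder patch features into the LLM's
    # token-embedding space, as in LLaVA-1.5-style models.
    def __init__(self, vision_dim=1024, llm_dim=4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features):
        # patch_features: (batch, num_patches, vision_dim); the output is
        # prepended to the text-token embeddings consumed by the LLM.
        return self.proj(patch_features)

visual_tokens = CrossModalProjection()(torch.randn(1, 576, 1024))  # -> (1, 576, 4096)

Under this reading, fine-tuning "only the projection" corresponds to updating just these MLP weights while the vision encoder and LLM stay frozen.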
2021
Clause Final Verb Prediction in Hindi: Evidence for Noisy Channel Model of Communication
Kartik Sharma | Niyati Bafna | Samar Husain
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Verbal prediction has been shown to be critical during online comprehension of Subject-Object-Verb (SOV) languages. In this work, we present three computational models to predict clause-final verbs in Hindi given their prior arguments. The models differ in their use of prior context during the prediction process: the context is either noisy or noise-free. Model predictions are compared with sentence-completion data obtained from Hindi native speakers. Results show that the models that assume noisy context outperform the noise-free model. In particular, a lossy context model that assumes prior context to be affected by predictability and recency best captures the distribution of the predicted verb class and the sources of error. The success of the predictability-recency lossy context model is consistent with the noisy-channel hypothesis for sentence comprehension and supports the idea that the reconstruction of the context during prediction is driven by prior linguistic exposure. These results also shed light on the nature of the noise that affects the reconstruction process. Overall, the results pose a challenge to the adaptability hypothesis, which assumes the use of noise-free preverbal context for robust verbal prediction.
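As a rough illustration of the noisy-channel formulation the abstract draws on, the toy sketch below marginalizes over possible true contexts when predicting the verb class. All probabilities are invented for illustration; the paper's lossy-context models, with predictability- and recency-based noise, are more elaborate.

# P(true preverbal context): prior from linguistic exposure (toy values)
prior = {"animate_args": 0.6, "inanimate_args": 0.4}

# P(observed context | true context): the noise model (toy values)
noise = {"animate_args": 0.7, "inanimate_args": 0.3}

# P(verb class | true context) (toy values)
likelihood = {
    ("transitive", "animate_args"): 0.8,
    ("transitive", "inanimate_args"): 0.5,
    ("intransitive", "animate_args"): 0.2,
    ("intransitive", "inanimate_args"): 0.5,
}

def predict_verb_class():
    # P(verb | observed) is proportional to
    # sum over true contexts c of P(verb | c) * P(observed | c) * P(c)
    scores = {v: sum(likelihood[(v, c)] * noise[c] * prior[c] for c in prior)
              for v in ("transitive", "intransitive")}
    z = sum(scores.values())
    return {v: s / z for v, s in scores.items()}

print(predict_verb_class())  # posterior over verb classes given the noisy context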
2020
What Determines the Order of Verbal Dependents in Hindi? Effects of Efficiency in Comprehension and Production
Kartik Sharma | Richard Futrell | Samar Husain
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Word order flexibility is one of the distinctive features of SOV languages. In this work, we investigate whether the order and relative distance of preverbal dependents in Hindi, an SOV language, are affected by factors motivated by efficiency considerations during comprehension and production. We investigate the influence of Head–Dependent Mutual Information (HDMI), similarity-based interference, accessibility, and case-marking. Results show that preverbal dependents remain close to the verbal head when the HDMI between the verb and its dependent is high, demonstrating the influence of locality constraints on dependency distance and word order in an SOV language. Additionally, dependency distances were found to be longer when the dependent was animate, when it was case-marked, and when it was semantically similar to other preverbal dependents. Together, the results highlight the crosslinguistic generalizability of these factors and provide evidence for a functionally motivated account of word order in SOV languages such as Hindi.
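For concreteness, head-dependent association of the kind HDMI captures can be estimated from treebank counts. The sketch below computes pointwise mutual information (PMI), a per-pair analogue of the mutual-information measure, over hypothetical head-dependent pairs; the paper's exact estimator may differ.

import math
from collections import Counter

# (head, dependent) lemma pairs as they might be extracted from a dependency treebank
pairs = [("eat", "bread"), ("eat", "bread"), ("read", "book"), ("eat", "book")]

joint = Counter(pairs)
heads = Counter(h for h, _ in pairs)
deps = Counter(d for _, d in pairs)
n = len(pairs)

def pmi(head, dep):
    # PMI(h, d) = log2( p(h, d) / (p(h) * p(d)) )
    return math.log2((joint[(head, dep)] / n) /
                     ((heads[head] / n) * (deps[dep] / n)))

print(pmi("eat", "bread"))  # higher values predict shorter head-dependent distance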
2019
Can Greenbergian universals be induced from language networks?
Kartik Sharma | Kaivalya Swami | Aditya Shete | Samar Husain
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)