Hsiu-Yu Yang


2022

pdf
Are Visual-Linguistic Models Commonsense Knowledge Bases?
Hsiu-Yu Yang | Carina Silberer
Proceedings of the 29th International Conference on Computational Linguistics

Despite the recent success of pretrained language models as on-the-fly knowledge sources for various downstream tasks, they are shown to inadequately represent trivial common facts that vision typically captures. This limits their application to natural language understanding tasks that require commonsense knowledge. We seek to determine the capability of pretrained visual-linguistic models as knowledge sources on demand. To this end, we systematically compare language-only and visual-linguistic models in a zero-shot commonsense question answering inference task. We find that visual-linguistic models are highly promising regarding their benefit for text-only tasks on certain types of commonsense knowledge associated with the visual world. Surprisingly, this knowledge can be activated even when no visual input is given during inference, suggesting an effective multimodal fusion during pretraining. However, we reveal that there is still a huge space for improvement towards better cross-modal reasoning abilities and pretraining strategies for event understanding.

2021

pdf
Does External Knowledge Help Explainable Natural Language Inference? Automatic Evaluation vs. Human Ratings
Hendrik Schuff | Hsiu-Yu Yang | Heike Adel | Ngoc Thang Vu
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Natural language inference (NLI) requires models to learn and apply commonsense knowledge. These reasoning abilities are particularly important for explainable NLI systems that generate a natural language explanation in addition to their label prediction. The integration of external knowledge has been shown to improve NLI systems, here we investigate whether it can also improve their explanation capabilities. For this, we investigate different sources of external knowledge and evaluate the performance of our models on in-domain data as well as on special transfer datasets that are designed to assess fine-grained reasoning capabilities. We find that different sources of knowledge have a different effect on reasoning abilities, for example, implicit knowledge stored in language models can hinder reasoning on numbers and negations. Finally, we conduct the largest and most fine-grained explainable NLI crowdsourcing study to date. It reveals that even large differences in automatic performance scores do neither reflect in human ratings of label, explanation, commonsense nor grammar correctness.