Shikhar Singh
2023
VIPHY: Probing “Visible” Physical Commonsense Knowledge
Shikhar Singh | Ehsan Qasemi | Muhao Chen
Findings of the Association for Computational Linguistics: EMNLP 2023
Vision-language models (VLMs) have shown remarkable performance on visual reasoning tasks (e.g., attributes, location). While such tasks measure the requisite knowledge to ground and reason over a given visual instance, they do not measure the ability of VLMs to retain and generalize such knowledge. In this work, we evaluate VLMs’ ability to acquire “visible” physical knowledge, i.e., the information that is easily accessible from images of static scenes, particularly along the dimensions of object color, size, and space. We build an automatic pipeline to derive a comprehensive knowledge resource for calibrating and probing these models. Our results indicate a severe gap between model and human performance across all three dimensions. Furthermore, we demonstrate that a caption-pretrained LM significantly outperforms VLMs on both the size and spatial tasks, highlighting that despite sufficient access to ground language in the visual modality, VLMs struggle to retain such knowledge.
2021
COM2SENSE: A Commonsense Reasoning Benchmark with Complementary Sentences
Shikhar Singh | Nuan Wen | Yu Hou | Pegah Alipoormolabashi | Te-Lin Wu | Xuezhe Ma | Nanyun Peng
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
EventPlus: A Temporal Event Understanding Pipeline
Mingyu Derek Ma | Jiao Sun | Mu Yang | Kung-Hsiang Huang | Nuan Wen | Shikhar Singh | Rujun Han | Nanyun Peng
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations
We present EventPlus, a temporal event understanding pipeline that integrates various state-of-the-art event understanding components, including event trigger and type detection, event argument detection, and event duration and temporal relation extraction. Event information, especially event temporal knowledge, is a type of commonsense knowledge that helps people understand how stories evolve and provides predictive hints for future events. EventPlus, as the first comprehensive temporal event understanding pipeline, provides a convenient tool for users to quickly obtain annotations about events and their temporal information for any user-provided document. Furthermore, we show that EventPlus can be easily adapted to other domains (e.g., the biomedical domain). We make EventPlus publicly available to facilitate event-related information extraction and downstream applications.
Co-authors
- Nuan Wen 2
- Nanyun Peng 2
- Yu Hou 1
- Pegah Alipoormolabashi 1
- Te-Lin Wu 1