Shaohua Yang


Commonsense Justification for Action Explanation
Shaohua Yang | Qiaozi Gao | Sari Sadiya | Joyce Chai
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

To enable collaboration and communication between humans and agents, this paper investigates learning to acquire commonsense evidence for action justification. In particular, we have developed an approach based on the generative Conditional Variational Autoencoder (CVAE) that models object relations/attributes of the world as latent variables and jointly learns a performer that predicts actions and an explainer that gathers commonsense evidence to justify the action. Our empirical results have shown that, compared to a typical attention-based model, CVAE achieves significantly higher performance in both action prediction and justification. A human subject study further shows that the commonsense evidence gathered by CVAE can be communicated to humans to achieve a significantly higher common ground between humans and agents.

What Action Causes This? Towards Naive Physical Action-Effect Prediction
Qiaozi Gao | Shaohua Yang | Joyce Chai | Lucy Vanderwende
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Despite recent advances in knowledge representation, automated reasoning, and machine learning, artificial agents still lack the ability to understand basic action-effect relations regarding the physical world. For example, the action of cutting a cucumber most likely leads to the state where the cucumber is broken apart into smaller pieces. If artificial agents (e.g., robots) are ever to become our partners in joint tasks, it is critical to empower them with such action-effect understanding so that they can reason about the state of the world and plan for actions. Towards this goal, this paper introduces a new task on naive physical action-effect prediction, which addresses the relations between concrete actions (expressed in the form of verb-noun pairs) and their effects on the state of the physical world as depicted by images. We collected a dataset for this task and developed an approach that harnesses web image data through distant supervision to facilitate learning for action-effect prediction. Our empirical results have shown that web data can be used to complement a small number of seed examples (e.g., three examples for each action) for model learning. This opens up possibilities for agents to learn physical action-effect relations for tasks at hand from a few examples through communication with humans.


Jointly Learning Grounded Task Structures from Language Instruction and Visual Demonstration
Changsong Liu | Shaohua Yang | Sari Saba-Sadiya | Nishant Shukla | Yunzhong He | Song-Chun Zhu | Joyce Chai
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Physical Causality of Action Verbs in Grounded Language Understanding
Qiaozi Gao | Malcolm Doering | Shaohua Yang | Joyce Chai
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Grounded Semantic Role Labeling
Shaohua Yang | Qiaozi Gao | Changsong Liu | Caiming Xiong | Song-Chun Zhu | Joyce Y. Chai
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies


Back to the Blocks World: Learning New Actions through Situated Human-Robot Dialogue
Lanbo She | Shaohua Yang | Yu Cheng | Yunyi Jia | Joyce Chai | Ning Xi
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)


Spell Checking for Chinese
Shaohua Yang | Hai Zhao | Xiaolin Wang | Bao-liang Lu
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents novel results on Chinese spell checking. A concise algorithm based on minimized-path segmentation is proposed to reduce the cost and suit the needs of current Chinese input systems. The algorithm is derived from a simple assumption: spelling errors often increase the number of segments. The experimental results are positive and implicitly support this assumption. Finally, all approaches work together to produce a result much better than the baseline, with a 12% performance improvement.

Towards a Semantic Annotation of English Television News - Building and Evaluating a Constraint Grammar FrameNet
Shaohua Yang | Hai Zhao | Bao-liang Lu
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation