Huibin Zhang


2023

pdf bib
Improving Situated Conversational Agents with Step-by-Step Multi-modal Logic Reasoning
Yuxing Long | Huibin Zhang | Binyuan Hui | Zhenglu Yang | Caixia Yuan | Xiaojie Wang | Fei Huang | Yongbin Li
Proceedings of The Eleventh Dialog System Technology Challenge

To fulfill complex user requirements in a situated conversational scenario, the agent needs to conduct step-by-step multi-modal logic reasoning, which includes locating objects, querying information and searching objects. However, existing methods omit this multi-step procedure and therefore constitutes the risk of shortcuts when making predictions. For example, they may directly copy the information from the dialogue history or simply use the textual description without perform visual reasoning. To address this issue and further boost the system performance, we apply the dual process theory to plug a reasoner into the original transformer based model for step-by-step reasoning. When system 2 completes multi-step reasoning, its output is regarded as final prediction. Our proposed method achieved the 1st rank on the summing scores across all four DSTC-11 SIMMC 2.1 sub-tasks.

2022

pdf
Modeling Temporal-Modal Entity Graph for Procedural Multimodal Machine Comprehension
Huibin Zhang | Zhengkun Zhang | Yao Zhang | Jun Wang | Yufan Li | Ning Jiang | Xin Wei | Zhenglu Yang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Procedural Multimodal Documents (PMDs) organize textual instructions and corresponding images step by step. Comprehending PMDs and inducing their representations for the downstream reasoning tasks is designated as Procedural MultiModal Machine Comprehension (M3C). In this study, we approach Procedural M3C at a fine-grained level (compared with existing explorations at a document or sentence level), that is, entity. With delicate consideration, we model entity both in its temporal and cross-modal relation and propose a novel Temporal-Modal Entity Graph (TMEG). Specifically, graph structure is formulated to capture textual and visual entities and trace their temporal-modal evolution. In addition, a graph aggregation module is introduced to conduct graph encoding and reasoning. Comprehensive experiments across three Procedural M3C tasks are conducted on a traditional dataset RecipeQA and our new dataset CraftQA, which can better evaluate the generalization of TMEG.

2010

pdf
Corpus-based Semantic Class Mining: Distributional vs. Pattern-Based Approaches
Shuming Shi | Huibin Zhang | Xiaojie Yuan | Ji-Rong Wen
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

pdf
Employing Topic Models for Pattern-based Semantic Class Discovery
Huibin Zhang | Mingjie Zhu | Shuming Shi | Ji-Rong Wen
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP