Hu Cao
2026
CCTVBench: Contrastive Consistency Traffic VideoQA Benchmark for Multimodal LLMs
Xingcheng Zhou | Hao Guo | Rui Song | Walter Zimmer | Mingyu Liu | Andr\'e Schamschurko | Hu Cao | Alois Knoll
Findings of the Association for Computational Linguistics: ACL 2026
Xingcheng Zhou | Hao Guo | Rui Song | Walter Zimmer | Mingyu Liu | Andr\'e Schamschurko | Hu Cao | Alois Knoll
Findings of the Association for Computational Linguistics: ACL 2026
Safety-critical traffic reasoning requires contrastive consistency: models must detect true hazards when an accident occurs, and reliably reject plausible-but-false hypotheses under near-identical counterfactual scenes. We present CCTVBench, a Contrastive Consistency Traffic VideoQA Benchmark built on paired real accident videos and world-model-generated counterfactual counterparts, together with minimally different, mutually exclusive hypothesis questions. CCTVBench enforces a single structured decision pattern over each video question quadruple and provides actionable diagnostics that decompose failures into positive omission, positive swap, negative hallucination, and mutual-exclusivity violation, while separating video versus question consistency. Experiments across open-source and proprietary video LLMs reveal a large and persistent gap between standard per-instance QA metrics and quadruple-level contrastive consistency, with unreliable none-of-the-above rejection as a key bottleneck. Finally, we introduce C-TCD, which leverages the semantically exclusive counterpart video as the contrast input at inference time, improving both instance-level QA and contrastive consistency.
2022
OneEE: A One-Stage Framework for Fast Overlapping and Nested Event Extraction
Hu Cao | Jingye Li | Fangfang Su | Fei Li | Hao Fei | Shengqiong Wu | Bobo Li | Liang Zhao | Donghong Ji
Proceedings of the 29th International Conference on Computational Linguistics
Hu Cao | Jingye Li | Fangfang Su | Fei Li | Hao Fei | Shengqiong Wu | Bobo Li | Liang Zhao | Donghong Ji
Proceedings of the 29th International Conference on Computational Linguistics
Event extraction (EE) is an essential task of information extraction, which aims to extract structured event information from unstructured text. Most prior work focuses on extracting flat events while neglecting overlapped or nested ones. A few models for overlapped and nested EE includes several successive stages to extract event triggers and arguments,which suffer from error propagation. Therefore, we design a simple yet effective tagging scheme and model to formulate EE as word-word relation recognition, called OneEE. The relations between trigger or argument words are simultaneously recognized in one stage with parallel grid tagging, thus yielding a very fast event extraction speed. The model is equipped with an adaptive event fusion module to generate event-aware representations and a distance-aware predictor to integrate relative distance information for word-word relation recognition, which are empirically demonstrated to be effective mechanisms. Experiments on 3 overlapped and nested EE benchmarks, namely FewFC, Genia11, and Genia13, show that OneEE achieves the state-of-the-art (SOTA) results. Moreover, the inference speed of OneEE is faster than those of baselines in the same condition, and can be further substantially improved since it supports parallel inference.