Bo Feng
2026
Putting Captions to the Test: Evaluating Video Caption Quality through Multiple-Choice Question Answering
Zizhen Wang | Bo Feng | Zhengfeng Lai | Shiyu Li | Yang Lu | Meng Cao | Ping Huang | Xiaoming Simon Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Evaluating video captioning remains a critical challenge for Visual Large Language Models (VLLMs). Existing metrics primarily rely on matching generated text against ground-truth references. This paradigm suffers from the “one-to-many” nature of video description, where high-quality captions are often penalized for lexical mismatches or valid shifts in visual focus. Furthermore, such assessments are typically one-dimensional, failing to provide a fine-grained analysis of caption quality. To address this, we redefine caption quality through the lens of information fidelity: A caption must maximize the coverage of salient visual information while ensuring strict factuality. We introduce CapQuiz, a novel reference-free benchmark that assesses captions based on their utility in answering human-verified, fine-grained, multiple-choice questions derived from the video. CapQuiz features a hierarchical taxonomy of 10 question types (spanning Descriptive and Inferential categories) across 24 diverse video domains. Extensive experiments demonstrate that CapQuiz correlates significantly better with human judgments than existing metrics and offers interpretable insights into model performance. We will release the benchmark to facilitate reproducible research.
2021
CRYPTOGRU: Low Latency Privacy-Preserving Text Analysis With GRU
Bo Feng | Qian Lou | Lei Jiang | Geoffrey Fox
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Homomorphic encryption (HE) and garbled circuits (GC) protect users’ privacy. However, simply mixing HE and GC in RNN models suffers from long inference latency due to slow activation functions. In this paper, we present a novel hybrid structure of HE and GC gated recurrent unit (GRU) network, CRYPTOGRU, for low-latency secure inference. CRYPTOGRU replaces the computationally expensive GC-based tanh with a fast GC-based ReLU, and then quantizes sigmoid and ReLU to smaller bit lengths to accelerate activations in a GRU. We evaluate CRYPTOGRU with multiple GRU models trained on 4 public datasets. Experimental results show that CRYPTOGRU achieves top-notch accuracy and improves secure inference latency by up to 138× over one of the state-of-the-art secure networks on the Penn Treebank dataset.