Qiong Liu



2025

Walk in Others’ Shoes with a Single Glance: Human-Centric Visual Grounding with Top-View Perspective Transformation
Yuqi Bu | Xin Wu | Zirui Zhao | Yi Cai | David Hsu | Qiong Liu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Visual perspective-taking, the ability to envision others’ perspectives from a single self-perspective, is vital in human-robot interactions. Thus, we introduce a human-centric visual grounding task and a dataset to evaluate this ability. Recent advances in vision-language models (VLMs) have shown potential for inferring others’ perspectives, yet these models are insensitive to information differences induced by slight perspective changes. To address this problem, we propose a top-view enhanced perspective transformation (TEP) method, which decomposes the transition from robot to human perspectives through an abstract top-view representation. It unifies perspectives and facilitates the capture of information differences from diverse perspectives. Experimental results show that TEP improves performance by up to 18%, exhibits perspective-taking abilities across various perspectives, and generalizes effectively to robotic and dynamic scenarios.

2023

Segment-Level and Category-Oriented Network for Knowledge-Based Referring Expression Comprehension
Yuqi Bu | Xin Wu | Liuwu Li | Yi Cai | Qiong Liu | Qingbao Huang
Findings of the Association for Computational Linguistics: ACL 2023

Knowledge-based referring expression comprehension (KB-REC) aims to identify visual objects referred to by expressions that incorporate knowledge. Existing methods employ sentence-level retrieval and fusion methods, which may lead to issues of similarity bias and interference from irrelevant information in unstructured knowledge sentences. To address these limitations, we propose a segment-level and category-oriented network (SLCO). Our approach includes a segment-level and prompt-based knowledge retrieval method to mitigate the similarity bias problem and a category-based grounding method to alleviate interference from irrelevant information in knowledge sentences. Experimental results show that our SLCO can eliminate interference and improve the overall performance of the KB-REC task.