Yulong Ji



2025

RAV: Retrieval-Augmented Voting for Tactile Descriptions Without Training
Jinlin Wang | Yulong Ji | Hongyu Yang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Tactile perception is essential for human-environment interaction, and deriving tactile descriptions from multimodal data is a key challenge for embodied intelligence to understand human perception. Conventional approaches relying on extensive parameter learning for multimodal perception are rigid and computationally inefficient. To address this, we introduce Retrieval-Augmented Voting (RAV), a parameter-free method that constructs visual-tactile cross-modal knowledge directly. RAV retrieves similar visual-tactile data for given visual and tactile inputs and generates tactile descriptions through a voting mechanism. In experiments, we applied three voting strategies (SyncVote, DualVote, and WeightVote), achieving performance comparable to large-scale cross-modal models without training. Comparative experiments across datasets of varying quality, defined by annotation accuracy and data diversity, demonstrate that RAV's performance improves with higher-quality data at no additional computational cost. Code and model checkpoints are open-sourced at https://github.com/PluteW/RAV.
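
The sketch below illustrates the retrieval-plus-voting idea the abstract describes: retrieve the nearest visual-tactile database entries for each query modality, then aggregate their tactile descriptions by voting. All names (retrieve_top_k, rav_describe, db_labels) and the similarity-weighted aggregation are assumptions made for illustration, not the authors' released implementation; see the linked repository for that.

```python
import numpy as np


def retrieve_top_k(query_emb, db_embs, k=5):
    """Return indices and cosine similarities of the k database entries
    most similar to the query (embeddings are L2-normalized first)."""
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = db @ q
    top = np.argsort(-sims)[:k]
    return top, sims[top]


def weighted_vote(labels, weights):
    """Aggregate retrieved tactile descriptions by summing similarity
    weights per description and returning the highest-scoring one."""
    scores = {}
    for label, w in zip(labels, weights):
        scores[label] = scores.get(label, 0.0) + float(w)
    return max(scores, key=scores.get)


def rav_describe(visual_emb, tactile_emb, db_visual, db_tactile, db_labels, k=5):
    """Hypothetical RAV-style pipeline: retrieve neighbors in each modality,
    pool their tactile descriptions, and vote without any training."""
    v_idx, v_sims = retrieve_top_k(visual_emb, db_visual, k)
    t_idx, t_sims = retrieve_top_k(tactile_emb, db_tactile, k)
    labels = [db_labels[i] for i in np.concatenate([v_idx, t_idx])]
    weights = np.concatenate([v_sims, t_sims])
    return weighted_vote(labels, weights)
```

The paper's SyncVote, DualVote, and WeightVote strategies differ in how votes from the two modalities are combined; the similarity-weighted sum above is only one plausible variant of such a voting rule.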