Xuan Lu
2026
From Adoption to Adaptation: Tracing the Diffusion of New Emojis on Twitter
Yuhang Zhou | Xuan Lu | Wei Ai
Proceedings of the Seventh Workshop on Natural Language Processing and Computational Social Science
The frequent introduction of new emojis in each Unicode release creates a dynamic shift in social media content, providing a unique opportunity to explore the evolution of digital language. Analyzing a large dataset of sampled English tweets, we examine how newly released emojis gain popularity and evolve in meaning. We find that the community size of early adopters and emoji semantics are positively correlated with their popularity. Certain emojis experienced notable shifts in their meanings and sentiment associations during the diffusion process. Additionally, we propose a novel framework utilizing language models to extract words and pre-existing emojis with semantically similar contexts, which enhances the interpretation of new emojis. The framework demonstrates its effectiveness in improving downstream text classification performance by substituting unknown new emojis with familiar ones. This study offers a new perspective on how new language units are adopted, adapted, and integrated into the fabric of online communication.
2025
MultiConIR: Towards Multi-Condition Information Retrieval
Xuan Lu | Sifan Liu | Bochao Yin | Yongqi Li | Xinghao Chen | Hui Su | Yaohui Jin | Wenjun Zeng | Xiaoyu Shen
Findings of the Association for Computational Linguistics: EMNLP 2025
Multi-condition information retrieval (IR) presents a significant yet underexplored challenge for existing systems. This paper introduces MultiConIR, the first benchmark specifically designed to evaluate retrieval and reranking models under nuanced multi-condition query scenarios across five diverse domains. We systematically assess model capabilities through three critical tasks: complexity robustness, relevance monotonicity, and query format sensitivity. Our extensive experiments on 15 models reveal a critical vulnerability: most retrievers and rerankers exhibit severe performance degradation as query complexity increases. Key deficiencies include widespread failure to maintain relevance monotonicity and high sensitivity to query style and condition placement. The superior performance of GPT-4o reveals the gap between IR systems and advanced LLMs in handling sophisticated natural language queries. Furthermore, this work delves into the factors contributing to reranker performance deterioration and examines how condition positioning within queries affects similarity assessment, providing crucial insights for advancing IR systems towards complex search scenarios.
2024
InfoEnh: Towards Multimodal Sentiment Analysis via Information Bottleneck Filter and Optimal Transport Alignment
Yifeng Xie | Zhihong Zhu | Xuan Lu | Zhiqi Huang | Haoran Xiong
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
In recent years, Multimodal Sentiment Analysis (MSA) leveraging deep learning has demonstrated exceptional performance in a wide range of domains. Its success lies in effectively utilizing information from multiple modalities to analyze sentiments. Despite these advancements, MSA is confronted with two significant challenges. Firstly, each modality often has a surplus of unimportant data, which can overshadow the essential information. Secondly, the crucial cues for sentiment analysis may conflict across different modalities, thereby complicating the analysis process. These issues affect the model’s effectiveness in MSA tasks. To address these challenges, this paper introduces a novel method tailored for MSA, termed InfoEnh. This approach utilizes a masking technique as the bottleneck for information filtering, simultaneously maximizing mutual information to retain crucial data. Furthermore, the method integrates all modalities into a common feature space via domain adaptation, which is enhanced by the application of optimal transport. Extensive experiments conducted on two benchmark MSA datasets demonstrate the effectiveness of our proposed approach. Further analyses indicate significant improvements over the baselines.