Xinyu Liu


2024

pdf
Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-Context Models
Xinyu Liu | Runsong Zhao | Pengcheng Huang | Chunyang Xiao | Bei Li | Jingang Wang | Tong Xiao | JingBo Zhu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Numerous recent works target to extend effective context length for language models and various methods, tasks and benchmarks exist to measure model’s effective memory length. However, through thorough investigations, we find limitations for currently existing evaluations on model’s memory. We provide an extensive survey for limitations in this work and propose a new method called forgetting curve to measure the memorization capability of long-context models. We show that forgetting curve has the advantage of being robust to the tested corpus and the experimental settings, of not relying on prompt and can be applied to any model size. We apply our forgetting curve to a large variety of models involving both transformer and RNN/SSM based architectures. Our measurement provides empirical evidence for the effectiveness of transformer extension techniques while raises questions for the effective length of RNN/SSM based models. We also examine the difference between our measurement and existing benchmarks as well as popular metrics for various models.

2023

pdf
ESPVR: Entity Spans Position Visual Regions for Multimodal Named Entity Recognition
Xiujiao Li | Guanglu Sun | Xinyu Liu
Findings of the Association for Computational Linguistics: EMNLP 2023

Multimodal Named Entity Recognition (MNER) uses visual information to improve the performance of text-only Named Entity Recognition (NER). However, existing methods for acquiring local visual information suffer from certain limitations: (1) using an attention-based method to extract visual regions related to the text from visual regions obtained through convolutional architectures (e.g., ResNet), attention is distracted by the entire image, rather than being fully focused on the visual regions most relevant to the text; (2) using an object detection-based (e.g., Mask R-CNN) method to detect visual object regions related to the text, object detection has a limited range of recognition categories. Moreover, the visual regions obtained by object detection may not correspond to the entities in the text. In summary, the goal of these methods is not to extract the most relevant visual regions for the entities in the text. The visual regions obtained by these methods may be redundant or insufficient for the entities in the text. In this paper, we propose an Entity Spans Position Visual Regions (ESPVR) module to obtain the most relevant visual regions corresponding to the entities in the text. Experiments show that our proposed approach can achieve the SOTA on Twitter-2017 and competitive results on Twitter-2015.

2020

pdf
Showing Your Work Doesn’t Always Work
Raphael Tang | Jaejun Lee | Ji Xin | Xinyu Liu | Yaoliang Yu | Jimmy Lin
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In natural language processing, a recently popular line of work explores how to best report the experimental results of neural networks. One exemplar publication, titled “Show Your Work: Improved Reporting of Experimental Results” (Dodge et al., 2019), advocates for reporting the expected validation effectiveness of the best-tuned model, with respect to the computational budget. In the present work, we critically examine this paper. As far as statistical generalizability is concerned, we find unspoken pitfalls and caveats with this approach. We analytically show that their estimator is biased and uses error-prone assumptions. We find that the estimator favors negative errors and yields poor bootstrapped confidence intervals. We derive an unbiased alternative and bolster our claims with empirical evidence from statistical simulation. Our codebase is at https://github.com/castorini/meanmax.

2019

pdf
jhan014 at SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media
Jiahui Han | Shengtan Wu | Xinyu Liu
Proceedings of the 13th International Workshop on Semantic Evaluation

In this paper, we present two methods to identify and categorize the offensive language in Twitter. In the first method, we establish a probabilistic model to evaluate the sentence offensiveness level and target level according to different sub-tasks. In the second method, we develop a deep neural network consisting of bidirectional recurrent layers with Gated Recurrent Unit (GRU) cells and fully connected layers. In the comparison of two methods, we find both method has its own advantages and drawbacks while they have similar accuracy.