Xiao Li


2023

Dynamic Low-rank Estimation for Transformer-based Language Models
Ting Hua | Xiao Li | Shangqian Gao | Yen-Chang Hsu | Yilin Shen | Hongxia Jin
Findings of the Association for Computational Linguistics: EMNLP 2023

Matrix decomposition methods, such as Singular Value Decomposition (SVD) and its importance-weighted variants, have been widely used for compressing Transformer-based language models. While importance-weighted decomposition methods relax SVD's strong assumption that every parameter is equally important, they still rely on two fundamental assumptions: 1) the importance distribution stays unchanged during further fine-tuning, and 2) weight matrices in different layers are equally important. Furthermore, these methods require a well-trained task-specific model as the starting point and need additional fine-tuning after compression. In this work, we propose RankDyna, a matrix decomposition method that dynamically allocates rank resources among matrices across different layers during training. Starting from a general pre-trained model, RankDyna achieves both compression and adaptation to the downstream task within a single round of fine-tuning. Extensive evaluations demonstrate that RankDyna can outperform current state-of-the-art methods under various parameter budget levels, and its advantage grows at higher compression rates.
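RankDyna itself is not reproduced here. As a hedged illustration of how a rank budget shared across layers differs from a fixed per-matrix rank, the sketch below assumes each matrix comes with a list of per-component importance scores (for plain SVD these would be its singular values) and greedily keeps the highest-scoring components globally. The helper names and the static greedy rule are our own assumptions; RankDyna reallocates ranks dynamically during fine-tuning. The sketch also counts the parameters of a rank-r factorization, r(m + n), versus mn for a dense m x n matrix.

```python
def low_rank_params(m, n, r):
    """Parameters in a rank-r factorization W ~ U @ V with U: m x r and V: r x n."""
    return r * (m + n)


def allocate_ranks(scores, budget):
    """Hypothetical static allocation: keep the `budget` highest-scoring
    rank-1 components across ALL matrices, so matrices whose components
    matter more end up with a larger rank.

    scores: dict mapping matrix name -> list of per-component importance scores
    budget: total number of rank-1 components to keep
    """
    pool = [(s, name) for name, vals in scores.items() for s in vals]
    pool.sort(key=lambda item: item[0], reverse=True)  # most important first
    ranks = {name: 0 for name in scores}
    for _, name in pool[:budget]:
        ranks[name] += 1
    return ranks
```

For example, `allocate_ranks({"layer0.query": [5.0, 1.0, 0.5], "layer1.query": [4.0, 3.0, 2.0]}, 3)` gives `layer1.query` two ranks, because its second component (3.0) outscores the second component of `layer0.query` (1.0); a uniform per-matrix split would ignore that difference.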

Samsung Research China - Beijing at SemEval-2023 Task 2: An AL-R Model for Multilingual Complex Named Entity Recognition
Haojie Zhang | Xiao Li | Renhua Gu | Xiaoyan Qu | Xiangfeng Meng | Shuo Hu | Song Liu
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes our system for SemEval-2023 Task 2: Multilingual Complex Named Entity Recognition (MultiCoNER II). Our team, Samsung Research China - Beijing, proposes an AL-R (Adjustable Loss RoBERTa) model to boost the performance of recognizing short and complex entities under the challenges of long-tail data distribution, out-of-knowledge-base entities, and noisy scenarios. We first employ an adjustable dice loss optimization objective to overcome the long-tail data distribution, which also proves robust to noise, especially in combating fine-grained label confusion. In addition, we develop our own knowledge enhancement tool to provide related contexts for the short-context setting and to address the out-of-knowledge-base issue. Experiments validate the effectiveness of our approaches.
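The abstract does not spell out the loss, so the following is an assumption rather than the AL-R objective: a batch-level soft dice loss with a self-adjusting factor, in the spirit of dice losses proposed for data-imbalanced NLP tasks. Each predicted probability p is scaled by (1 - p)^alpha, so confident, easy predictions contribute less to the overlap term. `alpha` and `gamma` are hypothetical knobs, not the paper's settings.

```python
def adjustable_dice_loss(probs, labels, alpha=0.0, gamma=1.0):
    """Soft dice loss over a batch of binary decisions (illustrative sketch).

    probs:  predicted probabilities p_i in [0, 1]
    labels: gold labels y_i in {0, 1}
    alpha:  hypothetical down-weighting exponent; each p_i becomes
            (1 - p_i) ** alpha * p_i, so easy predictions count less
    gamma:  smoothing constant that keeps the ratio defined
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    weighted = [((1.0 - p) ** alpha) * p for p in probs]
    intersection = sum(w * y for w, y in zip(weighted, labels))
    denom = sum(weighted) + sum(labels) + gamma
    dice = (2.0 * intersection + gamma) / denom  # 1.0 means perfect overlap
    return 1.0 - dice
```

With `alpha=0` this reduces to an ordinary smoothed soft dice loss: perfect predictions give a loss of 0, and less confident ones give a positive loss. Unlike cross-entropy, the loss depends on the overlap ratio rather than per-token counts, which is why dice-style losses are often used against class imbalance.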

Guide the Many-to-One Assignment: Open Information Extraction via IoU-aware Optimal Transport
Kaiwen Wei | Yiran Yang | Li Jin | Xian Sun | Zequn Zhang | Jingyuan Zhang | Xiao Li | Linhao Zhang | Jintao Liu | Guo Zhi
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Open Information Extraction (OIE) seeks to extract structured information from raw text without the limitations of a closed ontology. Recently, detection-based OIE methods have received great attention from the community due to their parallelism. However, as an essential step of those models, how to assign ground-truth labels to the tuple proposals generated in parallel remains underexplored. The commonly used Hungarian algorithm for this procedure is restricted to one-to-one assignment between the desired tuples and tuple proposals, which ignores the correlation between proposals and limits the recall of the models. To solve this problem, we propose a dynamic many-to-one label assignment strategy named IOT. Concretely, the label assignment process in OIE is formulated as an Optimal Transport (OT) problem. We use intersection-over-union (IoU) as the assignment quality measure and convert the problem of finding the best assignment into that of solving for the optimal transport plan that maximizes the IoU values. To further utilize the knowledge from the assignment, we design an Assignment-guided Multi-granularity loss (AM) that simultaneously considers word-level and tuple-level information. Experimental results show that the proposed method outperforms state-of-the-art models on three benchmarks.
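A minimal pure-Python sketch of the general idea of IoU-driven many-to-one assignment via optimal transport; it is not the IOT implementation. Several simplifications are assumptions: each tuple is reduced to a single inclusive token span `(start, end)`, the transport cost is 1 - IoU, each gold tuple supplies a hypothetical `k` units (so up to `k` proposals can be matched to it), a background row with a hypothetical constant `bg_cost` absorbs the remaining proposals, and the plan is solved with entropy-regularized Sinkhorn iterations controlled by a hypothetical `eps`.

```python
import math


def span_iou(a, b):
    """IoU of two inclusive token spans given as (start, end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


def sinkhorn(cost, supply, demand, eps=0.1, iters=200):
    """Entropy-regularized OT: returns a plan whose row sums approach
    `supply` and whose column sums approach `demand`."""
    rows, cols = len(supply), len(demand)
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * rows, [1.0] * cols
    for _ in range(iters):
        u = [supply[i] / sum(kernel[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        v = [demand[j] / sum(kernel[i][j] * u[i] for i in range(rows)) for j in range(cols)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(cols)] for i in range(rows)]


def many_to_one_assign(gold, proposals, k=2, bg_cost=0.7):
    """Label each proposal with a gold index, or -1 for background.
    Each gold span can absorb up to k proposals (many-to-one)."""
    n, m = len(proposals), len(gold)
    if n < k * m:
        raise ValueError("need at least k proposals per gold tuple")
    cost = [[1.0 - span_iou(g, p) for p in proposals] for g in gold]
    cost.append([bg_cost] * n)  # background row
    supply = [float(k)] * m + [float(n - k * m)]  # total supply equals n
    demand = [1.0] * n  # each proposal carries one unit of mass
    plan = sinkhorn(cost, supply, demand)
    labels = []
    for j in range(n):
        best = max(range(m + 1), key=lambda i: plan[i][j])
        labels.append(best if best < m else -1)
    return labels
```

With gold spans `[(0, 3), (6, 9)]` and proposals `[(0, 3), (0, 2), (6, 9), (7, 9), (12, 14), (13, 15)]`, both overlapping proposals of each gold span are matched to it and the two non-overlapping ones fall to background, whereas a one-to-one Hungarian matching would give only one positive label per gold tuple.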

MetaPro Online: A Computational Metaphor Processing Online System
Rui Mao | Xiao Li | Kai He | Mengshi Ge | Erik Cambria
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Metaphoric expressions are a special linguistic phenomenon that frequently appears in everyday language. Metaphors do not take their literal meanings in context, which may make them hard for language learners to understand. Metaphoric expressions also reflect human cognition via concept mappings, attracting great attention from the cognitive science and psychology communities. Thus, we aim to develop a computational metaphor processing online system, termed MetaPro Online, that allows users without a coding background, e.g., language learners and linguists, to easily query metaphoricity labels, metaphor paraphrases, and concept mappings for non-domain-specific text. Because MetaPro is an end-to-end system, its outputs can be used directly by language learners and by downstream natural language processing tasks.

2022

AdaLoGN: Adaptive Logic Graph Network for Reasoning-Based Machine Reading Comprehension
Xiao Li | Gong Cheng | Ziheng Chen | Yawei Sun | Yuzhong Qu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent machine reading comprehension datasets such as ReClor and LogiQA require logical reasoning over text. Conventional neural models are insufficient for logical reasoning, while symbolic reasoners cannot be directly applied to text. To meet this challenge, we present a neural-symbolic approach which, to predict an answer, passes messages over a graph representing logical relations between text units. It incorporates an adaptive logic graph network (AdaLoGN) that adaptively infers logical relations to extend the graph and, essentially, realizes mutual and iterative reinforcement between neural and symbolic reasoning. We also implement a novel subgraph-to-node message passing mechanism to enhance context-option interaction for answering multiple-choice questions. Our approach shows promising results on ReClor and LogiQA.

Capturing Conversational Interaction for Question Answering via Global History Reasoning
Jin Qian | Bowei Zou | Mengxing Dong | Xiao Li | AiTi Aw | Yu Hong
Findings of the Association for Computational Linguistics: NAACL 2022

Conversational Question Answering (ConvQA) requires answering the current question, conditioned on the observable paragraph-level context and the conversation history. Previous works have intensively studied history-dependent reasoning: they perceive and absorb topic-related information from prior utterances in the interactive encoding stage, which yields significant improvements over history-independent reasoning. This paper further strengthens the ConvQA encoder by establishing long-distance dependencies among global utterances in multi-turn conversation. We use multi-layer transformers to resolve long-distance relationships, which potentially contribute to reweighting the attentive information in historical utterances. Experiments on QuAC show that our method obtains a substantial improvement (1%), yielding an F1 score of 73.7%. All source code is available at https://github.com/jaytsien/GHR.

2020

Multi-Task Neural Model for Agglutinative Language Translation
Yirong Pan | Xiao Li | Yating Yang | Rui Dong
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Neural machine translation (NMT) has recently achieved impressive performance using large-scale parallel corpora. However, it struggles in the low-resource and morphologically rich scenarios of agglutinative language translation. Inspired by the finding that monolingual data can greatly improve NMT performance, we propose a multi-task neural model that jointly learns to perform bi-directional translation and agglutinative language stemming. Our approach uses a shared encoder and decoder to train a single model without changing the standard NMT architecture, instead adding a token before each source-side sentence to specify the desired target output of the two different tasks. Experimental results on Turkish-English and Uyghur-Chinese show that our approach can significantly improve translation performance on agglutinative languages using a small amount of monolingual data.
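A minimal sketch of the task-token idea, assuming plain token lists. The tag strings `<trans>` and `<stem>` and the helper `make_example` are hypothetical, since the abstract does not name the actual tokens. Only the source side changes, so an unmodified encoder-decoder can be trained on a mixture of translation pairs and monolingual stemming pairs.

```python
# Hypothetical task tags; the paper's actual token strings are not given.
TASK_TAGS = {"translate": "<trans>", "stem": "<stem>"}


def make_example(source_tokens, target_tokens, task):
    """Prepend a task tag to the source side so one shared encoder-decoder
    can learn both translation and stemming from the same data stream."""
    if task not in TASK_TAGS:
        raise ValueError(f"unknown task: {task}")
    return [TASK_TAGS[task]] + list(source_tokens), list(target_tokens)


# Turkish "evlerde" ("in the houses") with stem "ev" ("house").
translation_pair = make_example(["evlerde"], ["in", "the", "houses"], "translate")
stemming_pair = make_example(["evlerde"], ["ev"], "stem")
```

Because the tag is just another vocabulary item, the stemming pairs can be built from monolingual text alone, which is how this setup lets a small amount of monolingual data reach the shared model.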

Using a Penalty-based Loss Re-estimation Method to Improve Implicit Discourse Relation Classification
Xiao Li | Yu Hong | Huibin Ruan | Zhen Huang
Proceedings of the 28th International Conference on Computational Linguistics

We tackle implicit discourse relation classification, the task of automatically determining semantic relationships between arguments. Attention-worthy words in arguments are crucial clues for classifying discourse relations, and attention mechanisms have been proven effective in highlighting these words during encoding. However, our survey shows that some inessential words are mistakenly judged to be attention-worthy and are therefore assigned heavier attention weights than they should be. We propose a penalty-based loss re-estimation method to regulate the attention learning process, integrating penalty coefficients into the loss computation by means of the overstability of attention weight distributions. We conduct experiments on the Penn Discourse TreeBank (PDTB) corpus. The test results show that our loss re-estimation method leads to substantial improvements for a variety of attention mechanisms and obtains highly competitive performance compared to state-of-the-art methods.

Improving Variational Autoencoder for Text Modelling with Timestep-Wise Regularisation
Ruizhe Li | Xiao Li | Guanyi Chen | Chenghua Lin
Proceedings of the 28th International Conference on Computational Linguistics

The Variational Autoencoder (VAE) is a popular and powerful model applied to text modelling to generate diverse sentences. However, when the VAE is used in text modelling, an issue known as posterior collapse (or KL loss vanishing) occurs: the approximate posterior collapses to the prior, and the model ignores the latent variables entirely, degrading to a plain language model during text generation. This issue is particularly prevalent when RNN-based VAE models are used for text modelling. In this paper, we propose a simple, generic architecture called Timestep-Wise Regularisation VAE (TWR-VAE), which can effectively avoid posterior collapse and can be applied to any RNN-based VAE model. We demonstrate the effectiveness and versatility of our model on different tasks, including language modelling and dialogue response generation.
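As a hedged sketch of the general idea behind timestep-wise regularisation, the code below computes the closed-form KL divergence between a diagonal Gaussian posterior and the standard normal prior, KL = 0.5 * sum(mu^2 + sigma^2 - log sigma^2 - 1), and applies it to a posterior produced at every encoder timestep rather than only at the last one. Averaging over timesteps (rather than summing) and the `(mu, logvar)` input format are our assumptions; TWR-VAE's exact formulation may differ.

```python
import math


def gaussian_kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one diagonal Gaussian."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def timestep_wise_kl(mus, logvars):
    """Average the KL term over the posterior at every timestep, instead of
    regularising only the posterior taken from the final hidden state.

    mus, logvars: one vector (list of floats) per timestep
    """
    if not mus or len(mus) != len(logvars):
        raise ValueError("need one (mu, logvar) pair per timestep")
    total = sum(gaussian_kl_to_standard_normal(m, lv) for m, lv in zip(mus, logvars))
    return total / len(mus)
```

For two timesteps with `mus = [[0.0, 0.0], [1.0, 0.0]]` and all-zero `logvars`, the first posterior already matches the prior (KL 0) and the second has KL 0.5, so the averaged term is 0.25.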

DGST: a Dual-Generator Network for Text Style Transfer
Xiao Li | Guanyi Chen | Chenghua Lin | Ruizhe Li
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We propose DGST, a novel and simple Dual-Generator network architecture for text Style Transfer. Our model employs only two generators and relies on neither discriminators nor a parallel corpus for training. Both quantitative and qualitative experiments on the Yelp and IMDb datasets show that our model performs competitively against several strong baselines with more complicated architecture designs.

2019

A Stable Variational Autoencoder for Text Modelling
Ruizhe Li | Xiao Li | Chenghua Lin | Matthew Collinson | Rui Mao
Proceedings of the 12th International Conference on Natural Language Generation

The Variational Autoencoder (VAE) is a powerful method for learning representations of high-dimensional data. However, VAEs can suffer from an issue known as latent variable collapse (or KL term vanishing), where the posterior collapses to the prior and the model ignores the latent codes in generative tasks. This issue is particularly prevalent when employing VAE-RNN architectures for text modelling (Bowman et al., 2016; Yang et al., 2017). In this paper, we present a new architecture called Full-Sampling-VAE-RNN, which can effectively avoid latent variable collapse. Compared to general VAE-RNN architectures, our model achieves a much more stable training process and generates text of significantly better quality.

GeoSQA: A Benchmark for Scenario-based Question Answering in the Geography Domain at High School Level
Zixian Huang | Yulin Shen | Xiao Li | Yu’ang Wei | Gong Cheng | Lin Zhou | Xinyu Dai | Yuzhong Qu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Scenario-based question answering (SQA) has attracted increasing research attention. It typically requires retrieving and integrating knowledge from multiple sources, and applying general knowledge to a specific case described by a scenario. SQA is common in the medical, geography, and legal domains, both in practice and in exams. In this paper, we introduce the GeoSQA dataset. It consists of 1,981 scenarios and 4,110 multiple-choice questions in the geography domain at high school level, where diagrams (e.g., maps, charts) have been manually annotated with natural language descriptions to benefit NLP research. Benchmark results on a variety of state-of-the-art methods for question answering, textual entailment, and reading comprehension demonstrate the unique challenges that SQA presents for future research.

A Dual-Attention Hierarchical Recurrent Neural Network for Dialogue Act Classification
Ruizhe Li | Chenghua Lin | Matthew Collinson | Xiao Li | Guanyi Chen
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Recognising dialogue acts (DA) is important for many natural language processing tasks such as dialogue generation and intention recognition. In this paper, we propose a dual-attention hierarchical recurrent neural network for DA classification. Our model is partially inspired by the observation that conversational utterances are normally associated with both a DA and a topic, where the former captures the social act and the latter describes the subject matter. However, such a dependency between DAs and topics has not been utilised by most existing systems for DA classification. With a novel dual task-specific attention mechanism, our model can capture, for each utterance, information about both DAs and topics, as well as information about the interactions between them. Experimental results show that by modelling topic as an auxiliary task, our model can significantly improve DA classification, yielding better or comparable performance to the state-of-the-art method on three public datasets.

2018

Statistical NLG for Generating the Content and Form of Referring Expressions
Xiao Li | Kees van Deemter | Chenghua Lin
Proceedings of the 11th International Conference on Natural Language Generation

This paper argues that a new generic approach to statistical NLG can be made to perform Referring Expression Generation (REG) successfully. The model not only selects attributes and values for referring to a target referent, but also performs Linguistic Realisation, generating an actual Noun Phrase. Our evaluations suggest that the attribute selection aspect of the algorithm exceeds classic REG algorithms, while the Noun Phrases generated are as similar to those in a previously developed corpus as were Noun Phrases produced by a new set of human speakers.

2017

Investigating the content and form of referring expressions in Mandarin: introducing the Mtuna corpus
Kees van Deemter | Le Sun | Rint Sybesma | Xiao Li | Bo Chen | Muyun Yang
Proceedings of the 10th International Conference on Natural Language Generation

East Asian languages are thought to handle reference differently from languages such as English, particularly in terms of the marking of definiteness and number. We present the first Data-Text corpus for Referring Expressions in Mandarin, and we use this corpus to test some initial hypotheses inspired by the theoretical linguistics literature. Our findings suggest that function words deserve more attention in Referring Expressions Generation than they have so far received, and they have a bearing on the debate about whether different languages make different trade-offs between clarity and brevity.

Log-linear Models for Uyghur Segmentation in Spoken Language Translation
Chenggang Mi | Yating Yang | Rui Dong | Xi Zhou | Lei Wang | Xiao Li | Tonghai Jiang
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

To alleviate data sparsity in spoken Uyghur machine translation, we propose a log-linear morphological segmentation approach. Instead of learning the model only from a monolingual annotated corpus, this approach optimizes Uyghur segmentation for spoken translation using both bilingual and monolingual corpora. Our approach relies on several features, such as a traditional conditional random field (CRF) feature, a bilingual word alignment feature, and a monolingual suffix-word co-occurrence feature. Experimental results show that our proposed segmentation model for Uyghur spoken translation achieves a 1.6 BLEU improvement over the state-of-the-art baseline.

2016

Recurrent Neural Network Based Loanwords Identification in Uyghur
Chenggang Mi | Yating Yang | Xi Zhou | Lei Wang | Xiao Li | Tonghai Jiang
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers

Statistics-Based Lexical Choice for NLG from Quantitative Information
Xiao Li | Kees van Deemter | Chenghua Lin
Proceedings of the 9th International Natural Language Generation conference

2010

Understanding the Semantic Structure of Noun Phrase Queries
Xiao Li
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2009

Semantic Tagging of Web Search Queries
Mehdi Manshadi | Xiao Li
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

On the Use of Virtual Evidence in Conditional Random Fields
Xiao Li
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Discovery of Term Variation in Japanese Web Search Queries
Hisami Suzuki | Xiao Li | Jianfeng Gao
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

Learning N-Best Correction Models from Implicit User Feedback in a Multi-Modal Local Search Application
Dan Bohus | Xiao Li | Patrick Nguyen | Geoffrey Zweig
Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue

2005

The Vocal Joystick: A Voice-Based Human-Computer Interface for Individuals with Motor Impairments
Jeff A. Bilmes | Xiao Li | Jonathan Malkin | Kelley Kilanski | Richard Wright | Katrin Kirchhoff | Amar Subramanya | Susumu Harada | James Landay | Patricia Dowden | Howard Chizeck
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing