Lingjue Xie


2023

Towards a Unified Multi-Domain Multilingual Named Entity Recognition Model
Mayank Kulkarni | Daniel Preotiuc-Pietro | Karthik Radhakrishnan | Genta Winata | Shijie Wu | Lingjue Xie | Shaohua Yang
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Named Entity Recognition (NER) is a key Natural Language Processing task whose performance is sensitive to the choice of genre and language. A unified NER model across multiple genres and languages is more practical and efficient because it leverages commonalities across genres or languages. In this paper, we propose a novel setup for NER which includes multi-domain and multilingual training and evaluation across 13 domains and 4 languages. We explore a range of approaches to building a unified model using domain and language adaptation techniques. Our experiments highlight multiple nuances to consider while building a unified model: naive data pooling fails to obtain good performance, domain-specific adaptations are more important than language-specific ones, and including domain-specific adaptations in a unified model nears the performance of training multiple dedicated monolingual models at a fraction of their parameter count.

2022

Extractive Entity-Centric Summarization as Sentence Selection using Bi-Encoders
Ella Hofmann-Coyle | Mayank Kulkarni | Lingjue Xie | Mounica Maddela | Daniel Preotiuc-Pietro
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Entity-centric summarization is a type of controllable summarization that aims to produce a summary of a document that is specific to a given target entity. Extractive summaries possess multiple advantages over abstractive ones, such as preserving factuality and being directly usable in downstream tasks like target-based sentiment analysis or in search applications. In this paper, we explore methods to solve this task by recasting it as a sentence selection task, as supported by the EntSUM data set. We use methods inspired by information retrieval, where the input to the model is a pair representing a sentence from the original document and the target entity, in place of the query. We explore different architecture variants and loss functions in this framework. Our results show an improvement of up to 5.8 F1 over the previous state of the art and of 1.1 F1 over the competitive entity-centric Lead 3 heuristic. In addition, we demonstrate similarly strong results on the related task of salient sentence selection for an entity.