In this paper, we propose CHOLAN, a modular approach to target end-to-end entity linking (EL) over knowledge bases. CHOLAN consists of a pipeline of two transformer-based models integrated sequentially to accomplish the EL task. The first transformer model identifies surface forms (entity mentions) in a given text. For each mention, a second transformer model is employed to classify the target entity among a predefined candidates list. The latter transformer is fed by an enriched context captured from the sentence (i.e. local context), and entity description gained from Wikipedia. Such external contexts have not been used in state of the art EL approaches. Our empirical study was conducted on two well-known knowledge bases (i.e., Wikidata and Wikipedia). The empirical results suggest that CHOLAN outperforms state-of-the-art approaches on standard datasets such as CoNLL-AIDA, MSNBC, AQUAINT, ACE2004, and T-REx.
Explanation generation introduced as the world tree corpus (Jansen et al., 2018) is an emerging NLP task involving multi-hop inference for explaining the correct answer in multiple-choice QA. It is a challenging task evidenced by low state-of-the-art performances(below 60% in F-score) demonstrated on the task. Of the state-of-the-art approaches, fine-tuned transformer-based (Vaswani et al., 2017) BERT models have shown great promise toward continued system performance improvements compared with approaches relying on surface-level cues alone that demonstrate performance saturation. In this work, we take a novel direction by addressing a particular linguistic characteristic of the data — we introduce a novel and lightweight focus feature in the transformer-based model and examine task improvements. Our evaluations reveal a significantly positive impact of this lightweight focus feature achieving the highest scores, second only to a significantly computationally intensive system.
The TextGraphs 2019 Shared Task on Multi-Hop Inference for Explanation Regeneration (MIER-19) tackles explanation generation for answers to elementary science questions. It builds on the AI2 Reasoning Challenge 2018 (ARC-18) which was organized as an advanced question answering task on a dataset of elementary science questions. The ARC-18 questions were shown to be hard to answer with systems focusing on surface-level cues alone, instead requiring far more powerful knowledge and reasoning. To address MIER-19, we adopt a hybrid pipelined architecture comprising a featurerich learning-to-rank (LTR) machine learning model, followed by a rule-based system for reranking the LTR model predictions. Our system was ranked fourth in the official evaluation, scoring close to the second and third ranked teams, achieving 39.4% MAP.
Short texts challenge NLP tasks such as named entity recognition, disambiguation, linking and relation inference because they do not provide sufficient context or are partially malformed (e.g. wrt. capitalization, long tail entities, implicit relations). In this work, we present the Falcon approach which effectively maps entities and relations within a short text to its mentions of a background knowledge graph. Falcon overcomes the challenges of short text using a light-weight linguistic approach relying on a background knowledge graph. Falcon performs joint entity and relation linking of a short text by leveraging several fundamental principles of English morphology (e.g. compounding, headword identification) and utilizes an extended knowledge graph created by merging entities and relations from various knowledge sources. It uses the context of entities for finding relations and does not require training data. Our empirical study using several standard benchmarks and datasets show that Falcon significantly outperforms state-of-the-art entity and relation linking for short text query inventories.