Sarthak Dash


2021

Open Knowledge Graphs Canonicalization using Variational Autoencoders
Sarthak Dash | Gaetano Rossiello | Nandana Mihindukulasooriya | Sugato Bagchi | Alfio Gliozzo
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Noun phrases and relation phrases in open knowledge graphs are not canonicalized, leading to an explosion of redundant and ambiguous subject-relation-object triples. Existing solutions to this problem take a two-step approach: first, they generate embedding representations for both noun and relation phrases; then, a clustering algorithm groups the phrases using the embeddings as features. In this work, we propose Canonicalizing Using Variational AutoEncoders and Side Information (CUVA), a joint model that learns both embeddings and cluster assignments end to end, which leads to a better vector representation for the noun and relation phrases. Our evaluation over multiple benchmarks shows that CUVA outperforms the existing state-of-the-art approaches. Moreover, we introduce CanonicNell, a novel dataset to evaluate entity canonicalization systems.

2020

Taxonomy Construction of Unseen Domains via Graph-based Cross-Domain Knowledge Transfer
Chao Shang | Sarthak Dash | Md. Faisal Mahbub Chowdhury | Nandana Mihindukulasooriya | Alfio Gliozzo
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Extracting lexico-semantic relations as graph-structured taxonomies, also known as taxonomy construction, has been beneficial in a variety of NLP applications. Recently, Graph Neural Networks (GNNs) have been shown to be powerful in tackling many tasks. However, there has been no attempt to exploit GNNs to create taxonomies. In this paper, we propose Graph2Taxo, a GNN-based cross-domain transfer framework for the taxonomy construction task. Our main contribution is to learn the latent features of taxonomy construction from existing domains to guide the structure learning of an unseen domain. We also propose a novel method of directed acyclic graph (DAG) generation for taxonomy construction. Specifically, Graph2Taxo is trained on a noisy graph constructed from automatically extracted, noisy hyponym-hypernym candidate pairs, together with a set of taxonomies for some known domains. The learned model is then used to generate a taxonomy for a new, unknown domain given a set of terms for that domain. Experiments on benchmark datasets from the science and environment domains show that our approach attains significant improvements over the state of the art.

2019

Automatic Taxonomy Induction and Expansion
Nicolas Rodolfo Fauceglia | Alfio Gliozzo | Sarthak Dash | Md. Faisal Mahbub Chowdhury | Nandana Mihindukulasooriya
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

The Knowledge Graph Induction Service (KGIS) is an end-to-end knowledge induction system. One of its main capabilities is to automatically induce taxonomies from input documents using a hybrid approach that takes advantage of linguistic patterns, the semantic web, and neural networks. KGIS allows the user to semi-automatically curate and expand the induced taxonomy through a component called Smart SpreadSheet, which exploits distributional semantics. In this paper, we describe these taxonomy induction and expansion features of KGIS. A screencast video demonstrating the system is available at https://ibm.box.com/v/emnlp-2019-demo.

2014

Parsing Screenplays for Extracting Social Networks from Movies
Apoorv Agarwal | Sriramkumar Balasubramanian | Jiehan Zheng | Sarthak Dash
Proceedings of the 3rd Workshop on Computational Linguistics for Literature (CLFL)