Van-Thuy Phi


2024

PolyNERE: A Novel Ontology and Corpus for Named Entity Recognition and Relation Extraction in Polymer Science Domain
Van-Thuy Phi | Hiroki Teranishi | Yuji Matsumoto | Hiroyuki Oka | Masashi Ishii
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Polymers are widely used in diverse fields, and the demand for efficient methods to extract and organize information about them is increasing. Machine learning models trained on annotated data can extract relevant information from scientific papers accurately, offering a promising way to automate information extraction. In this paper, we introduce a polymer-relevant ontology featuring crucial entities and relations to enhance information extraction in the polymer science field. Our ontology is customizable to adapt to specific research needs. We present PolyNERE, a high-quality named entity recognition (NER) and relation extraction (RE) corpus comprising 750 polymer abstracts annotated using our ontology. Distinctive features of PolyNERE include multiple entity types, relation categories, support for various NER settings, and the ability to assert entities and relations at different levels. PolyNERE also facilitates reasoning in the RE task through supporting evidence. While our experiments with recent advanced methods achieved promising results, challenges persist in adapting NER and RE from abstracts to full-text paragraphs. This emphasizes the need for robust information extraction systems in the polymer domain, making our corpus a valuable benchmark for future developments.

2019

Relation Classification Using Segment-Level Attention-based CNN and Dependency-based RNN
Van-Hien Tran | Van-Thuy Phi | Hiroyuki Shindo | Yuji Matsumoto
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Recently, relation classification has achieved much success by exploiting deep neural networks. In this paper, we propose a new model that effectively combines Segment-level Attention-based Convolutional Neural Networks (SACNNs) and Dependency-based Recurrent Neural Networks (DepRNNs). While SACNNs allow the model to selectively focus on the important information segments of the raw sequence, DepRNNs help to handle long-distance relations along the shortest dependency path between the relation entities. Experiments on the SemEval-2010 Task 8 dataset show that our model is comparable to the state of the art without using any external lexical features.

2018

Ranking-Based Automatic Seed Selection and Noise Reduction for Weakly Supervised Relation Extraction
Van-Thuy Phi | Joan Santoso | Masashi Shimbo | Yuji Matsumoto
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

This paper addresses two tasks: automatic seed selection for bootstrapping relation extraction, and noise reduction for distantly supervised relation extraction. We first point out that these tasks are related. Then, inspired by ranking relation instances and patterns computed by the HITS algorithm, and by selecting cluster centroids using K-means, LSA, or NMF, we propose methods for selecting the initial seeds from an existing resource or for reducing the level of noise in distantly labeled data. Experiments show that our proposed methods achieve better performance than the baseline systems in both tasks.

2016

Integrating Word Embedding Offsets into the Espresso System for Part-Whole Relation Extraction
Van-Thuy Phi | Yuji Matsumoto
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers

2013

Exploring a Probabilistic Earley Parser for Event Composition in Biomedical Texts
Mai-Vu Tran | Nigel Collier | Hoang-Quynh Le | Van-Thuy Phi | Thanh-Binh Pham
Proceedings of the BioNLP Shared Task 2013 Workshop