Hai-Long Trieu

Also published as: Hai Long Trieu


2022

Named Entity Recognition for Cancer Immunology Research Using Distant Supervision
Hai-Long Trieu | Makoto Miwa | Sophia Ananiadou
Proceedings of the 21st Workshop on Biomedical Language Processing

Cancer immunology research involves several important cell and protein factors. Extracting information about such cells and proteins, and the interactions between them, from text is crucial in text mining for cancer immunology research. However, there are few available datasets for these entities, and the number of annotated documents is insufficient compared with other major named entity types. In this work, we introduce our automatically annotated dataset of key named entities, i.e., T-cells, cytokines, and transcription factors, which are involved in recent cancer immunotherapy. The entities are annotated based on the UniProtKB knowledge base using dictionary matching. We build a neural named entity recognition (NER) model, train it on this dataset, and evaluate it on manually annotated data. Experimental results show that we can achieve promising NER performance even though our data is automatically annotated. Our dataset also enhances NER performance when combined with existing data, with notable improvement on under-investigated named entities such as cytokines and transcription factors.

2019

Coreference Resolution in Full Text Articles with BERT and Syntax-based Mention Filtering
Hai-Long Trieu | Anh-Khoa Duong Nguyen | Nhung Nguyen | Makoto Miwa | Hiroya Takamura | Sophia Ananiadou
Proceedings of the 5th Workshop on BioNLP Open Shared Tasks

This paper describes our system developed for the coreference resolution task of the CRAFT Shared Tasks 2019. The CRAFT corpus is more challenging than other existing corpora because it contains full-text articles. We employ an existing span-based state-of-the-art neural coreference resolution system as a baseline. We enhance the system with two different techniques to capture long-distance coreferent pairs. First, we filter noisy mentions based on parse trees while increasing the number of antecedent candidates. Second, instead of relying on LSTMs, we integrate the highly expressive language model BERT into our model. Experimental results show that our proposed systems significantly outperform the baseline. The best-performing system obtained F-scores of 44%, 48%, 39%, 49%, 40%, and 57% on the test set with the B3, BLANC, CEAFE, CEAFM, LEA, and MUC metrics, respectively. Additionally, the proposed model is able to detect coreferent pairs over long distances, even more than 200 sentences apart.

2018

Investigating Domain-Specific Information for Neural Coreference Resolution on Biomedical Texts
Hai-Long Trieu | Nhung T. H. Nguyen | Makoto Miwa | Sophia Ananiadou
Proceedings of the BioNLP 2018 workshop

Existing biomedical coreference resolution systems depend on features and/or rules based on syntactic parsers. In this paper, we investigate the utility of a state-of-the-art general-domain neural coreference resolution system on biomedical texts. The system is end-to-end and does not depend on any syntactic parser. We also investigate domain-specific features to enhance the system for biomedical texts. Experimental results on the BioNLP Protein Coreference dataset and the CRAFT corpus show that, with no parser information, the adapted system compares favorably with systems that depend on parser information on these datasets, achieving F1 scores of 51.23% on the BioNLP dataset and 36.33% on the CRAFT corpus. In-domain embeddings and domain-specific features helped improve performance on the BioNLP dataset, but not on the CRAFT corpus.

2017

The JAIST Machine Translation Systems for WMT 17
Hai-Long Trieu | Trung-Tin Pham | Le-Minh Nguyen
Proceedings of the Second Conference on Machine Translation

A Multilingual Parallel Corpus for Improving Machine Translation on Southeast Asian Languages
Hai-Long Trieu | Le-Minh Nguyen
Proceedings of Machine Translation Summit XVI: Research Track

Investigating Phrase-Based and Neural-Based Machine Translation on Low-Resource Settings
Hai Long Trieu | Duc-Vu Tran | Le Minh Nguyen
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

2016

Dealing with Out-Of-Vocabulary Problem in Sentence Alignment Using Word Similarity
Hai-Long Trieu | Le-Minh Nguyen | Phuong-Thai Nguyen
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers

2015

The JAIST-UET-MITI machine translation systems for IWSLT 2015
Hai-Long Trieu | Thanh-Quyen Dang | Phuong-Thai Nguyen | Le-Minh Nguyen
Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign