Young-Suk Lee


2021

Bootstrapping Multilingual AMR with Contextual Word Alignments
Janaki Sheth | Young-Suk Lee | Ramón Fernandez Astudillo | Tahira Naseem | Radu Florian | Salim Roukos | Todd Ward
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

We develop high-performance multilingual Abstract Meaning Representation (AMR) systems by projecting English AMR annotations to other languages with weak supervision. We achieve this goal by bootstrapping transformer-based multilingual word embeddings, in particular those from cross-lingual RoBERTa (XLM-R large). We develop a novel technique for foreign-text-to-English AMR alignment, using the contextual word alignment between English and foreign language tokens. This word alignment is weakly supervised and relies on the contextualized XLM-R word embeddings. We achieve highly competitive performance that surpasses the best published results for German, Italian, Spanish and Chinese.

Leveraging Abstract Meaning Representation for Knowledge Base Question Answering
Pavan Kapanipathi | Ibrahim Abdelaziz | Srinivas Ravishankar | Salim Roukos | Alexander Gray | Ramón Fernandez Astudillo | Maria Chang | Cristina Cornelio | Saswati Dana | Achille Fokoue | Dinesh Garg | Alfio Gliozzo | Sairam Gurajada | Hima Karanam | Naweed Khan | Dinesh Khandelwal | Young-Suk Lee | Yunyao Li | Francois Luus | Ndivhuwo Makondo | Nandana Mihindukulasooriya | Tahira Naseem | Sumit Neelam | Lucian Popa | Revanth Gangi Reddy | Ryan Riegel | Gaetano Rossiello | Udit Sharma | G P Shrivatsa Bhargav | Mo Yu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

A Semantics-aware Transformer Model of Relation Linking for Knowledge Base Question Answering
Tahira Naseem | Srinivas Ravishankar | Nandana Mihindukulasooriya | Ibrahim Abdelaziz | Young-Suk Lee | Pavan Kapanipathi | Salim Roukos | Alfio Gliozzo | Alexander Gray
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Relation linking is a crucial component of Knowledge Base Question Answering systems. Existing systems use a wide variety of heuristics, or ensembles of multiple systems, heavily relying on the surface question text. However, the explicit semantic parse of the question is a rich source of relation information that is not taken advantage of. We propose a simple transformer-based neural model for relation linking that leverages the AMR semantic parse of a sentence. Our system significantly outperforms the state-of-the-art on 4 popular benchmark datasets. These are based on either DBpedia or Wikidata, demonstrating that our approach is effective across KGs.

Structure-aware Fine-tuning of Sequence-to-sequence Transformers for Transition-based AMR Parsing
Jiawei Zhou | Tahira Naseem | Ramón Fernandez Astudillo | Young-Suk Lee | Radu Florian | Salim Roukos
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Predicting linearized Abstract Meaning Representation (AMR) graphs using pre-trained sequence-to-sequence Transformer models has recently led to large improvements on AMR parsing benchmarks. These parsers are simple and avoid explicit modeling of structure but lack desirable properties such as graph well-formedness guarantees or built-in graph-sentence alignments. In this work we explore the integration of general pre-trained sequence-to-sequence language models and a structure-aware transition-based approach. We depart from a pointer-based transition system and propose a simplified transition set, designed to better exploit pre-trained language models for structured fine-tuning. We also explore modeling the parser state within the pre-trained encoder-decoder architecture and different vocabulary strategies for the same purpose. We provide a detailed comparison with recent progress in AMR parsing and show that the proposed parser retains the desirable properties of previous transition-based approaches, while being simpler and reaching the new parsing state of the art for AMR 2.0, without the need for graph re-categorization.

2020

Pushing the Limits of AMR Parsing with Self-Learning
Young-Suk Lee | Ramón Fernandez Astudillo | Tahira Naseem | Revanth Gangi Reddy | Radu Florian | Salim Roukos
Findings of the Association for Computational Linguistics: EMNLP 2020

Abstract Meaning Representation (AMR) parsing has experienced notable growth in performance in the last two years, due both to the impact of transfer learning and the development of novel architectures specific to AMR. At the same time, self-learning techniques have helped push the performance boundaries of other natural language processing applications, such as machine translation or question answering. In this paper, we explore different ways in which trained models can be applied to improve AMR parsing performance, including generation of synthetic text and AMR annotations as well as refinement of the action oracle. We show that, without any additional human annotations, these techniques improve an already performant parser and achieve state-of-the-art results on AMR 1.0 and AMR 2.0.

GPT-too: A Language-Model-First Approach for AMR-to-Text Generation
Manuel Mager | Ramón Fernandez Astudillo | Tahira Naseem | Md Arafat Sultan | Young-Suk Lee | Radu Florian | Salim Roukos
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Abstract Meaning Representations (AMRs) are broad-coverage sentence-level semantic graphs. Existing approaches to generating text from AMR have focused on training sequence-to-sequence or graph-to-sequence models on AMR annotated data only. In this paper, we propose an alternative approach that combines a strong pre-trained language model with cycle consistency-based re-scoring. Despite the simplicity of the approach, our experimental results show these models outperform all previous techniques on the English LDC2017T10 dataset, including the recent use of transformer architectures. In addition to the standard evaluation metrics, we provide human evaluation experiments that further substantiate the strength of our approach.

2018

IBM Research at the CoNLL 2018 Shared Task on Multilingual Parsing
Hui Wan | Tahira Naseem | Young-Suk Lee | Vittorio Castelli | Miguel Ballesteros
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

This paper presents the IBM Research AI submission to the CoNLL 2018 Shared Task on Parsing Universal Dependencies. Our system implements a new joint transition-based parser, based on the Stack-LSTM framework and the Arc-Standard algorithm, that handles tokenization, part-of-speech tagging, morphological tagging and dependency parsing in one single model. By leveraging a combination of character-based modeling of words and recursive composition of partially built linguistic structures we qualified 13th overall and 7th in low resource. We also present a new sentence segmentation neural architecture based on Stack-LSTMs that was the 4th best overall.

2016

Language Independent Dependency to Constituent Tree Conversion
Young-Suk Lee | Zhiguo Wang
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We present a dependency-to-constituent tree conversion technique that aims to improve constituent parsing accuracy by leveraging dependency treebanks, which are available for a wide variety of languages. The technique works in two steps. First, a partial constituent tree is derived from a dependency tree with a very simple deterministic algorithm that is independent of both language and dependency type. Second, a complete high-accuracy constituent tree is derived with a constraint-based parser, which uses the partial constituent tree as external constraints. Evaluated on Section 22 of the WSJ Treebank, the technique achieves a state-of-the-art conversion F-score of 95.6. When applied to the English Universal Dependency treebank and the German CoNLL2006 treebank, the converted treebanks, added to the human-annotated constituent parser training corpus, improve parsing F-scores significantly for both languages.

2014

Confusion Network for Arabic Name Disambiguation and Transliteration in Statistical Machine Translation
Young-Suk Lee
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2011

Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
Bing Zhao | Young-Suk Lee | Xiaoqiang Luo | Liu Li
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Constituent Reordering and Syntax Models for English-to-Japanese Statistical Machine Translation
Young-Suk Lee | Bing Zhao | Xiaoqiang Luo
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2006

IBM Arabic-to-English translation for IWSLT 2006
Young-Suk Lee
Proceedings of the Third International Workshop on Spoken Language Translation: Evaluation Campaign

2005

IBM Statistical Machine Translation for Spoken Languages
Young-Suk Lee
Proceedings of the Second International Workshop on Spoken Language Translation

2004

Morphological Analysis for Statistical Machine Translation
Young-Suk Lee
Proceedings of HLT-NAACL 2004: Short Papers

IBM spoken language translation system evaluation
Young-Suk Lee | Salim Roukos
Proceedings of the First International Workshop on Spoken Language Translation: Evaluation Campaign

2003

Language Model Based Arabic Word Segmentation
Young-Suk Lee | Kishore Papineni | Salim Roukos | Ossama Emam | Hany Hassan
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

TIPS: A Translingual Information Processing System
Yaser Al-Onaizan | Radu Florian | Martin Franz | Hany Hassan | Young-Suk Lee | J. Scott McCarley | Kishore Papineni | Salim Roukos | Jeffrey Sorensen | Christoph Tillmann | Todd Ward | Fei Xia
Companion Volume of the Proceedings of HLT-NAACL 2003 - Demonstrations

2001

Interlingua-Based Broad-Coverage Korean-to-English Translation in CCLINC
Young-Suk Lee | Wu Sok Yi | Stephanie Seneff | Clifford J. Weinstein
Proceedings of the First International Conference on Human Language Technology Research

1997

Simplification of nomenclature leads to an ideal IL for human language communication
Young-Suk Lee | Clifford Weinstein | Dinesh Tummala | Linda Kukolich | Stephanie Seneff
AMTA/SIG-IL First Workshop on Interlinguas

Ambiguity Resolution for Machine Translation of Telegraphic Messages
Young-Suk Lee | Clifford Weinstein | Stephanie Seneff | Dinesh Tummala
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

1996

Automatic English-to-Korean Text Translation of Telegraphic Messages in a Limited Domain
Clifford Weinstein | Dinesh Tummala | Young-Suk Lee | Stephanie Seneff
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics