Chul Sung


CNNBiF: CNN-based Bigram Features for Named Entity Recognition
Chul Sung | Vaibhava Goel | Etienne Marcheret | Steven Rennie | David Nahamoo
Findings of the Association for Computational Linguistics: EMNLP 2021

Transformer models fine-tuned with a sequence labeling objective have become the dominant choice for named entity recognition tasks. However, a self-attention mechanism with unconstrained length can fail to fully capture local dependencies, particularly when training data is limited. In this paper, we propose a novel joint training objective that better captures the semantics of words corresponding to the same entity. By augmenting the training objective with a group-consistency loss component, we enhance our ability to capture local dependencies while still enjoying the advantages of the unconstrained self-attention mechanism. On the CoNLL2003 dataset, our method achieves a test F1 of 93.98 with a single transformer model. More importantly, our fine-tuned CoNLL2003 model displays significant gains in generalization to out-of-domain datasets: on the OntoNotes subset we achieve an F1 of 72.67, which is 0.49 points absolute above the baseline, and on the WNUT16 set an F1 of 68.22, which is a gain of 0.48 points. Furthermore, on the WNUT17 dataset we achieve an F1 of 55.85, a 2.92 point absolute improvement.


Pre-Training BERT on Domain Resources for Short Answer Grading
Chul Sung | Tejas Dhamecha | Swarnadeep Saha | Tengfei Ma | Vinay Reddy | Rishi Arora
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Pre-trained BERT contextualized representations have achieved state-of-the-art results on multiple downstream NLP tasks by fine-tuning with task-specific data. While there has been a lot of focus on task-specific fine-tuning, there has been limited work on improving the pre-trained representations. In this paper, we explore ways of improving the pre-trained contextual representations for the task of automatic short answer grading, a critical component of intelligent tutoring systems. We show that the pre-trained BERT model can be improved by adding data from domain-specific resources such as textbooks. We also present a new approach that uses labeled short answer grading data to further enhance the language model. Empirical evaluation on multi-domain datasets shows that task-specific fine-tuning on the enhanced pre-trained language model achieves superior performance for short answer grading.