JinYeong Bak


2021

pdf bib
Learning Sequential and Structural Information for Source Code Summarization
YunSeok Choi | JinYeong Bak | CheolWon Na | Jee-Hyong Lee
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Knowledge-Enhanced Evidence Retrieval for Counterargument Generation
Yohan Jo | Haneul Yoo | JinYeong Bak | Alice Oh | Chris Reed | Eduard Hovy
Findings of the Association for Computational Linguistics: EMNLP 2021

Finding counterevidence to statements is key to many tasks, including counterargument generation. We build a system that, given a statement, retrieves counterevidence from diverse sources on the Web. At the core of this system is a natural language inference (NLI) model that determines whether a candidate sentence is valid counterevidence or not. Most NLI models to date, however, lack proper reasoning abilities necessary to find counterevidence that involves complex inference. Thus, we present a knowledge-enhanced NLI model that aims to handle causality- and example-based inference by incorporating knowledge graphs. Our NLI model outperforms baselines for NLI tasks, especially for instances that require the targeted inference. In addition, this NLI model further improves the counterevidence retrieval system, notably finding complex counterevidence better.

2020

pdf bib
Speaker Sensitive Response Evaluation Model
JinYeong Bak | Alice Oh
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Automatic evaluation of open-domain dialogue response generation is very challenging because there are many appropriate responses for a given context. Existing evaluation models merely compare the generated response with the ground truth response and rate many of the appropriate responses as inappropriate if they deviate from the ground truth. One approach to resolve this problem is to consider the similarity of the generated response with the conversational context. In this paper, we propose an automatic evaluation model based on that idea and learn the model parameters from an unlabeled conversation corpus. Our approach considers the speakers in defining the different levels of similar context. We use a Twitter conversation corpus that contains many speakers and conversations to test our evaluation model. Experiments show that our model outperforms the other existing evaluation metrics in terms of high correlation with human annotation scores. We also show that our model trained on Twitter can be applied to movie dialogues without any additional training. We provide our code and the learned parameters so that they can be used for automatic evaluation of dialogue response generation models.

2019

pdf bib
Variational Hierarchical User-based Conversation Model
JinYeong Bak | Alice Oh
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Generating appropriate conversation responses requires careful modeling of the utterances and speakers together. Some recent approaches to response generation model both the utterances and the speakers, but these approaches tend to generate responses that are overly tailored to the speakers. To overcome this limitation, we propose a new model with a stochastic variable designed to capture the speaker information and deliver it to the conversational context. An important part of this model is the network of speakers in which each speaker is connected to one or more conversational partner, and this network is then used to model the speakers better. To test whether our model generates more appropriate conversation responses, we build a new conversation corpus containing approximately 27,000 speakers and 770,000 conversations. With this corpus, we run experiments of generating conversational responses and compare our model with other state-of-the-art models. By automatic evaluation metrics and human evaluation, we show that our model outperforms other models in generating appropriate responses. An additional advantage of our model is that it generates better responses for various new user scenarios, for example when one of the speakers is a known user in our corpus but the partner is a new user. For replicability, we make available all our code and data.

2018

pdf bib
Conversational Decision-Making Model for Predicting the King’s Decision in the Annals of the Joseon Dynasty
JinYeong Bak | Alice Oh
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Styles of leaders when they make decisions in groups vary, and the different styles affect the performance of the group. To understand the key words and speakers associated with decisions, we initially formalize the problem as one of predicting leaders’ decisions from discussion with group members. As a dataset, we introduce conversational meeting records from a historical corpus, and develop a hierarchical RNN structure with attention and pre-trained speaker embedding in the form of a, Conversational Decision Making Model (CDMM). The CDMM outperforms other baselines to predict leaders’ final decisions from the data. We explain why CDMM works better than other methods by showing the key words and speakers discovered from the attentions as evidence.

2017

pdf bib
Rotated Word Vector Representations and their Interpretability
Sungjoon Park | JinYeong Bak | Alice Oh
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Vector representation of words improves performance in various NLP tasks, but the high dimensional word vectors are very difficult to interpret. We apply several rotation algorithms to the vector representation of words to improve the interpretability. Unlike previous approaches that induce sparsity, the rotated vectors are interpretable while preserving the expressive performance of the original vectors. Furthermore, any prebuilt word vector representation can be rotated for improved interpretability. We apply rotation to skipgrams and glove and compare the expressive power and interpretability with the original vectors and the sparse overcomplete vectors. The results show that the rotated vectors outperform the original and the sparse overcomplete vectors for interpretability and expressiveness tasks.

2015

pdf bib
Five Centuries of Monarchy in Korea: Mining the Text of the Annals of the Joseon Dynasty
JinYeong Bak | Alice Oh
Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)

2014

pdf bib
Self-disclosure topic model for Twitter conversations
JinYeong Bak | Chin-Yew Lin | Alice Oh
Proceedings of the Joint Workshop on Social Dynamics and Personal Attributes in Social Media

pdf bib
Self-disclosure topic model for classifying and analyzing Twitter conversations
JinYeong Bak | Chin-Yew Lin | Alice Oh
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2012

pdf bib
Self-Disclosure and Relationship Strength in Twitter Conversations
JinYeong Bak | Suin Kim | Alice Oh
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)