2019
Toward Automated Content Feedback Generation for Non-native Spontaneous Speech
Su-Youn Yoon | Ching-Ni Hsieh | Klaus Zechner | Matthew Mulholland | Yuan Wang | Nitin Madnani
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications
In this study, we developed an automated algorithm to provide feedback about the specific content of non-native English speakers’ spoken responses. The responses were spontaneous speech, elicited using integrated tasks in which the language learners listened to and/or read passages and integrated the core content into their spoken responses. We detected the absence of key points considered important in a spoken response to a particular test question using two different models: (a) a model using word-embedding-based content features and (b) a state-of-the-art short response scoring engine using traditional n-gram-based features. Both models achieved a substantial improvement over the majority baseline, and the combination of the two models achieved a significant further improvement. In particular, the models were robust to automated speech recognition (ASR) errors, and performance based on the ASR word hypotheses was comparable to that based on manual transcriptions. The accuracy and F-score of the best model for the questions included in the training set were 0.80 and 0.68, respectively. Finally, we discussed possible approaches to generating targeted feedback about the content of a language learner’s response, based on automatically detected missing key points.
Content Modeling for Automated Oral Proficiency Scoring System
Su-Youn Yoon | Chong Min Lee
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications
We developed an automated oral proficiency scoring system for non-native English speakers’ spontaneous speech. Automated systems that score holistic proficiency are expected to assess a wide range of performance categories, and content is one of the core performance categories. In order to assess the quality of the content, we trained a Siamese convolutional neural network (CNN) to model the semantic relationship between key points generated by experts and a test response. The correlation between human scores and Siamese CNN scores was comparable to human-human agreement (r=0.63), and it was higher than the baseline content features. Adding the Siamese CNN-based feature to the existing state-of-the-art automated scoring model achieved a small but statistically significant improvement. However, the new model suffered from score inflation for long atypical responses with serious content issues. We investigated the reasons for this score inflation by analyzing the associations with linguistic features and identifying areas strongly associated with the score errors.
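The core of the approach above is a Siamese architecture: the same encoder weights are applied to both the expert key points and the test response, and their encodings are compared. A minimal sketch of that idea, substituting mean-pooled toy embeddings with a linear projection for the paper's CNN encoder (all vectors, names, and the identity projection here are invented for illustration):

```python
import numpy as np

def encode(tokens, emb, W):
    """Shared encoder: average word embeddings, then a linear projection.
    The same weights W are applied to both inputs -- the 'Siamese' part."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return np.zeros(W.shape[1])
    return np.mean(vecs, axis=0) @ W

def siamese_similarity(key_points, response, emb, W):
    """Cosine similarity between the encoded key points and the encoded response."""
    a, b = encode(key_points, emb, W), encode(response, emb, W)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Toy 3-dimensional embeddings, invented for illustration.
emb = {"cats": np.array([1.0, 0.2, 0.0]),
       "felines": np.array([0.9, 0.3, 0.1]),
       "purr": np.array([0.1, 1.0, 0.0]),
       "tax": np.array([0.0, 0.0, 1.0])}
W = np.eye(3)  # untrained identity projection, stands in for learned weights

on_topic = siamese_similarity(["cats", "purr"], ["felines", "purr"], emb, W)
off_topic = siamese_similarity(["cats", "purr"], ["tax"], emb, W)
```

In the actual system the encoder is a trained CNN rather than mean pooling, but the weight sharing and the final similarity comparison are the same.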
2018
Atypical Inputs in Educational Applications
Su-Youn Yoon | Aoife Cahill | Anastassia Loukina | Klaus Zechner | Brian Riordan | Nitin Madnani
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)
In large-scale educational assessments, the use of automated scoring has recently become quite common. While the majority of student responses can be processed and scored without difficulty, a small number of responses have atypical characteristics that make it difficult for an automated scoring system to assign a correct score. We describe a pipeline that detects and processes these kinds of responses at run-time. We present the most frequent kinds of these so-called non-scorable responses, along with effective filtering models based on various NLP and speech processing technologies. We give an overview of two operational automated scoring systems (one for essay scoring and one for speech scoring) and describe the filtering models they use. Finally, we present an evaluation and analysis of filtering models used for spoken responses in an assessment of language proficiency.
Word-Embedding based Content Features for Automated Oral Proficiency Scoring
Su-Youn Yoon | Anastassia Loukina | Chong Min Lee | Matthew Mulholland | Xinhao Wang | Ikkyu Choi
Proceedings of the Third Workshop on Semantic Deep Learning
In this study, we develop content features for an automated scoring system of non-native English speakers’ spontaneous speech. The features calculate the lexical similarity between the question text and the ASR word hypothesis of the spoken response, based on traditional word vector models or word embeddings. The proposed features do not require any sample training responses for each question, and this is a strong advantage since collecting question-specific data is an expensive task, and sometimes even impossible due to concerns about question exposure. We explore the impact of these new features on the automated scoring of two different question types: (a) providing opinions on familiar topics and (b) answering a question about a stimulus material. The proposed features showed statistically significant correlations with the oral proficiency scores, and the combination of new features with the speech-driven features achieved a small but significant further improvement for the latter question type. Further analyses suggested that the new features were effective in assigning more accurate scores for responses with serious content issues.
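One common way to realize a lexical-similarity feature of this kind, requiring only the question text and no sample responses, is a word-alignment score: each response word is matched to its closest question word in embedding space. A minimal sketch under that assumption (the toy embeddings and word lists below are invented for illustration):

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two vectors; 0.0 if either is a zero vector."""
    d = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / d) if d else 0.0

def alignment_similarity(question, response, emb):
    """Average, over response words, of the best cosine match among the
    question words.  Needs only the question text -- no sample responses."""
    qvecs = [emb[q] for q in question if q in emb]
    if not qvecs:
        return 0.0
    scores = [max(cos(emb[r], qv) for qv in qvecs)
              for r in response if r in emb]
    return sum(scores) / len(scores) if scores else 0.0

# Toy 2-dimensional embeddings, invented for illustration.
emb = {"city": np.array([1.0, 0.1]), "urban": np.array([0.9, 0.2]),
       "parks": np.array([0.2, 1.0]), "galaxy": np.array([-0.5, 0.1])}

question = ["city", "parks"]
on_topic = alignment_similarity(question, ["urban", "parks"], emb)
off_topic = alignment_similarity(question, ["galaxy"], emb)
```

In practice the `response` tokens would come from the ASR word hypothesis, and the embeddings from a pre-trained model rather than the toy table above.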
2016
Automated classification of collaborative problem solving interactions in simulated science tasks
Michael Flor | Su-Youn Yoon | Jiangang Hao | Lei Liu | Alina von Davier
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications
Spoken Text Difficulty Estimation Using Linguistic Features
Su-Youn Yoon | Yeonsuk Cho | Diane Napolitano
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications
Can We Make Computers Laugh at Talks?
Chong Min Lee | Su-Youn Yoon | Lei Chen
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)
Considering the importance of public speaking skills, a system that predicts where audiences will laugh during a talk can help speakers prepare. We investigated whether a state-of-the-art humor recognition system can detect sentences that induce laughter in talks. In this study, we used TED talks and the laughter they elicited as data. Our results showed that the state-of-the-art system needs to be improved before it can be used in a practical application. In addition, our analysis showed that classifying humorous sentences in talks is very challenging due to the close similarity between humorous and non-humorous sentences.
Evaluating Argumentative and Narrative Essays using Graphs
Swapna Somasundaran | Brian Riordan | Binod Gyawali | Su-Youn Yoon
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
This work investigates whether the development of ideas in writing can be captured by graph properties derived from the text. Focusing on student essays, we represent the essay as a graph, and encode a variety of graph properties including PageRank as features for modeling essay scores related to quality of development. We demonstrate that our approach improves on a state-of-the-art system on the task of holistic scoring of persuasive essays and on the task of scoring narrative essays along the development dimension.
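The graph representation described above can be sketched in a few lines: build a word co-occurrence graph over the essay's sentences, then compute PageRank by power iteration and read off node importance. Everything below (the co-occurrence criterion, the toy essay, the damping factor) is an illustrative assumption, not the paper's exact construction:

```python
import itertools
import numpy as np

def word_graph(sentences):
    """Nodes are words; an edge links two words that co-occur in a sentence."""
    vocab = sorted({w for s in sentences for w in s})
    idx = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for a, b in itertools.combinations(set(s), 2):
            A[idx[a], idx[b]] = A[idx[b], idx[a]] = 1.0
    return vocab, A

def pagerank(A, d=0.85, iters=50):
    """Power-iteration PageRank on adjacency matrix A with damping d."""
    n = len(A)
    out = A.sum(axis=1)
    out[out == 0] = 1.0          # avoid division by zero for isolated nodes
    M = A / out[:, None]         # row-normalised transition matrix
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * (M.T @ r)
    return r

# Toy "essay" as tokenised sentences, invented for illustration.
essay = [["ideas", "develop", "argument"],
         ["argument", "evidence"],
         ["evidence", "ideas"]]
vocab, A = word_graph(essay)
ranks = dict(zip(vocab, pagerank(A)))
```

Summary statistics of the resulting rank distribution (maximum, spread, and so on) can then serve as features for the development-score model.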
Textual complexity as a predictor of difficulty of listening items in language proficiency tests
Anastassia Loukina | Su-Youn Yoon | Jennifer Sakano | Youhua Wei | Kathy Sheehan
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
In this paper we explore to what extent the difficulty of listening items in an English language proficiency test can be predicted by the textual properties of the prompt. We show that a system based on multiple text complexity features can predict item difficulty for several different item types, and that for some items it achieves higher accuracy than human estimates of item difficulty.
2014
Similarity-Based Non-Scorable Response Detection for Automated Speech Scoring
Su-Youn Yoon | Shasha Xie
Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications
Automated scoring of speaking items in an assessment for teachers of English as a Foreign Language
Klaus Zechner | Keelan Evanini | Su-Youn Yoon | Lawrence Davis | Xinhao Wang | Lei Chen | Chong Min Lee | Chee Wee Leong
Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications
Shallow Analysis Based Assessment of Syntactic Complexity for Automated Speech Scoring
Suma Bhat | Huichao Xue | Su-Youn Yoon
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2012
Vocabulary Profile as a Measure of Vocabulary Sophistication
Su-Youn Yoon | Suma Bhat | Klaus Zechner
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP
Assessment of ESL Learners’ Syntactic Competence Based on Similarity Measures
Su-Youn Yoon | Suma Bhat
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
2011
Detecting Structural Events for Assessing Non-Native Speech
Lei Chen | Su-Youn Yoon
Proceedings of the Sixth Workshop on Innovative Use of NLP for Building Educational Applications
Non-scorable Response Detection for Automated Speaking Proficiency Assessment
Su-Youn Yoon | Keelan Evanini | Klaus Zechner
Proceedings of the Sixth Workshop on Innovative Use of NLP for Building Educational Applications
Non-English Response Detection Method for Automated Proficiency Scoring System
Su-Youn Yoon | Derrick Higgins
Proceedings of the Sixth Workshop on Innovative Use of NLP for Building Educational Applications
2010
A Python Toolkit for Universal Transliteration
Ting Qian | Kristy Hollingshead | Su-youn Yoon | Kyoung-young Kim | Richard Sproat
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
We describe ScriptTranscriber, an open source toolkit for extracting transliterations in comparable corpora from languages written in different scripts. The system includes various methods for extracting potential terms of interest from raw text, for providing guesses on the pronunciations of terms, and for comparing two strings as possible transliterations using both phonetic and temporal measures. The system works with any script in the Unicode Basic Multilingual Plane and is easily extended to include new modules. Given comparable corpora, such as newswire text, in a pair of languages that use different scripts, ScriptTranscriber provides an easy way to mine transliterations from the comparable texts. This is particularly useful for under-resourced languages, where training data for transliteration may be lacking, and where it is thus hard to train good transliterators. ScriptTranscriber provides an open source package that allows for ready incorporation of more sophisticated modules, e.g. a trained transliteration model for a particular language pair. ScriptTranscriber is available as part of the nltk contrib source tree at http://code.google.com/p/nltk/.
2007
Multilingual Transliteration Using Feature based Phonetic Method
Su-Youn Yoon | Kyoung-Young Kim | Richard Sproat
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics
2006
Unsupervised Named Entity Transliteration Using Temporal and Phonetic Correlation
Tao Tao | Su-Youn Yoon | Andrew Fister | Richard Sproat | ChengXiang Zhai
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing