Beata Beigman Klebanov

Also published as: Beata Beigman Klebanov

2024

pdf abs
Automated Evaluation of Teacher Encouragement of Student-to-Student Interactions in a Simulated Classroom Discussion
Michael Ilagan | Beata Beigman Klebanov | Jamie Mikeska
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

Leading students to engage in argumentation-focused discussions is a challenge for elementary school teachers, as doing so requires facilitating group discussions with student-to-student interaction. The Mystery Powder (MP) Task was designed to be used in online simulated classrooms to develop teachers’ skill in facilitating small group science discussions. In order to provide timely and scaleable feedback to teachers facilitating a discussion in the simulated classroom, we employ a hybrid modeling approach that successfully combines fine-tuned large language models with features capturing important elements of the discourse dynamic to evaluate MP discussion transcripts. To our knowledge, this is the first application of a hybrid model to automate evaluation of teacher discourse.

pdf abs
Anna Karenina Strikes Again: Pre-Trained LLM Embeddings May Favor High-Performing Learners
Abigail Gurin Schleifer | Beata Beigman Klebanov | Moriah Ariely | Giora Alexandron
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

Unsupervised clustering of student responses to open-ended questions into behavioral and cognitive profiles using pre-trained LLM embeddings is an emerging technique, but little is known about how well this captures pedagogically meaningful information. We investigate this in the context of student responses to open-ended questions in biology, which were previously analyzed and clustered by experts into theory-driven Knowledge Profiles (KPs).Comparing these KPs to ones discovered by purely data-driven clustering techniques, we report poor discoverability of most KPs, except for the ones including the correct answers. We trace this ‘discoverability bias’ to the representations of KPs in the pre-trained LLM embeddings space.

pdf abs
From Miscue to Evidence of Difficulty: Analysis of Automatically Detected Miscues in Oral Reading for Feedback Potential
Beata Beigman Klebanov | Michael Suhan | Tenaha O’Reilly | Zuowei Wang
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

This research is situated in the space between an existing NLP capability and its use(s) in an educational context. We analyze oral reading data collected with a deployed automated speech analysis software and consider how the results of automated speech analysis can be interpreted and used to inform the ideation and design of a new feature – feedback to learners and teachers. Our analysis shows how the details of the system’s performance and the details of the context of use both significantly impact the ideation process.

2023

pdf bib
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
Sunayana Sitaram | Beata Beigman Klebanov | Jason D Williams
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)

pdf abs
Transformer-based Hebrew NLP models for Short Answer Scoring in Biology
Abigail Gurin Schleifer | Beata Beigman Klebanov | Moriah Ariely | Giora Alexandron
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)

Pre-trained large language models (PLMs) are adaptable to a wide range of downstream tasks by fine-tuning their rich contextual embeddings to the task, often without requiring much task-specific data. In this paper, we explore the use of a recently developed Hebrew PLM aleph-BERT for automated short answer grading of high school biology items. We show that the alephBERT-based system outperforms a strong CNN-based baseline, and that it general-izes unexpectedly well in a zero-shot paradigm to items on an unseen topic that address the same underlying biological concepts, opening up the possibility of automatically assessing new items without item-specific fine-tuning.

pdf abs
A dynamic model of lexical experience for tracking of oral reading fluency
Beata Beigman Klebanov | Michael Suhan | Zuowei Wang | Tenaha O’reilly
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)

We present research aimed at solving a problem in assessment of oral reading fluency using children’s oral reading data from our online book reading app. It is known that properties of the passage being read aloud impact fluency estimates; therefore, passage-based measures are used to remove passage-related variance when estimating growth in oral reading fluency. However, passage-based measures reported in the literature tend to treat passages as independent events, without explicitly modeling accumulation of lexical experience as one reads through a book. We propose such a model and show that it helps explain additional variance in the measurements of children’s fluency as they read through a book, improving over a strong baseline. These results have implications for measuring growth in oral reading fluency.

2022

pdf bib
Proceedings of the 3rd Workshop on Figurative Language Processing (FLP)
Debanjan Ghosh | Beata Beigman Klebanov | Smaranda Muresan | Anna Feldman | Soujanya Poria | Tuhin Chakrabarty
Proceedings of the 3rd Workshop on Figurative Language Processing (FLP)

2020

In this paper, we report on the shared task on metaphor identification on VU Amsterdam Metaphor Corpus and on a subset of the TOEFL Native Language Identification Corpus. The shared task was conducted as apart of the ACL 2020 Workshop on Processing Figurative Language.

pdf abs
Go Figure! Multi-task transformer-based architecture for metaphor detection using idioms: ETS team in 2020 metaphor shared task
Xianyang Chen | Chee Wee (Ben) Leong | Michael Flor | Beata Beigman Klebanov
Proceedings of the Second Workshop on Figurative Language Processing

This paper describes the ETS entry to the 2020 Metaphor Detection shared task. Our contribution consists of a sequence of experiments using BERT, starting with a baseline, strengthening it by spell-correcting the TOEFL corpus, followed by a multi-task learning setting, where one of the tasks is the token-level metaphor classification as per the shared task, while the other is meant to provide additional training that we hypothesized to be relevant to the main task. In one case, out-of-domain data manually annotated for metaphor is used for the auxiliary task; in the other case, in-domain data automatically annotated for idioms is used for the auxiliary task. Both multi-task experiments yield promising results.

pdf abs
Automated Evaluation of Writing – 50 Years and Counting
Beata Beigman Klebanov | Nitin Madnani
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In this theme paper, we focus on Automated Writing Evaluation (AWE), using Ellis Page’s seminal 1966 paper to frame the presentation. We discuss some of the current frontiers in the field and offer some thoughts on the emergent uses of this technology.

pdf abs
An Exploratory Study of Argumentative Writing by Young Students: A transformer-based Approach
Debanjan Ghosh | Beata Beigman Klebanov | Yi Song
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

We present a computational exploration of argument critique writing by young students. Middle school students were asked to criticize an argument presented in the prompt, focusing on identifying and explaining the reasoning flaws. This task resembles an established college-level argument critique task. Lexical and discourse features that utilize detailed domain knowledge to identify critiques exist for the college task but do not perform well on the young students’ data. Instead, transformer-based architecture (e.g., BERT) fine-tuned on a large corpus of critique essays from the college task performs much better (over 20% improvement in F1 score). Analysis of the performance of various configurations of the system suggests that while children’s writing does not exhibit the standard discourse structure of an argumentative essay, it does share basic local sequential structures with the more mature writers.

2019

pdf abs
My Turn To Read: An Interleaved E-book Reading Tool for Developing and Struggling Readers
Nitin Madnani | Beata Beigman Klebanov | Anastassia Loukina | Binod Gyawali | Patrick Lange | John Sabatini | Michael Flor
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Literacy is crucial for functioning in modern society. It underpins everything from educational attainment and employment opportunities to health outcomes. We describe My Turn To Read, an app that uses interleaved reading to help developing and struggling readers improve reading skills while reading for meaning and pleasure. We hypothesize that the longer-term impact of the app will be to help users become better, more confident readers with an increased stamina for extended reading. We describe the technology and present preliminary evidence in support of this hypothesis.

2018

pdf abs
Towards Understanding Text Factors in Oral Reading
Anastassia Loukina | Van Rynald T. Liceralde | Beata Beigman Klebanov
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Using a case study, we show that variation in oral reading rate across passages for professional narrators is consistent across readers and much of it can be explained using features of the texts being read. While text complexity is a poor predictor of the reading rate, a substantial share of variability can be explained by timing and story-based factors with performance reaching r=0.75 for unseen passages and narrator.

pdf abs
A Corpus of Non-Native Written English Annotated for Metaphor
Beata Beigman Klebanov | Chee Wee (Ben) Leong | Michael Flor
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We present a corpus of 240 argumentative essays written by non-native speakers of English annotated for metaphor. The corpus is made publicly available. We provide benchmark performance of state-of-the-art systems on this new corpus, and explore the relationship between writing proficiency and metaphor use.

pdf abs
Writing Mentor: Self-Regulated Writing Feedback for Struggling Writers
Nitin Madnani | Jill Burstein | Norbert Elliot | Beata Beigman Klebanov | Diane Napolitano | Slava Andreyev | Maxwell Schwartz
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

Writing Mentor is a free Google Docs add-on designed to provide feedback to struggling writers and help them improve their writing in a self-paced and self-regulated fashion. Writing Mentor uses natural language processing (NLP) methods and resources to generate feedback in terms of features that research into post-secondary struggling writers has classified as developmental (Burstein et al., 2016b). These features span many writing sub-constructs (use of sources, claims, and evidence; topic development; coherence; and knowledge of English conventions). Prelimi- nary analysis indicates that users have a largely positive impression of Writing Mentor in terms of usability and potential impact on their writing.

pdf bib abs
Using exemplar responses for training and evaluating automated speech scoring systems
Anastassia Loukina | Klaus Zechner | James Bruno | Beata Beigman Klebanov
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

Automated scoring engines are usually trained and evaluated against human scores and compared to the benchmark of human-human agreement. In this paper we compare the performance of an automated speech scoring engine using two corpora: a corpus of almost 700,000 randomly sampled spoken responses with scores assigned by one or two raters during operational scoring, and a corpus of 16,500 exemplar responses with scores reviewed by multiple expert raters. We show that the choice of corpus used for model evaluation has a major effect on estimates of system performance with r varying between 0.64 and 0.80. Surprisingly, this is not the case for the choice of corpus for model training: when the training corpus is sufficiently large, the systems trained on different corpora showed almost identical performance when evaluated on the same corpus. We show that this effect is consistent across several learning algorithms. We conclude that evaluating the model on a corpus of exemplar responses if one is available provides additional evidence about system validity; at the same time, investing effort into creating a corpus of exemplar responses for model training is unlikely to lead to a substantial gain in model performance.

pdf bib
Proceedings of the Workshop on Figurative Language Processing
Beata Beigman Klebanov | Ekaterina Shutova | Patricia Lichtenstein | Smaranda Muresan | Chee Wee
Proceedings of the Workshop on Figurative Language Processing

pdf abs
Catching Idiomatic Expressions in EFL Essays
Michael Flor | Beata Beigman Klebanov
Proceedings of the Workshop on Figurative Language Processing

This paper presents an exploratory study on large-scale detection of idiomatic expressions in essays written by non-native speakers of English. We describe a computational search procedure for automatic detection of idiom-candidate phrases in essay texts. The study used a corpus of essays written during a standardized examination of English language proficiency. Automatically-flagged candidate expressions were manually annotated for idiomaticity. The study found that idioms are widely used in EFL essays. The study also showed that a search algorithm that accommodates the syntactic and lexical exibility of idioms can increase the recall of idiom instances by 30%, but it also increases the amount of false positives.

pdf abs
A Report on the 2018 VUA Metaphor Detection Shared Task
Chee Wee (Ben) Leong | Beata Beigman Klebanov | Ekaterina Shutova
Proceedings of the Workshop on Figurative Language Processing

As the community working on computational approaches to figurative language is growing and as methods and data become increasingly diverse, it is important to create widely shared empirical knowledge of the level of system performance in a range of contexts, thus facilitating progress in this area. One way of creating such shared knowledge is through benchmarking multiple systems on a common dataset. We report on the shared task on metaphor identification on the VU Amsterdam Metaphor Corpus conducted at the NAACL 2018 Workshop on Figurative Language Processing.

2017

pdf abs
Continuous fluency tracking and the challenges of varying text complexity
Beata Beigman Klebanov | Anastassia Loukina | John Sabatini | Tenaha O’Reilly
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

This paper is a preliminary report on using text complexity measurement in the service of a new educational application. We describe a reading intervention where a child takes turns reading a book aloud with a virtual reading partner. Our ultimate goal is to provide meaningful feedback to the parent or the teacher by continuously tracking the child’s improvement in reading fluency. We show that this would not be a simple endeavor, due to an intricate relationship between text complexity from the point of view of comprehension and reading rate.

pdf abs
Exploring Relationships Between Writing & Broader Outcomes With Automated Writing Evaluation
Jill Burstein | Dan McCaffrey | Beata Beigman Klebanov | Guangming Ling
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

Writing is a challenge, especially for at-risk students who may lack the prerequisite writing skills required to persist in U.S. 4-year postsecondary (college) institutions. Educators teaching postsecondary courses requiring writing could benefit from a better understanding of writing achievement and its role in postsecondary success. In this paper, novel exploratory work examined how automated writing evaluation (AWE) can inform our understanding of the relationship between postsecondary writing skill and broader success outcomes. An exploratory study was conducted using test-taker essays from a standardized writing assessment of postsecondary student learning outcomes. Findings showed that for the essays, AWE features were found to be predictors of broader outcomes measures: college success and learning outcomes measures. Study findings illustrate AWE’s potential to support educational analytics – i.e., relationships between writing skill and broader outcomes – taking a step toward moving AWE beyond writing assessment and instructional use cases.

pdf abs
Detecting Good Arguments in a Non-Topic-Specific Way: An Oxymoron?
Beata Beigman Klebanov | Binod Gyawali | Yi Song
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Automatic identification of good arguments on a controversial topic has applications in civics and education, to name a few. While in the civics context it might be acceptable to create separate models for each topic, in the context of scoring of students’ writing there is a preference for a single model that applies to all responses. Given that good arguments for one topic are likely to be irrelevant for another, is a single model for detecting good arguments a contradiction in terms? We investigate the extent to which it is possible to close the performance gap between topic-specific and across-topics models for identification of good arguments.

We demonstrate a method of improving a seed sentiment lexicon developed on essay data by using a pivot-based paraphrasing system for lexical expansion coupled with sentiment profile enrichment using crowdsourcing. Profile enrichment alone yields up to 15% improvement in the accuracy of the seed lexicon on 3-way sentence-level sentiment polarity classification of essay data. Using lexical expansion in addition to sentiment profiles provides a further 7% improvement in performance. Additional experiments show that the proposed method is also effective with other subjectivity lexicons and in a different domain of application (product reviews).

pdf
Word Association Profiles and their Use for Automated Scoring of Essays
Beata Beigman Klebanov | Michael Flor
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Proceedings of the First Workshop on Metaphor in NLP
Ekaterina Shutova | Beata Beigman Klebanov | Joel Tetreault | Zornitsa Kozareva
Proceedings of the First Workshop on Metaphor in NLP

pdf bib
Argumentation-Relevant Metaphors in Test-Taker Essays
Beata Beigman Klebanov | Michael Flor
Proceedings of the First Workshop on Metaphor in NLP

pdf
Lexical Tightness and Text Complexity
Michael Flor | Beata Beigman Klebanov | Kathleen M. Sheehan
Proceedings of the Workshop on Natural Language Processing for Improving Textual Accessibility

pdf
Associative Texture Is Lost In Translation
Beata Beigman Klebanov | Michael Flor
Proceedings of the Workshop on Discourse in Machine Translation