Sowmya Vajjala


2021

Teaching NLP outside Linguistics and Computer Science classrooms: Some challenges and some opportunities
Sowmya Vajjala
Proceedings of the Fifth Workshop on Teaching NLP

NLP’s sphere of influence has extended well beyond computer science research and the development of software applications in the past decade. We see people using NLP methods in a range of academic disciplines from Asian Studies to Clinical Oncology. We also notice the presence of NLP as a module in most data science curricula within and outside of regular university setups. These courses are taken by students from very diverse backgrounds. This paper takes a closer look at some issues related to teaching NLP to these diverse audiences based on my classroom experiences, and identifies some challenges that instructors face, particularly when there is no ecosystem of related courses for the students. In the process, it also identifies a few challenge areas for both NLP researchers and tool developers.

2019

On Understanding the Relation between Expert Annotations of Text Readability and Target Reader Comprehension
Sowmya Vajjala | Ivana Lucic
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

Automatic readability assessment aims to ensure that readers read texts that they can comprehend. However, computational models are typically trained on texts created from the perspective of the text writer, not the target reader. There is little experimental research on the relationship between expert annotations of readability, reader’s language proficiency, and different levels of reading comprehension. To address this gap, we conducted a user study in which over 100 participants read texts of different reading levels and answered questions created to test three forms of comprehension. Our results indicate that, more than readability annotation or reader proficiency, it is the type of comprehension question asked that shows differences between reader responses - inferential questions were difficult for users of all levels of proficiency across reading levels. The data collected from this study will be released with this paper, which will, for the first time, provide a collection of 45 reader-benchmarked texts to evaluate readability assessment systems developed for adult learners of English. It can also potentially be useful for the development of question generation approaches in intelligent tutoring systems research.

Experiments on Non-native Speech Assessment and its Consistency
Ziwei Zhou | Sowmya Vajjala | Seyed Vahid Mirnezami
Proceedings of the 8th Workshop on NLP for Computer Assisted Language Learning

2018

Experiments with Universal CEFR Classification
Sowmya Vajjala | Taraka Rama
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

The Common European Framework of Reference (CEFR) guidelines describe language proficiency of learners on a scale of 6 levels. While the description of CEFR guidelines is generic across languages, the development of automated proficiency classification systems for different languages follows different approaches. In this paper, we explore universal CEFR classification using domain-specific and domain-agnostic, theory-guided as well as data-driven features. We report the results of our preliminary experiments in monolingual, cross-lingual, and multilingual classification with three languages: German, Czech, and Italian. Our results show that monolingual and multilingual models achieve similar performance, and that cross-lingual classification yields lower, but comparable, results to monolingual classification.

OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification
Sowmya Vajjala | Ivana Lučić
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

This paper describes the collection and compilation of the OneStopEnglish corpus of texts written at three reading levels, and demonstrates its usefulness through two applications - automatic readability assessment and automatic text simplification. The corpus consists of 189 texts, each in three versions (567 in total). The corpus is now freely available under a CC BY-SA 4.0 license and we hope that it will foster further research on the topics of readability assessment and text simplification.

2017

A study of N-gram and Embedding Representations for Native Language Identification
Sowmya Vajjala | Sagnik Banerjee
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

We report on our experiments with N-gram and embedding based feature representations for Native Language Identification (NLI) as a part of the NLI Shared Task 2017 (team name: NLI-ISU). Our best performing system on the test set for written essays had a macro F1 of 0.8264 and was based on word unigram, bigram, and trigram features. We explored n-grams covering word, character, POS and word-POS mixed representations for this task. For embedding based feature representations, we employed both word and document embeddings. All embedding representations performed relatively poorly compared to n-grams, possibly because embeddings capture semantic similarities whereas L1 differences are more stylistic in nature.

A Telugu treebank based on a grammar book
Taraka Rama | Sowmya Vajjala
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories

2016

Towards grounding computational linguistic approaches to readability: Modeling reader-text interaction for easy and difficult texts
Sowmya Vajjala | Detmar Meurers | Alexander Eitel | Katharina Scheiter
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

Computational approaches to readability assessment are generally built and evaluated using gold standard corpora labeled by publishers or teachers rather than being grounded in observations about human performance. Considering that both the reading process and the outcome can be observed, there is an empirical wealth that could be used to ground computational analysis of text readability. This will also support explicit readability models connecting text complexity and the reader’s language proficiency to the reading process and outcomes. This paper takes a step in this direction by reporting on an experiment to study how the relation between text complexity and reader’s language proficiency affects the reading process and performance outcomes of readers after reading. We modeled the reading process using three eye tracking variables: fixation count, average fixation count, and second pass reading duration. Our models for these variables explained 78.9%, 74% and 67.4% of the variance, respectively. Performance outcome was modeled through recall and comprehension questions, and these models explained 58.9% and 27.6% of the variance, respectively. While the online models give us a better understanding of the cognitive correlates of reading with text complexity and language proficiency, modeling of the offline measures can be particularly relevant for incorporating user aspects into readability models.

2014

Exploring Measures of “Readability” for Spoken Language: Analyzing linguistic features of subtitles to identify age-specific TV programs
Sowmya Vajjala | Detmar Meurers
Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR)

Automatic CEFR Level Prediction for Estonian Learner Text
Sowmya Vajjala | Kaidi Lõo
Proceedings of the third workshop on NLP for computer-assisted language learning

Assessing the relative reading level of sentence pairs for text simplification
Sowmya Vajjala | Detmar Meurers
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

Role of Morpho-Syntactic Features in Estonian Proficiency Classification
Sowmya Vajjala | Kaidi Lõo
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

Combining Shallow and Linguistically Motivated Features in Native Language Identification
Serhiy Bykh | Sowmya Vajjala | Julia Krivanek | Detmar Meurers
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

On The Applicability of Readability Models to Web Texts
Sowmya Vajjala | Detmar Meurers
Proceedings of the Second Workshop on Predicting and Improving Text Readability for Target Reader Populations

2012

On Improving the Accuracy of Readability Classification using Insights from Second Language Acquisition
Sowmya Vajjala | Detmar Meurers
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP

The Study of Effect of Length in Morphological Segmentation of Agglutinative Languages
Loganathan Ramasamy | Zdeněk Žabokrtský | Sowmya Vajjala
Proceedings of the First Workshop on Multilingual Modeling

Readability Classification for German using Lexical, Syntactic, and Morphological Features
Julia Hancke | Sowmya Vajjala | Detmar Meurers
Proceedings of COLING 2012