Sophia Chan


AGReE: A system for generating Automated Grammar Reading Exercises
Sophia Chan | Swapna Somasundaran | Debanjan Ghosh | Mengxuan Zhao
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We describe the AGReE system, which takes user-submitted passages as input and automatically generates grammar practice exercises that can be completed while reading. Multiple-choice practice items are generated for a variety of different grammar constructs: punctuation, articles, conjunctions, pronouns, prepositions, verbs, and nouns. We also conducted a large-scale human evaluation with around 4,500 multiple-choice practice items. We notice for 95% of items, a majority of raters out of five were able to identify the correct answer, for 85% of cases, raters agree that there is only one correct answer among the choices. Finally, the error analysis shows that raters made the most mistakes for punctuation and conjunctions.


pdf bib
Social and Emotional Correlates of Capitalization on Twitter
Sophia Chan | Alona Fyshe
Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media

Social media text is replete with unusual capitalization patterns. We posit that capitalizing a token like THIS performs two expressive functions: it marks a person socially, and marks certain parts of an utterance as more salient than others. Focusing on gender and sentiment, we illustrate using a corpus of tweets that capitalization appears in more negative than positive contexts, and is used more by females compared to males. Yet we find that both genders use capitalization in a similar way when expressing sentiment.


Ensemble Methods for Native Language Identification
Sophia Chan | Maryam Honari Jahromi | Benjamin Benetti | Aazim Lakhani | Alona Fyshe
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

Our team—Uvic-NLP—explored and evaluated a variety of lexical features for Native Language Identification (NLI) within the framework of ensemble methods. Using a subset of the highest performing features, we train Support Vector Machines (SVM) and Fully Connected Neural Networks (FCNN) as base classifiers, and test different methods for combining their outputs. Restricting our scope to the closed essay track in the NLI Shared Task 2017, we find that our best SVM ensemble achieves an F1 score of 0.8730 on the test set.