Aron Culotta

2025

Using Text-Based Causal Inference to Disentangle Factors Influencing Online Review Ratings
Linsen Li | Aron Culotta | Nicholas Mattei
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Online reviews provide valuable insights into the perceived quality of facets of a product or service. While aspect-based sentiment analysis has focused on extracting these facets from reviews, there is less work understanding the impact of each aspect on overall perception. This is particularly challenging given correlations among aspects, making it difficult to isolate the effects of each. This paper introduces a methodology based on recent advances in text-based causal analysis, specifically CausalBERT, to disentangle the effect of each factor on overall review ratings. We enhance CausalBERT with three key improvements: temperature scaling for better calibrated treatment assignment estimates; hyperparameter optimization to reduce confound overadjustment; and interpretability methods to characterize discovered confounds. In this work, we treat the textual mentions in reviews as proxies for real-world attributes. We validate our approach on real and semi-synthetic data from over 600K reviews of U.S. K-12 schools. We find that the proposed enhancements result in more reliable estimates, and that perception of school administration and performance on benchmarks are significant drivers of overall school ratings.

2021

Enhancing Model Robustness and Fairness with Causality: A Regularization Approach
Zhao Wang | Kai Shu | Aron Culotta
Proceedings of the First Workshop on Causal Inference and NLP

Recent work has raised concerns on the risk of spurious correlations and unintended biases in statistical machine learning models that threaten model robustness and fairness. In this paper, we propose a simple and intuitive regularization approach to integrate causal knowledge during model training and build a robust and fair model by emphasizing causal features and de-emphasizing spurious features. Specifically, we first manually identify causal and spurious features with principles inspired from the counterfactual framework of causal inference. Then, we propose a regularization approach to penalize causal and spurious features separately. By adjusting the strength of the penalty for each type of feature, we build a predictive model that relies more on causal features and less on non-causal features. We conduct experiments to evaluate model robustness and fairness on three datasets with multiple metrics. Empirical results show that the new models built with causal awareness significantly improve model robustness with respect to counterfactual texts and model fairness with respect to sensitive attributes.

2020

Identifying Spurious Correlations for Robust Text Classification
Zhao Wang | Aron Culotta
Findings of the Association for Computational Linguistics: EMNLP 2020

The predictions of text classifiers are often driven by spurious correlations – e.g., the term “Spielberg” correlates with positively reviewed movies, even though the term itself does not semantically convey a positive sentiment. In this paper, we propose a method to distinguish spurious and genuine correlations in text classification. We treat this as a supervised classification problem, using features derived from treatment effect estimators to distinguish spurious correlations from “genuine” ones. Due to the generic nature of these features and their small dimensionality, we find that the approach works well even with limited training examples, and that it is possible to transport the word classifier to new domains. Experiments on four datasets (sentiment classification and toxicity detection) suggest that using this approach to inform feature selection also leads to more robust classification, as measured by improved worst-case accuracy on the samples affected by spurious correlations.

2015

Inferring latent attributes of Twitter users with label regularization
Ehsan Mohammady Ardehaly | Aron Culotta
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

Using County Demographics to Infer Attributes of Twitter Users
Ehsan Mohammady | Aron Culotta
Proceedings of the Joint Workshop on Social Dynamics and Personal Attributes in Social Media

2012
