Zeev Waks


2021

pdf
Classification and Geotemporal Analysis of Quality-of-Life Issues in Tenant Reviews
Adam Haber | Zeev Waks
Findings of the Association for Computational Linguistics: EMNLP 2021

Online tenant reviews of multifamily residential properties present a unique source of information for commercial real estate investing and research. Real estate professionals frequently read tenant reviews to uncover property-related issues that are otherwise difficult to detect, a process that is both biased and time-consuming. Using this as motivation, we asked whether a text classification-based approach can automate the detection of four carefully defined, major quality-of-life issues: severe crime, noise nuisance, pest burden, and parking difficulties. We aggregate 5.5 million tenant reviews from five sources and use two-stage crowdsourced labeling on 0.1% of the data to produce high-quality labels for subsequent text classification. Following fine-tuning of pretrained language models on millions of reviews, we train a multi-label reviews classifier that achieves a mean AUROC of 0.965 on these labels. We next use the model to reveal temporal and spatial patterns among tens of thousands of multifamily properties. Collectively, these results highlight the feasibility of automated analysis of housing trends and investment opportunities using tenant-perspective data.
Search
Co-authors
Venues