Synthia: Scalable Grounded Persona Generation from Social Media Data
Vahid Rahimzadeh, Erfan Moosavi Monazzah, Mohammad Taher Pilehvar, Yadollah Yaghoobzadeh
Abstract
Persona-driven simulations are increasingly used in computational social science, yet their validity critically depends on the fidelity of the underlying personas. Constructing virtual populations that are both authentic and scalable remains a central challenge. We introduce Synthia, a persona-generation framework that grounds LLM-generated personas in real social-media posts while delegating narrative construction to language models, using publicly available data from the Bluesky platform. Across multiple social-survey benchmarks, Synthia improves alignment with human opinion distributions over prior state-of-the-art approaches while relying on substantially smaller models. A multi-dimensional fairness and bias analysis shows that Synthia outperforms previous methods for most demographics across different dimensions. Uniquely, Synthia preserves interaction-graph structure among personas grounded in real social network users, enabling network-aware analysis, which we demonstrate through two homophily-focused case studies. Together, these results position Synthia as a practical and reliable framework for constructing scalable, high-fidelity, and equitable virtual populations.- Anthology ID:
- 2026.acl-long.2134
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- ACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 45983–46006
- Language:
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.2134/
- DOI:
- Cite (ACL):
- Vahid Rahimzadeh, Erfan Moosavi Monazzah, Mohammad Taher Pilehvar, and Yadollah Yaghoobzadeh. 2026. Synthia: Scalable Grounded Persona Generation from Social Media Data. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 45983–46006, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Synthia: Scalable Grounded Persona Generation from Social Media Data (Rahimzadeh et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.2134.pdf