Data Pollination: An Emergent Ecological Process Driving AI Population Evolution
Shufang Xie, Qizhi Pei, Ang Lv, Jingyang Hu, Lijun Wu, Rui Yan
Abstract
AI development is often framed as the outcome of isolated research and engineering efforts, yet evidence from deployed systems suggests that language models interact through a shared data ecosystem. While the optimization of individual models is extensively studied, the emergent properties of this interconnected population remain largely unexplored, limiting our ability to predict long-term ecosystem trajectories. We term this process data pollination, the unintentional circulation of synthetic model outputs through shared online platforms and web-scale training corpora, and formalize it as a population-based evolutionary framework to investigate stability dynamics under synthetic data training. Our theoretical analysis and controlled experiments involving 320 language models demonstrate that population dynamics can mitigate the model collapse observed in single-lineage recursive training, yielding stable or improving performance across diverse benchmarks. Crucially, we find that ecological diversity functions as a fundamental resilience mechanism that safeguards the ecosystem against collapse, highlighting the critical importance of maintaining model diversity for sustainable AI development.
- Anthology ID:
- 2026.acl-long.1229
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 26698–26721
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1229/
- Cite (ACL):
- Shufang Xie, Qizhi Pei, Ang Lv, Jingyang Hu, Lijun Wu, and Rui Yan. 2026. Data Pollination: An Emergent Ecological Process Driving AI Population Evolution. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 26698–26721, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Data Pollination: An Emergent Ecological Process Driving AI Population Evolution (Xie et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1229.pdf
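The abstract contrasts single-lineage recursive training, where each model is fit only to its predecessor's outputs, with a population whose members share a common pool of synthetic data. The sketch below is a toy illustration under stated assumptions, not the paper's framework or experimental setup. It stands in a one-dimensional Gaussian for a "model", uses a maximum-likelihood fit for "training", and assumes a hypothetical pooling scheme in which every model publishes samples to a shared pool and then refits on a random draw from it. The helper names (`single_lineage`, `population`) and all parameters are illustrative.

```python
import random
import statistics


def fit_gaussian(samples):
    """Maximum-likelihood Gaussian fit: sample mean and population std dev."""
    return statistics.fmean(samples), statistics.pstdev(samples)


def single_lineage(generations, n, seed=0):
    """Toy recursive training: each generation refits on n samples drawn
    from the previous generation's fitted Gaussian. Returns the fitted
    standard deviation at every generation (generation 0 is the real data)."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real" data distribution
    history = [sigma]
    for _ in range(generations):
        samples = [rng.gauss(mu, sigma) for _ in range(n)]
        mu, sigma = fit_gaussian(samples)
        history.append(sigma)
    return history


def population(generations, n, num_models, seed=0):
    """Toy population (hypothetical pooling scheme): every model publishes
    n samples to a shared pool, then each model refits on n samples drawn
    from that pool. Returns the mean fitted std dev across models per generation."""
    rng = random.Random(seed)
    models = [(0.0, 1.0)] * num_models  # all models start from the real data
    history = [statistics.fmean([s for _, s in models])]
    for _ in range(generations):
        # Shared pool: synthetic outputs from every model in the population.
        pool = [rng.gauss(mu, sigma) for mu, sigma in models for _ in range(n)]
        # Each model trains on the same amount of data as in the single lineage.
        models = [fit_gaussian(rng.sample(pool, n)) for _ in range(num_models)]
        history.append(statistics.fmean([s for _, s in models]))
    return history


if __name__ == "__main__":
    print("single lineage:", single_lineage(50, 20, seed=1)[-1])
    print("population:    ", population(50, 20, 8, seed=1)[-1])
```

Both variants give each model the same number of training samples per generation, so any difference in how fast the fitted spread shrinks comes from where the data is drawn from rather than how much of it there is. This toy omits everything specific to language models and is meant only to make the single-lineage versus population distinction concrete; it does not reproduce the paper's results.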