Abstract
This paper investigates censorship from a linguistic perspective. We collect a corpus of censored and uncensored posts on a number of topics and build a classifier that predicts censorship decisions independently of discussion topic. Our investigation reveals that the strongest linguistic indicator of censored content in our corpus is its readability.
- Anthology ID:
- W18-4202
- Volume:
- Proceedings of the First Workshop on Natural Language Processing for Internet Freedom
- Month:
- August
- Year:
- 2018
- Address:
- Santa Fe, New Mexico, USA
- Venue:
- NLP4IF
- Publisher:
- Association for Computational Linguistics
- Pages:
- 12–22
- URL:
- https://aclanthology.org/W18-4202
- Cite (ACL):
- Kei Yin Ng, Anna Feldman, Jing Peng, and Chris Leberknight. 2018. Linguistic Characteristics of Censorable Language on SinaWeibo. In Proceedings of the First Workshop on Natural Language Processing for Internet Freedom, pages 12–22, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
- Cite (Informal):
- Linguistic Characteristics of Censorable Language on SinaWeibo (Ng et al., NLP4IF 2018)
- PDF:
- https://preview.aclanthology.org/auto-file-uploads/W18-4202.pdf