Linguistic Characteristics of Censorable Language on SinaWeibo

Kei Yin Ng, Anna Feldman, Jing Peng, Chris Leberknight


Abstract
This paper investigates censorship from a linguistic perspective. We collect a corpus of censored and uncensored posts on a number of topics and build a classifier that predicts censorship decisions independent of discussion topic. Our investigation reveals that the strongest linguistic indicator of censored content in our corpus is its readability.
Anthology ID:
W18-4202
Volume:
Proceedings of the First Workshop on Natural Language Processing for Internet Freedom
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Venues:
COLING | NLP4IF | WS
Publisher:
Association for Computational Linguistics
Pages:
12–22
URL:
https://aclanthology.org/W18-4202
Cite (ACL):
Kei Yin Ng, Anna Feldman, Jing Peng, and Chris Leberknight. 2018. Linguistic Characteristics of Censorable Language on SinaWeibo. In Proceedings of the First Workshop on Natural Language Processing for Internet Freedom, pages 12–22, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Linguistic Characteristics of Censorable Language on SinaWeibo (Ng et al., 2018)
PDF:
https://preview.aclanthology.org/update-css-js/W18-4202.pdf