Paper #1: Reverse-engineering censorship in China: Randomized experimentation and participant observation
Paper #2: Algorithmically Bypassing Censorship on Sina Weibo with Nondeterministic Homophone Substitutions
Reflection #1:
This paper tries to fill the knowledge gap on how the censorship framework in China actually operates. The authors run a randomized experiment on 100 websites, owned by both the Chinese government and the private sector, to find out how censorship works. They also interview people involved, including the censors themselves, to get a better idea of the steps in the censorship process. For the posts, the authors focus on four cases formed by two factors: whether or not a post involves collective action, and whether it is for or against the government. They tried to control the language, topic, and timing of the posts as much as possible. The results suggest that a post has roughly a 40% prior probability of being held for automatic review. Despite this, sites seem to rely more on human censors, because their automatic keyword-matching systems do not separate the different kinds of posts well. Censorship is stricter on posts about collective action, such as protests, while all the other types of posts are about equally likely to be censored. The authors appear to have tried to account for the edge cases in their study.
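To make the four-case design concrete for myself, here is a minimal sketch of how posts could be randomly assigned to the four cells and how a per-cell censorship rate could then be computed. It is not the authors' code: the names `STANCES`, `EVENT_TYPES`, `assign_conditions`, and `censorship_rates` are my own assumptions, and the balanced round-robin assignment is just one simple way to randomize.

```python
import random
from collections import defaultdict
from itertools import product

# Hypothetical labels for the two factors described in the reflection.
STANCES = ("pro-government", "anti-government")
EVENT_TYPES = ("collective action", "no collective action")


def assign_conditions(accounts, seed=0):
    """Randomly assign each account to one of the four cells, in near-equal numbers."""
    rng = random.Random(seed)
    cells = list(product(STANCES, EVENT_TYPES))
    shuffled = list(accounts)
    rng.shuffle(shuffled)  # the random order is what makes the assignment randomized
    return {acct: cells[i % len(cells)] for i, acct in enumerate(shuffled)}


def censorship_rates(outcomes):
    """outcomes: iterable of (cell, was_censored). Returns cell -> share censored."""
    totals = defaultdict(int)
    censored = defaultdict(int)
    for cell, was_censored in outcomes:
        totals[cell] += 1
        censored[cell] += int(was_censored)
    return {cell: censored[cell] / totals[cell] for cell in totals}
```

Comparing the rates across cells is what lets the effect of collective action be separated from the effect of stance toward the government.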
Reflection #2:
This paper uses the reverse-engineered knowledge from the previous paper to get around censorship. It introduces a non-deterministic (randomized) algorithm that replaces censored words with homophones, that is, words that sound very similar. According to the authors' experiments, the substituted posts are not easily detected by the automatic algorithm, yet readers can still understand them. From a cost perspective, the authors estimate that this adds about 15 hours of human labor per day for each censored keyword. Although the approach seems good, China is already known for an abundance of cheap labor. So even if it adds extra cost to the system, that cost would probably only matter for platforms managed by private entities. The authors' use of the most frequent homophones seems clever. But its success depends on how users would react if more posts were censored because censors began blocking every possible homophone combination of the censored words. Given that users have already adapted to the current state of censorship, I wouldn't argue against the approach.
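A rough sketch of what a nondeterministic homophone substitution could look like is given below. It is only my illustration of the idea, not the paper's implementation: the `HOMOPHONES` table is a tiny hand-made example with made-up weights, and the function names `substitute` and `rewrite_post` are hypothetical. I also assume every character of a censored keyword that has an entry in the table is replaced, which may differ from the paper's actual rule.

```python
import random

# Hypothetical toy table: character -> [(same-sounding character, made-up weight)].
# A real system would derive candidates and weights from a large corpus.
HOMOPHONES = {
    "政": [("正", 5), ("证", 3), ("症", 1)],
    "府": [("腐", 4), ("辅", 3), ("斧", 1)],
}


def substitute(keyword, rng):
    """Replace each character that has homophones with a weighted random pick."""
    out = []
    for ch in keyword:
        candidates = HOMOPHONES.get(ch)
        if not candidates:
            out.append(ch)  # no known homophone: keep the character unchanged
            continue
        chars, weights = zip(*candidates)
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)


def rewrite_post(post, keywords, rng=None):
    """Rewrite every occurrence of each censored keyword independently."""
    rng = rng or random.Random()
    for kw in keywords:
        parts = post.split(kw)
        pieces = [parts[0]]
        for part in parts[1:]:
            pieces.append(substitute(kw, rng))  # fresh random draw per occurrence
            pieces.append(part)
        post = "".join(pieces)
    return post
```

Because each occurrence gets its own random draw, the same keyword can surface in many different forms, which is what makes a fixed keyword list hard to maintain on the censor's side.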