Reflection #12 – [04/05] – [Ashish Baghudana]

Felbo, Bjarke, et al. “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm.” arXiv preprint arXiv:1708.00524 (2017).
Nguyen, Thin, et al. “Using linguistic and topic analysis to classify sub-groups of online depression communities.” Multimedia Tools and Applications 76.8 (2017): 10653-10676.

Summary 1

In the first paper, Felbo et al. build an emoji predictor on an extremely large dataset of 1.2 billion tweets and obtain state-of-the-art performance on sentiment, emotion, and sarcasm detection tasks. Since the dataset is not already labeled, the authors use distant supervision as an alternative. The authors demonstrate the success of their DeepMoji model on their dataset and transfer this knowledge to other target tasks. Transfer learning is achieved through a new approach they name “chain-thaw” that fine-tunes one layer at a time. The experiments section shows DeepMoji (with d=1024) achieving a top-5 accuracy of 43.8%, a 5% improvement over fastText’s classifier. The benchmarking experiment also shows DeepMoji (chain-thaw) outperforming state-of-the-art techniques for each specific dataset and task.
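
The chain-thaw schedule itself is easy to picture. Below is a minimal sketch in Keras (the framework DeepMoji was released in) of how such a schedule could be implemented; `model`, `x_train`, and `y_train` are assumed placeholders for a pre-trained network with a fresh softmax layer and the target-task data, so this illustrates the idea rather than reproducing the authors’ code.

```python
# Hedged sketch of chain-thaw fine-tuning: train the new last layer first,
# then thaw and train one layer at a time, then fine-tune the whole model.
from tensorflow.keras.optimizers import Adam

def chain_thaw(model, x_train, y_train, epochs_per_step=1):
    # Step 1: only the new softmax layer; steps 2..n: each earlier layer
    # individually; final step: every layer at once.
    schedule = [[model.layers[-1]]] \
             + [[layer] for layer in model.layers[:-1]] \
             + [list(model.layers)]
    for thawed in schedule:
        for layer in model.layers:
            layer.trainable = layer in thawed
        # Re-compile so the updated trainable flags take effect.
        model.compile(optimizer=Adam(1e-4), loss='categorical_crossentropy')
        model.fit(x_train, y_train, epochs=epochs_per_step)
    return model
```

A nice property of this schedule is that each step updates only a small set of weights, which is presumably part of why it transfers well even to small target datasets.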

Critique 1

The paper does numerous things really well. Firstly, their dataset is huge! This could indeed be one of the reasons for the success of their model. While the approach seems nearly perfect, I would love to know how long training a model on such a large dataset takes. Secondly, they built a website around their model – https://deepmoji.mit.edu/ – and I really liked the way users can type in a sentence to obtain the emojis associated with it. It is interesting to note that this dataset was obtained before Twitter shifted from 140 characters to 280 characters. The DeepMoji website refuses to process anything over 140 characters. I am not sure if this is a limitation on the front-end, or if the model’s accuracy diminishes beyond this limit. Finally, the paper is definitely more in the machine learning space than the computational social science space, at least in its current form. A good follow-up paper would be to use the DeepMoji model to detect bullying or trolls on Twitter (if they are associated more with specific emojis). It is also nice to see the code and the model being open-sourced and easily available for other researchers to use.

Summary 2

In the second paper, Nguyen et al. use linguistic and topic analysis-based features to classify sub-groups of online depression communities. They choose to study the online social media platform LiveJournal (LJ). LJ is divided into multiple communities, and each community has several users posting about topics related to that community. The authors select a final cohort of 24 communities with 38,401 posts, which they subsequently group into 5 subgroups – depression, bipolar disorder, self-harm, grief/bereavement, and suicide. Their features include LIWC psycholinguistic categories and weights from the corpus-topic and topic-word distributions of a topic model. Using these features, they built 4 different classifiers and found that Lasso performed the best.

Critique 2

I had several problems with this paper. The motivation for the paper was confusing – the authors wish to analyze characteristics of depression; however, they immediately deviate from this problem statement. Subsequently, they categorize five kinds of communities – depression, bipolar disorder, self-harm, grief, and suicide – but do not say why there are five categories rather than more or fewer. The dataset collected is small, and the authors do not provide any information about how it was labeled. If the authors labeled the communities themselves, it might have introduced bias into the training data, which could easily have been alleviated by using Amazon Mechanical Turk workers.

From my understanding of the features used, the authors run LIWC analysis to get 68 psycholinguistic features, and subsequently compute the topic distribution for each post. They then run a feature selection technique and show which features were important for four binary classifiers, i.e., depression vs. bipolar disorder, vs. self-harm, vs. grief, and vs. suicide. Running feature selection and building four binary classifiers makes it difficult to interpret the coefficients of the models. The five communities could have been compared better if the authors had built a multi-class classifier. Furthermore, I could not understand the semantic meaning of the topics, or why some topics had higher weights for some classifiers, without looking at the topic-word distributions themselves. The authors also do not provide any justification for why they ran LDA with 50 topics. They should have produced a perplexity-vs-topics plot to determine the number of topics by the elbow method. Finally, I also did not find any information about their train-test/cross-validation process.
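
For reference, such an elbow plot is only a few lines of code. Here is a sketch with gensim, assuming a hypothetical `tokenized_posts` (one token list per post), since the paper does not describe its preprocessing:

```python
# Sketch: perplexity vs. number of LDA topics, to pick K by the elbow method.
from gensim import corpora, models
import matplotlib.pyplot as plt

dictionary = corpora.Dictionary(tokenized_posts)
corpus = [dictionary.doc2bow(post) for post in tokenized_posts]

topic_counts = [10, 25, 50, 75, 100]
perplexities = []
for k in topic_counts:
    lda = models.LdaModel(corpus, num_topics=k, id2word=dictionary)
    # gensim returns a per-word likelihood bound; perplexity = 2^(-bound).
    # Ideally this would be computed on held-out posts, not the training set.
    perplexities.append(2 ** (-lda.log_perplexity(corpus)))

plt.plot(topic_counts, perplexities, marker='o')
plt.xlabel('Number of topics')
plt.ylabel('Perplexity')
plt.show()
```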

Overall, it feels like this paper could do with a rework of the dataset and more discussion. I was not left with any sense of what constitutes depression vs. self-harm vs. bipolar disorder, and so on.


Reflection #12 – [04/05] – [John Wenskovitch]

This pair of papers returns our class discussions to linguistic analyses, including both sentiment detection using emoji (Felbo et al.) and classifying online communities (Nguyen et al.).  The emoji paper (not to be confused with the Emoji Movie) authors build a “DeepMoji” supervised learning model to classify the emotional sentiment conveyed in tweets with embedded emoji.  Using an immense multi-billion tweet dataset (curated down to just over a billion), the authors build and experiment with their classifier, finding that the rich diversity of emotional labels in the dataset yields performance improvements over previous emotion supervised learning studies.  The depression paper examined linguistic features of mental health support communities on LiveJournal, seeking to understand some of the relationships present between distinct communities (such as the Depression and Suicide groups).  In addition to very detailed results, the authors clearly discuss their results and the limitations of their study.

The emoji paper was a tad difficult for me to read, in part because it focused so much on the ML approaches used to address this emotion sentiment challenge, and in part because I’m just not a person who uses emoji.  From my limited understanding, much of their motivation appeared sound.  The one thing that I wasn’t certain about was their decision to take tweets with multiple instances of the same emoji and reduce them to a single instance of that emoji.  I have seen tweets that use a single cry-smile which try to convey a slightly different but still related emotion than tweets that use twelve cry-smiles.  In the text communication world, I see it as the difference between “lol” and “hahahahahaha” replies.  I’m curious how the performance of their classifier would have changed if they had taken the semantics of repeated emoji into account.

That said, their dendrogram (Fig. 3) showing the clustering of the DeepMoji model predictions contained some interesting relationships between pairs and sets of emoji.  For example, the various heart emoji at the right end appear in several different subgroups, with a few “bridge” emoji in between to connect those subgroups.  That isn’t an outcome that I was expecting.  For the most part, though, happy emoji were self-contained in their own group, as were clusters that I’ll call sad emoji, celebratory emoji, and silly emoji.

My biggest criticism of the depression paper is the same theme that I’ve been suggesting all semester – getting all of your data from a single source introduces implicit biases into the results that you may not be aware of.  In the case of this study, all of the data came from LiveJournal communities.  Having never belonged to that website, I cannot speak to which features could cause problematic biases.  However, I can suggest possibilities like comment moderation as one dimension that could cause the linguistic features of these communities to differ between LiveJournal and other community hubs.  Though the authors provided a page of limitations, this was not one of them.

I also liked that the authors compared their Lasso classification with three other classifiers (Naïve Bayes, SVM, and Logistic Regression) and reported results across all four.  I’m a big proponent of trying multiple classification techniques and determining which one works best (and then going back to the data and trying to understand why).


Reflection 11 – [Aparna Gupta]

[1] King, Gary, Jennifer Pan, and Margaret E. Roberts. “Reverse-engineering censorship in China: Randomized experimentation and participant observation.” Science 345.6199 (2014): 1251722.

[2] Hiruncharoenvate, Chaya, Zhiyuan Lin, and Eric Gilbert. “Algorithmically Bypassing Censorship on Sina Weibo with Nondeterministic Homophone Substitutions.” ICWSM. 2015.

Reflection #1

Paper 1, by King et al., presents an interesting approach to reverse-engineering censorship in China. The experiment performed by the authors looks more like a covert operation to analyze how censorship works in China. King et al. created accounts on various social media websites and submitted posts from them to analyze whether they get censored or not. The authors even created their own website and conducted interviews. Their approach was unique and interesting. However, I was not convinced why the authors only considered posts, submitted from all over the world, between 8 AM and 8 PM China time. What about content posted before 8 AM and after 8 PM? What I found interesting in the paper is the collective action hypothesis vs. the state critique hypothesis. Non-familiarity with the language is a major drawback in understanding the content. The authors report that Chinese social media organizations hire 50,000 – 70,000 people to act as human censors, which is quite interesting and seems rather small considering the number of Internet users in China.

Reflection #2

Paper 2, by Hiruncharoenvate et al., presents a non-deterministic algorithm for generating homophones that create a large number of false positives for censors. They claim that homophone-transformed weibos posted on Sina Weibo remain on the site three times longer than their previously censored counterparts. The authors conducted two experiments: first, they posted original posts and homophone-transformed posts and found that, although both eventually were deleted, the homophone-transformed posts stayed up 3 times longer; second, they showed that native Chinese speakers on AMT were able to understand these homophone-transformed weibos. I wonder how this homophone-transformation approach would work in other languages. The dataset used consists of 11 million weibos collected from FreeWeibo.  Out of all the social science papers we have read so far, I found this paper the most interesting and its approach well structured.  It would be interesting to implement this approach in other languages as well.



Reflection #11 – [03-27] – [Meghendra Singh]

  1. King, Gary, Jennifer Pan, and Margaret E. Roberts. “Reverse-engineering censorship in China: Randomized experimentation and participant observation.” Science 345.6199 (2014): 1251722.
  2. Hiruncharoenvate, Chaya, Zhiyuan Lin, and Eric Gilbert. “Algorithmically Bypassing Censorship on Sina Weibo with Nondeterministic Homophone Substitutions.” ICWSM. 2015.

The first paper presents a large-scale experimental study of Chinese social media censorship. The authors created accounts on multiple social media sites and submitted various texts, while observing which texts get posted and which get censored. The authors also interviewed employees of a bulletin board software company and other anonymous sources to get a first-hand account of the various strategies used by social media websites to censor certain content. This approach is analogous to reverse engineering the censorship system; hence the title of the paper is appropriate. The key hypothesis that this study tries to prove is that of collective action potential, i.e., that the target of censorship is people who join together to express themselves collectively, stimulated by someone other than the government, and who seem to have the potential to generate collective action in the real world [How censorship in China allows government criticism but silences collective expression.].

Overall, I find the paper to be an interesting read, and Figure 1 gives a nice overview of the various paths a social media post can take on Chinese discussion forums. The authors find that most social media websites used hand-curated keyword matching for automatic review of user-posted content. The most interesting fact was that large Chinese social media firms hire 50,000 to 75,000 human censors, and that the Chinese Communist Party’s propaganda department, major Chinese news websites, and commercial corporations had collectively employed two million “public opinion analysts” (professionals policing public opinion online) as early as 2013 [1]. This implies that for every 309 Internet users in China there was one human censor (there were approximately 618 million Internet users in China in 2013) [2]. With regards to the histogram presented in Figure 4, other than the reasons presented in the paper for the high number of automated reviews on government websites, it may be the case that these websites get far more posts than private websites. I believe a large number of posts would lead to a greater number of posts being selected for automatic review. Additionally, if a person has an issue with a government policy or law, trying to publish their disagreement on a government forum might seem more appropriate to them. Now, given the fact that phrases like “change the law” (变法) and “disagree” (不同意) are blocked from being posted, even on Chinese social media sites, I believe any post showing concern or disagreement with a government policy or law on a government website is highly likely to be reviewed. Moreover, given the long-tailed (power-law-like) nature of Chinese social media (as shown in the pie chart from [King et al. 2013]), I feel the majority of the small private social media websites would be acting as niche communities (e.g., food enthusiasts, fashion, technology, games), and it is unlikely that individuals would post politically sensitive content on such communities.

The second paper discusses an interesting approach to evade censorship mechanisms on Sina Weibo (a popular Chinese microblogging website). The authors cite the decision tree of Chinese censorship from the first paper and highlight the fact that homophone substitution can be used to evade keyword-based automatic review and censorship mechanisms. The paper details a non-deterministic algorithm that can generate homophones for sensitive keywords that may be used to filter microblogs (weibos) for review by censors. The authors show that the homophone transformation does not lead to a significant change in the interpretability of the post by conducting Amazon Mechanical Turk Human Intelligence Task experiments. The key idea here is that if the censors try to counter the homophone transformation approach by adding all homophones for all blocked keywords to the blocked keyword list, they would end up censoring as much as 20% of the daily posts on Sina Weibo. This would be detrimental for the website, as it implies losing a significant number of daily posts and users (if the users are banned for posting the content). The authors suggest that the only approach which would work to censor homophone-transformed posts, while not sabotaging the website’s daily traffic, would be to employ human censors. This would impose 15 additional human-hours per day worth of effort on the censors for each banned word, which is substantial as there are thousands of banned words.
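
To make the idea concrete, here is a toy sketch of homophone substitution in Python using the pypinyin package. This is not the authors’ algorithm – theirs draws candidates non-deterministically, weighted by corpus frequency – and the `common_chars` variable and tone-insensitive matching are my simplifications:

```python
# Toy homophone substitution: replace each character of a blocked keyword
# with a random character that shares its (toneless) pinyin reading.
import random
from collections import defaultdict
from pypinyin import lazy_pinyin

def build_homophone_index(common_chars):
    # `common_chars` is an assumed list of frequent Chinese characters.
    index = defaultdict(list)
    for ch in common_chars:
        index[lazy_pinyin(ch)[0]].append(ch)  # group characters by pinyin
    return index

def transform(keyword, index):
    out = []
    for ch in keyword:
        candidates = [c for c in index[lazy_pinyin(ch)[0]] if c != ch]
        out.append(random.choice(candidates) if candidates else ch)
    return ''.join(out)
```

Because the substitution is random, two posts of the same keyword rarely share the same surface form, which is exactly what makes a naive keyword blocklist explode in size.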

In Experiment 1, the authors stopped checking the status of posts after 48 hours; a question I have is: do all posts ultimately get read by some human censor? If so, is there a justification for the 48-hour threshold to consider a post as uncensored? As the authors suggest in the study limitations, posts by established accounts (especially those having a lot of followers) might be scrutinized (or prioritized for review/censorship) more. It would be interesting to see if there exists a correlation between the number of followers an account has and the time at which their sensitive posts get deleted.

Furthermore, in the results for Experiment 1, the authors specify that there is a statistically significant difference between the publishing rates of the original and transformed posts; in terms of raw numbers, however, we don’t see a huge difference between the number of original (552) and transformed (576) posts that got published. It would be interesting to repeat the experiment a couple of times to see if these results remain consistent. Additionally, I feel we might be able to apply a generative adversarial network (GAN) here, with a generator producing different transformations of an original “sensitive” weibo that retain high interpretability yet can fool the discriminator; the discriminator would act like a censor and decide whether or not the generated weibo should be deleted. Although, I am not sure about the exact architecture of the networks or the availability of sufficient training data for this approach.

Addendum: An interesting list of terms blocked from being posted on Weibo.


Reflection #11 – [03-27] – [Patrick Sullivan]

“Algorithmically Bypassing Censorship on Sina Weibo with Nondeterministic Homophone Substitutions” by Chaya Hiruncharoenvate et al.
“Reverse-Engineering Censorship in China: Randomized Experimentation and Participant Observation” by Gary King et al.

It seems obvious that as long as massive, automatic censorship is possible for the censor without incurring any major cost, the censor will remain powerful.  However, if the only action the censor can effectively employ is through human actors, then it should eventually be defeated by any anti-censorship group (except by some extreme response).  This is because a larger anti-censor group with the same tools available will be able to focus its efforts on overwhelming the censor.  Similarly, a small anti-censor group can be innocuous or unassuming, focusing on remaining undetected.  There is another issue for any powerful and growing censor: the increased chance that anti-censor groups will infiltrate and sabotage the censor’s goals.  However, a censor can employ computational methods to judge content en masse and in great detail as an effective guard against all of these points.

This leads me to believe that the censorship seen in this research is not sustainable, and is only kept alive through computational methods.  The direct way of defeating such censorship is to defeat the machines currently driving it.  I think this is the greatest implication of this research by both Hiruncharoenvate et al. and King et al.  By understanding and breaking down a censor’s computational tools in this manner, the censor would only be able to revert to human censoring methods.  And when this censorship is unacceptable to people, they have the strategies I listed above to actually defeat the censor.  This is a necessary point to make, because without an accompanying anti-censorship movement of people, defeating the computational tools of the censor is meaningless.  So in this case, the computer adversaries are best defeated by computational approaches, and the human adversaries are best defeated by human approaches.  I think special consideration should be taken for problems that match this description, because not tackling them in the best way proves to be an incredible waste of time and energy.

I also have trouble estimating whether the censorship is really working as intended.  From King’s findings, if China is very concerned about calls for collective action, then it is surprising that it is less concerned with what could be ‘seeds’ of outrage or activism.  China may censor the calls for action by a movement, but it strangely allows the spread of criticism and information that could motivate a movement.  This seems problematic because it does not address the underlying concerns of the people, but instead just makes it more difficult to act on them.  Also, the censorship targets publicly viewed posts on social media, but doesn’t seem to have any focus on the private direct messages and communication that are also being used.  In the case of a rebellious movement forming, I think this kind of direct, more private communication would naturally come about when a large group has a unifying criticism of the government.


Reflection #11 – [03/27] – [Hamza Manzoor]

[1] King, Gary, Jennifer Pan, and Margaret E. Roberts. “Reverse-engineering censorship in China: Randomized experimentation and participant observation.” Science 345.6199 (2014): 1251722.

[2] Hiruncharoenvate, Chaya, Zhiyuan Lin, and Eric Gilbert. “Algorithmically Bypassing Censorship on Sina Weibo with Nondeterministic Homophone Substitutions.” ICWSM. 2015.

Summaries:

In the first paper, King et al. conducted an experiment on censorship in China by creating their own social media websites. They submitted different posts on these websites and observed how the posts were reviewed. The goal of their study was to reverse engineer the censorship process. The results of their study show that posts that invoke collective action, like protests, are censored, whereas posts containing criticism of the state and its leaders are published.

In the second paper, Hiruncharoenvate et al. performed experiments to manipulate keyword-based censoring algorithms. They make use of homophones of censored words to get past automated reviews. The authors collected censored weibos and developed an algorithm that generates homophones for the censored keywords. The results of their experiments show that posts with homophones tend to stay up 3 times longer, and that native Chinese speakers do not have any trouble deciphering the homophones.

Reflections:

Both these papers use deception to manipulate the “Great Firewall of China”. The first paper felt like the plot of a movie, where a secret agent invades another country to “rescue” its citizens from a so-called tyrant oppressor. In my view, the research conducted in both of these papers is ethically wrong on many levels. There is a fine line between illegal and unethical, and I think that these papers might have crossed that line. Creating a secret network and providing ways to manipulate the infrastructure created by a government solely for its own people is wrong in my opinion. How is it different from the Russian hackers using Facebook to manipulate the election results? Except for the fact that these research papers are in the name of “free speech” or “research”. Had Russians written a research paper titled “Large-scale experiment on how social media can be used to change users’ opinions or manipulate elections”, would that justify what they did? NO.

Moving further, one question that I had while reading the first paper was: if the authors already had access to the censorship software, why did they create a social network to see which posts are blocked, when the same software was used to block the posts in those social networks in the first place? Or did I misunderstand? Secondly, being unfamiliar with the Chinese language, I found the use of homophones in the second paper interesting, and since we have two Chinese speakers presenting tomorrow, it would be nice to know whether all words in Chinese have homophones. Also, is this true only of Mandarin, or of all Chinese languages? I believe we cannot replicate this research in other popular languages like English or Spanish.

Furthermore, in the second paper, the main idea behind the use of homophones is to deceive the algorithms. The authors claim that the algorithms are deceived by the different word, but native speakers are able to recover the true meaning by looking at the context of the sentence. This makes me wonder: with new deep learning techniques it is possible to capture the context of a sentence, so will this approach still work? Secondly, after some time the Chinese government will learn that people are using homophones, and feeding homophones to the algorithms should not be too difficult.

Finally, it was interesting to see in the first paper that posts that invoke collective action, like protests, are censored, whereas posts containing criticism of the state and its leaders are published. So, essentially, the Chinese government is not against criticism but against protests. Now, a question of ethics for the other side: is it ethical for governments to block posts? And how is what the Chinese government is doing different from other governments cracking down on their protestors? Allowing protests and then cracking down on them seems even worse than disallowing protests altogether.


Reflection #11 – [03/27] – [Md Momen Bhuiyan]

Paper #1: Reverse-engineering censorship in China: Randomized experimentation and participant observation
Paper #2: Algorithmically Bypassing Censorship on Sina Weibo with Nondeterministic Homophone Substitutions

Reflection #1:
This paper tries to fill the knowledge gap in modeling the censorship framework in China. The authors perform a randomized experiment on 100 websites owned by both the Chinese government and the private sector to find out how the censorship works. They also interview people, as well as censors themselves, to get a better idea of the steps of censorship in China. For posts, the authors focus on 4 cases: posts with or without a collective action plan, and posts for or against the government. The authors tried to control the language, topic, and timing of the posts as much as possible. From the results, it seems there is a 40% prior probability of a post falling under automatic review. Despite this, sites seem to rely more on human action for censorship, as their automatic keyword-matching systems don’t perform well at separating different kinds of posts. The government puts more of a constraint on the censorship of collective action, like protests, while all the other types of posts have an equal probability of being censored. The authors tried to account for all edge cases in their study.

Reflection #2:
This paper uses the reverse-engineered knowledge from the previous paper to evade censorship. The paper introduces a non-deterministic (randomized) algorithm using homophones (words that sound alike). According to their experiment, the homophones are not easily detectable by the automatic algorithm, while remaining understandable to users. From a cost perspective, this adds an additional 15 hours of human labor per day per banned keyword. Although this approach seems good, China is already known for an abundance of cheap labor. So even if this adds extra cost to the system, it would only work on systems managed by private entities. The authors’ use of the most frequent homophones seems clever. But it depends on how users would react if more posts are censored due to the blocking of all possible combinations of censored words. Given that they have already complied with the current state of censorship, I wouldn’t argue against that.


Reflection #11 – [03/27] – [Jamal A. Khan]

  • Hiruncharoenvate, Chaya, Zhiyuan Lin, and Eric Gilbert. “Algorithmically Bypassing Censorship on Sina Weibo with Nondeterministic Homophone Substitutions.”
  • King, Gary, Jennifer Pan, and Margaret E. Roberts. “Reverse-engineering censorship in China: Randomized experimentation and participant observation.”

Both of the papers assigned for the next class are about Chinese censorship and, in a sense, have a heroic writing tone in terms of how the idea is put forward. I didn’t quite like the way the ideas were staged, but that is irrelevant and subjective.

Regardless of the tone of the writing, or my likes or dislikes about it, I like the first paper’s idea of using the semantics of the Chinese language itself as a deterrent against censorship. The complexity of the language has come as a blessing in disguise. Before I get into the critical details of the paper, I would say that the approach of the authors is sound and has been well demonstrated. Therefore, the reflection will focus on what can be done (or undone, in my case) using the paper as a base.

Since the title itself states that the purpose of the paper is to “bypass” censorship, a natural question is “Does this method still work?”. A naive approach to breaking this scheme, or at the very least majorly cutting down the human cost that the authors talk about, would be to build a homophone detection method. This is very much possible with the recent advances in word embedding schemes (referring to the works in [1], [2] and especially [3]) and their ability to detect similarity in the usage of words. These embeddings per se do not look at what a word is but rather at how it occurs, to deduce importance, similarity, and in the case of [3] hierarchy as well, when mapping to an arbitrary-dimensional vector space (meaning they could deduce what a random smiley means as well). Hence, to these embeddings, homophones are very similar words IF they are used in the same context (which they are!). Since the proposed solution in the paper relies on the reader being able to deduce the meaning of sentences from the context of the article or the situation/news trends, embeddings will be able to do so as well, if not better, and hence the system would censor the posts. So, I guess my point is that this method might be outdated now; the only overhead the censoring system would have to bear is the training of a new embedding model every day or so.
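
A rough sketch of this counter-attack, using word2vec [1] via gensim: train embeddings on recent segmented weibos and flag the nearest neighbors of blocked keywords as candidate substitutes. Here `segmented_weibos`, `blocked_keywords`, and the 0.6 threshold are assumptions for illustration:

```python
# Sketch: expand a keyword blocklist with embedding neighbors, on the
# theory that homophone morphs occur in the same contexts as the original.
from gensim.models import Word2Vec

model = Word2Vec(sentences=segmented_weibos, vector_size=200,
                 window=5, min_count=5)

expanded_blocklist = set(blocked_keywords)
for word in blocked_keywords:
    if word in model.wv:
        for neighbor, score in model.wv.most_similar(word, topn=10):
            if score > 0.6:  # arbitrary cutoff; tune against false positives
                expanded_blocklist.add(neighbor)
```

Of course, this inherits the very false-positive problem the paper exploits: contextually similar neighbors are not always morphs, so a censor applying it blindly would again over-block.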

The other question is “Do the censorship practices still function the same way?”. Now that NLP tasks are dominated by sequence models (deep learning models based on bi-directional RNNs, for example), it might be possible to detect such substitutions automatically even better now. I feel that this question needs further exploration and there is no direct answer.

Another natural question to ask would be: does this homophonic approach extend to other languages as well? For Urdu (Punjabi as well), English, and to some extent Arabic, the languages which I know myself, I’m not too sure such a variety of homophones exists. Since it doesn’t, a straight follow-up question is: can we develop language-invariant censorship avoidance schemes? I feel that this could be some very exciting work. Maybe some inspiration can be drawn from schemes such as [4].

The second paper, by King et al., I must say is pretty impressive. The amount of detail in the experiment design, the considerations undertaken, and the way results are presented is pretty much on point. Now, I’m not too familiar with Chinese censorship and its effects, so I can’t make much of the results. The thing that is surprising to me is that posts with collective action potential are banned while those critiquing the government are not – why? Another surprising finding was the absence of a centralized method of censorship, and this leads me back to my original question: with newer NLP techniques powered by deep learning emerging, will the censor hammer come down harder? Will these digital terminators be more efficient at their job? In the unfortunate case that this dystopian scenario were to come true, how are we to deal with it?

I guess with both papers combined, an ethical question needs to be discussed: Is censorship ethical? If no, then why? If yes, then under what circumstances and to what extent? It would be nice to hear other people’s opinions on this in class.


[1] Efficient Estimation of Word Representations in Vector Space: https://arxiv.org/pdf/1301.3781.pdf

[2] GloVe: Global Vectors for Word Representation: https://nlp.stanford.edu/pubs/glove.pdf

[3] Poincaré Embeddings for Learning Hierarchical Representations: https://arxiv.org/pdf/1705.08039.pdf

[4] Unobservable communication over fully untrusted infrastructure: https://www.cs.utexas.edu/~sebs/papers/pung_osdi16_tr.pdf


Reflection #11 – [03/27] – [Vartan Kesiz-Abnousi]

[1] Hiruncharoenvate, Chaya, Zhiyuan Lin, and Eric Gilbert. “Algorithmically Bypassing Censorship on Sina Weibo with Nondeterministic Homophone Substitutions.” ICWSM. 2015.

[2] King, Gary, Jennifer Pan, and Margaret E. Roberts. “Reverse-engineering censorship in China: Randomized experimentation and participant observation.” Science 345.6199 (2014): 1251722.


Summary [1]


Before the paper published by King and colleagues in 2014, researchers did not understand how the censorship apparatus works on sites like Sina Weibo, which is the Chinese version of Twitter. The censored weibos were collected for the period October 2, 2009 – November 20, 2014, comprising approximately 4.4K weibos. The two experiments that the authors use rely on this dataset: namely, an experiment on Sina Weibo itself, and a second experiment where they ask trained users from Amazon Mechanical Turk to recognize the homophones. The second dataset consists of weibos from the public timeline of Sina Weibo, from October 13, 2014 – November 20, 2014, accumulating 11,712,617 weibos.


[Figure: Venn diagram showing the relationships between homophones (blue circle) and related linguistic concepts]


Reflections [1]

I had never heard of the term “homophone” before. Apparently, morphs – created through techniques such as decomposition of characters, translation, and nicknames – have been in wide usage to circumvent adversaries. Homophones are a subset of such morphs. The Venn diagram also provides further insight. Overall, three questions are asked. First, are homophone-transformed posts treated differently from ones that would have otherwise been censored? Second, are homophone-transformed posts understandable by native Chinese speakers? Third, if so, in what rational ways might Sina Weibo’s censorship mechanisms respond? One question that I have is whether utilizing the tf-idf score is the best possible choice for their analysis. Why not an LDA? I didn’t find a discussion regarding this modeling choice, even though it is critical to the results. The algorithm, as the authors acknowledge, has a high chance of generating homophones that have no meaning, since they did not consult a dictionary. I find this to also have a serious impact on the model. This might look like a detail, but I think it might have been a better idea to keep the Amazon Mechanical Turk instructions only in Mandarin, instead of asking in English that non-Chinese speakers not complete the task. It would have been helpful if we had all the parameters of the logit model in a table. Regardless, they find that homophone-transformed posts survive significantly longer than their censored counterparts.
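
To make the tf-idf question concrete, the kind of scoring at issue takes only a few lines with scikit-learn. This is a hypothetical sketch, not the paper’s pipeline; `censored_docs` is an assumed list of pre-segmented censored weibos with tokens joined by spaces:

```python
# Sketch: rank terms by mean tf-idf weight across the censored corpus,
# a simple way to surface keywords distinctive of censored posts.
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(censored_docs)

mean_scores = tfidf.mean(axis=0).A1          # average weight per term
terms = vectorizer.get_feature_names_out()
top_terms = sorted(zip(terms, mean_scores), key=lambda t: -t[1])[:20]
```

An LDA, by contrast, would group terms into themes rather than rank individual keywords, so the two choices answer slightly different questions; a comparison in the paper would indeed have been welcome.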


Questions [1]

  1. Is the usage of homophones particularly widespread in Mandarin, compared to the Indo-European language family? Furthermore, can these methods be applied to other languages?
  2. How complex a conversation can occur with the usage of homophones? Is there a language complexity metric, with complexity defined as how effectively ideas are conveyed?
  3. An extension of the study could be the study of the linguistic features of posts containing homophones.


Summary [2]

The paper written by King et al. has two parts. First, they create accounts on numerous Chinese social media sites. Then, they randomly submit different texts and observe which texts were censored and which were not. The second task involves the establishment of a social media site that uses Chinese media’s censorship technologies. Their goal is to reverse engineer the censorship process. Their results support the hypothesis that criticism of the state, its leaders, and their policies is published, whereas posts about real-world events with collective action potential are censored.


Reflections [2]

This is an excellent paper in terms of causal inference and its entire structure. Gary King is a renowned author of experimental design studies aimed at drawing causal inference. For the experimental part, they first introduce blocking based on writing style. I didn’t find much about the writing style in the supplemental material. They also have a double dichotomy that produces four experimental conditions: pro- or anti-government, and with or without collective action potential. It is the randomization that allows them to draw causal claims in the study.

Questions [2]

  1. How do they measure the “writing style”, when they introduce blocking?


Reflection #11 – 03/27 – Pratik Anand

Paper 1 : Reverse-engineering censorship in China: Randomized experimentation and participant observation

Paper 2 : Algorithmically Bypassing Censorship on Sina Weibo with Nondeterministic Homophone Substitutions

Both the papers play two parts of a larger process – observation and counter-action.
The first paper deals with the researchers trying to understand how Chinese censorship works. In the light of a lack of any official documentation, they use experimentally controlled anecdotal evidence to build a hypothesis of the functioning of “The Great Firewall of China”. They use some well-known topics with a history of censorship and post discussions on various Chinese forums. They had some interesting observations, e.g., that corruption cases and criticism as well as praise of the government are more heavily censored than sensitive topics like Tibet and border disputes. This could represent the priorities of the government towards censorship and can help to bypass it for the majority of sensitive cases.
Non-familiarity with the language creates difficulty in understanding the nuances of the language used in surviving posts and banned posts. Though it mostly seems that the censorship is primarily dependent on keyword matching, with external techniques as subsidiaries, might global advancement in NLP research be more harmful than useful in this case? With the help of advanced NLP, censorship tools can go beyond just words and infer context from the statements.
This brings us to the second paper, on bypassing censorship. The authors make use of homophone substitutions to fool auto-censorship tools. Language is again a barrier here in fully grasping the effects of homophone substitutions. However, it can be inferred that a limited number of substitutions are possible for every word to be replaced. This creates a problem if those substitutions become popular: the censorship tools can easily ban them. A recent example is the removal of the two-term restriction on the presidency in China. People started criticising it using the mathematical terminology of 1, 2, …, N to represent unbounded numbers, with messages like “Congratulating Xi Jinping on getting selected as President for the Nth time”. The censor tools went ahead and not only recognised the context of this very subtle joke but also blocked the letter N for some time. Hence, it shows that no matter how robust and covert a system is, if it gains enough traction, it will come into focus and get banned. There is a need to find ways which cannot be countered even after they have been exposed.
