Reflection #11 – [03/27] – [John Wenskovitch]

This pair of papers falls under the topic of censorship in Chinese social media.  King et al.’s “Reverse-Engineering Censorship” article takes an interesting approach to evaluating censorship experimentally.  In their first stage, they created accounts on a variety of social media sites (100 total) and submitted posts worldwide to see which messages were censored and which were untouched.  Accompanying this analysis are interviews with confidential sources, as well as the creation of their own social media site by contracting with Chinese firms and then reverse-engineering their software.  Running their own site gave the authors the ability to understand more about posts that are reviewed and censored and accounts that are permanently blocked, which could not be done through typical observational studies.  In contrast, in the “Algorithmically Bypassing Censorship” paper, the authors make use of homophones of censored keywords in order to get around detection by keyword-matching censorship algorithms.  Their process, a non-deterministic algorithm, still allows native speakers to recover the meaning behind almost all of the original untransformed posts, while also allowing the transformed posts to survive 3x longer than their censored counterparts.

Regarding the “Reverse-Engineering” paper, one choice in their first stage that puzzled me was the decision to submit all posts between 8AM and 8PM China time.  While it wasn’t the specific goal of their research, submitting some after-hours posts could have generated interesting information about just how active the censorship process is in the middle of the night, across all of the potential branches: censored after posting, censored after being held for review, and accounts blocked.

From their results, I’m not sure which part surprised me more:  that 63% of submissions that go into review are censored, or that 37% that go into review are not censored and eventually get posted.  I guess I need more experience with Chinese censorship before settling on a final feeling.  It seems reasonable that automated review will capture a fair number of innocuous posts that will later be approved, but 37% feels like a high number.  Their note that a variety of technologies are used in this automated review process would imply high variability in the accuracy of the automated review system, and so a large number of ineffective solutions could explain why 37% of submissions are released for publication after review.  On the other hand, the authors chose to make a number of posts about hot-button (“collective action”) issues, which is the source of my surprise regarding the 63% number.  Initially I would have expected a higher number, because despite the fact that the authors submit both pro- and anti-government posts, I would suspect that additional censorship might be added in order to un-hot-button these issues.  Again, I need more experience with Chinese social media to get a better feeling of the results.

Regarding the “Algorithmically Bypassing” paper, I really enjoyed the methodology of taking an idea that activists are already using to evade censorship and automating it so that more users can apply it at scale.  Without being particularly familiar with Mandarin, I suspect that creating such a solution is easier for Chinese than it would be for a language like English with fewer homophones.  It did, however, remind me of the images shared frequently on Facebook that read something like “fi yuo cna raed tihs yuo aer ni teh tpo 5% inteligance” (generally seen with better-scrambled letters in longer words, in which the first and last letters are kept in the correct position).

I felt that the authors’ stated result that transformed posts typically live 3x longer than an untransformed, censored equivalent was impressive until I saw the distribution in Figure 4.  A majority of the posts do appear to survive roughly 3x longer.  However, the relationship is much more prevalent at the low end (surviving 3 hours rather than 1), while many fewer posts sit in the part of the curve where a post survives for 15 hours rather than 5.  This is a case of a result that is accurate but also a bit misleading.


Reflection #11 – [03/27] – [Ashish Baghudana]

King, Gary, Jennifer Pan, and Margaret E. Roberts. “Reverse-engineering censorship in China: Randomized experimentation and participant observation.” Science 345.6199 (2014): 1251722.
Hiruncharoenvate, Chaya, Zhiyuan Lin, and Eric Gilbert. “Algorithmically Bypassing Censorship on Sina Weibo with Nondeterministic Homophone Substitutions.” ICWSM. 2015.

Summary 1

King et al. conducted a large-scale experimental study of censorship in China by creating their own social media websites, submitting different posts, and observing how these were reviewed and/or censored. They obtained technical know-how in the use of automatic censorship software from the support services of the hosting company. Based on user guides, documentation, and personal interviews, the authors deduced that most social media websites in China conduct an automatic review through keyword matching, with the keywords generally hand-curated. They reverse-engineered the keyword list by posting their own content and observing which posts got through. Finally, the authors found that posts that invoke collective action are censored, whereas criticisms of the government or its leaders largely are not.

Reflection 1

King et al. conduct fascinating research in the censorship domain. (The paper felt as much like a covert spy operation as research work.) The most interesting observation from the paper is that posts about collective action are censored, but not those that criticise the government. This is labeled the collective action hypothesis vs. the state critique hypothesis. This means two things – negative posts about the government are not filtered, and positive posts about the government can get filtered. The paper also finds that automated reviews are not very useful. The authors observe a few edge cases too – posts about corruption, wrongdoing, senior leaders in the government (however innocuous their actions might be), and sensitive topics such as Tibet are automatically censored. These may not bring about any collective action, either online or offline, but are still deemed censor-worthy. The paper makes the claim that certain topics are censored irrespective of whether posts are for or against the topic.

I came across another paper by the same set of authors from 2017 – King, Gary, Jennifer Pan, and Margaret E. Roberts. “How the Chinese government fabricates social media posts for strategic distraction, not engaged argument.” American Political Science Review 111.3 (2017): 484-501. If censorship is one side of the coin, then bots and sockpuppets constitute the other. It would not be too difficult to imagine “official” posts by the Chinese government that favor their point of view and distract the community from more relevant issues.

The paper threw open several interesting questions. Firstly, is there punishment for writing posts that go against government policy? Secondly, the Internet infrastructure in China must be enormous; at a systems scale, how do they ensure that each and every post goes through the censorship system?

Summary 2

The second paper, by Hiruncharoenvate et al., carries the idea of keyword-based censoring forward. They base their work on the observation that activists have employed homophones of censored words to get past automated reviews. The authors develop a non-deterministic algorithm that generates homophones for the censored keywords. They suggest that homophone transformations would cost Sina Weibo an additional 15 hours per keyword per day. They also find that posts with homophones tend to stay on the site 3 times longer on average. Finally, the authors round out the paper by demonstrating that native Chinese readers did not face any confusion while reading the homophones, i.e., they were able to decipher the true meaning.
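To make the mechanism concrete, here is a minimal sketch of the general idea of nondeterministic homophone substitution. The HOMOPHONES table and transform_post function are illustrative assumptions (a single well-known “river crab” substitution for “harmony”), not the authors’ actual algorithm or keyword list:

```python
import random

# Hypothetical homophone table: each censored keyword maps to characters
# that sound (nearly) the same but are written differently. With several
# alternatives per keyword, the output becomes nondeterministic.
HOMOPHONES = {
    "和谐": ["河蟹"],   # "harmony" -> "river crab", a well-known substitution
}

def transform_post(post, homophones=HOMOPHONES, seed=None):
    """Nondeterministically replace censored keywords with homophones."""
    rng = random.Random(seed)
    for keyword, alternatives in homophones.items():
        while keyword in post:
            # A substitute is picked at random for each occurrence, so two
            # transformed copies of the same post need not look identical.
            post = post.replace(keyword, rng.choice(alternatives), 1)
    return post

print(transform_post("请不要和谐这条微博"))
```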

Reflection 2

Of all the papers we have read for the Computational Social Science course, I found this paper to be the most engaging, and I liked the treatment of the motivation, design of experiments, results, and discussion. However, I also felt disconnected because of the language barrier. I feel that natural language processing tasks in Mandarin can be very different from those in English. Therefore, I was intrigued by the choice of algorithm (tf-idf) that the authors use to obtain censored keywords, and by the further downstream processing. I am curious to hear from a native Chinese speaker how the structure of Mandarin influences NLP tasks!
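For reference, here is a hedged sketch of how one might rank candidate censored keywords by tf-idf over a set of deleted posts. The toy English corpus is purely illustrative; real Weibo posts would first need Chinese word segmentation (e.g., with jieba), and this is not the authors’ exact pipeline:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-in for a corpus of posts that were later deleted.
censored_posts = [
    "protest planned at the square tomorrow",
    "join the protest against the demolition",
    "officials stay silent about the protest",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(censored_posts)

# Rank terms by average tf-idf weight across the deleted posts;
# high-ranking terms are candidate censored keywords.
scores = np.asarray(tfidf.mean(axis=0)).ravel()
terms = np.array(vectorizer.get_feature_names_out())
print(terms[np.argsort(scores)[::-1][:5]])
```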

I liked Experiment 2 and the questions in the AMT task. The questions were unbiased and actually evaluated whether the person understood which words were mutated.

However, the paper also raised other research questions. Given the publication of this algorithm, how easy is it to reverse-engineer the homophone generation and block posts that contain the homophones as well? The keyword-matching algorithm could be tweaked just a little to add homophones to the list, checking whether several of these homophones occur together or alongside other banned keywords; a rough sketch of such a counter-measure follows.
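A minimal sketch of that hypothetical counter-measure, assuming the censor already possesses a homophone table (the function and threshold are illustrative assumptions, not Sina Weibo’s actual filter):

```python
def is_suspicious(post, banned_keywords, homophone_map, min_hits=2):
    """Flag a post if it contains a banned keyword, or if several known
    homophone substitutes co-occur (a hint the post was transformed).
    Purely illustrative; not Sina Weibo's actual filter."""
    if any(kw in post for kw in banned_keywords):
        return True
    # Count how many distinct keywords have at least one substitute present.
    hits = sum(1 for subs in homophone_map.values()
               if any(s in post for s in subs))
    return hits >= min_hits
```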

Finally, I am also interested in the definitions of free speech and how they are implemented across different countries. I am unable to draw the line between promoting free speech and respecting the sovereignty of a nation, and I am open to hearing perspectives from the class about these issues.


Reflection #10 – [03/22] – [Nuo Ma]

  1. Kumar, Srijan, et al. “An army of me: Sockpuppets in online discussion communities.” Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017.

Summary:

Sockpuppets created by deceptive users may mislead or have a negative impact on online discussion communities. In this paper, the authors present a study of sockpuppets in online discussion communities. The data come from nine online discussion communities and consist of 2.9 million users. The authors first identify sockpuppets using signals like similar names and posts made from the same IP address within close time proximity. They then analyze the posting behavior and linguistic features of these sockpuppets. As a result, the authors find that the behavior of sockpuppets differs from that of ordinary users, including a tendency to write more posts than ordinary users, and shorter posts with many first-person pronouns.
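As a rough illustration of that IP-plus-time-proximity heuristic, here is a hedged sketch; the data layout and the 15-minute window are assumptions for illustration, not the paper’s exact labeling rule:

```python
from collections import defaultdict
from datetime import timedelta

def candidate_sockpuppet_pairs(posts, window_minutes=15):
    """posts: list of (user_id, ip_address, timestamp) tuples.
    Return pairs of distinct accounts that posted from the same IP
    within a short time window."""
    by_ip = defaultdict(list)
    for user, ip, ts in posts:
        by_ip[ip].append((ts, user))
    pairs = set()
    window = timedelta(minutes=window_minutes)
    for ip, events in by_ip.items():
        events.sort()
        for i, (t1, u1) in enumerate(events):
            for t2, u2 in events[i + 1:]:
                if t2 - t1 > window:
                    break
                if u1 != u2:
                    pairs.add(tuple(sorted((u1, u2))))
    return pairs
```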

 

Reflection:
I like this paper, and there are some points worth discussing. In the data selection, I can see that on average each puppetmaster owns 2 puppet accounts. This is oddly consistent across all 9 communities; might there be a reason why? One level deeper into this question: what is the motivation behind such sockpuppets? As we can see in Figure 6, some topics (usa, world, politics, justice, opinion) have significantly higher numbers of sockpuppets compared to other topics, so I think it is safe to assume that people use sockpuppets mainly for politically oriented discussions. But in the data statistics, we can see that the number of sockpuppets on political sites is still about 2 accounts per puppetmaster, just as on MLB and allkpop, with not even a significant difference in the ratio of sockpuppets to users. So what is a better way to verify the results of such detection methods? There can also be multiple purposes behind such sockpuppets: they can come from PR companies giving positive comments about a celebrity, a certain event, or a product. In fact, this is a really common practice in some regions, since the image of a celebrity can greatly affect the revenue of related movies and products. But I'd say it's almost impossible to get a dataset from the PR companies that own large numbers of such sockpuppets.


Reflection #10 – [03/22] – [Md Momen Bhuiyan]

Paper #1: Uncovering Social Spammers: Social Honeypots + Machine Learning
Paper #2: An Army of Me: Sockpuppets in Online Discussion Communities

Reflection #1:
In this paper the authors used honeypots, a type of harmless bot, to collect social spammer information from two early social network sites, Twitter and MySpace. The authors provided a clear motivation for the paper: according to them, email spammers and social spammers are very different. The framework introduced in this paper keeps precision high for spam detection. It was deployed at an early stage of these social network sites, which probably made it easier to use specific attributes from user profile information to automatically detect social spammers. In recent times, it seems social spammers have already been repurposed for new tasks like introducing misinformation into the network. One of the interesting things in the paper was the use of different ensemble-based classifiers, which I didn’t exactly understand. The authors introduced a spam precision metric to evaluate their classifier in the wild but didn’t say how it differs from the standard precision metric.

Reflection #2:
Sockpuppets are duplicate accounts used to deceive users in a social discussion forum. The authors of this paper looked into 9 online forums to analyze the attributes of sockpuppets. The first problem I see with the data collection is the use of IP addresses to detect sockpuppets. Although the authors tried to filter IP addresses that are behind NAT, it is not clear whether that was effective, given the assumption that very few people join discussion forums. The authors did use prior work to characterize the other parameters of their framework. They found that several attributes of sockpuppets differ from those of other bad actors like bots and trolls. Sockpuppets tend to participate in discussions about controversial topics. They are also more likely to be downvoted. The authors used entropy to characterize the usage patterns of different sockpuppets, but it is not clear whether that measurement works. They also used ego networks to analyze user-user interaction and found that sockpuppets have higher PageRank in the network. The main success of the paper seems to be finding additional sockpuppets once one pair has been detected.
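For concreteness, the entropy measure in question is essentially Shannon entropy over where an account posts; a minimal, generic sketch (not the authors’ code) might look like:

```python
import math
from collections import Counter

def usage_entropy(discussion_ids):
    """Shannon entropy of an account's posts over discussions.
    Low entropy = activity concentrated in a few discussions;
    high entropy = activity spread across many discussions."""
    counts = Counter(discussion_ids)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(usage_entropy(["thread1", "thread1", "thread1", "thread2"]))
```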


Reflection #10 – [03/22] – [Jamal A. Khan]

Both of the papers revolve around the theme of fake profiles, albeit of different types.

  1. Kumar, Srijan, et al. “An army of me: Sockpuppets in online discussion communities.” Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017.
  2. Lee, Kyumin, James Caverlee, and Steve Webb. “Uncovering social spammers: social honeypots+ machine learning.” Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, 2010.

The first paper, about sockpuppets, is well written and pretty well explained throughout. However, the motivation of the paper seems weak! I see that there are ~3,300 sockpuppets out of a total of ~2.3 million users. That brings me to my questions: is this even a problem worthy enough to tackle? Do we need an automated classification model? Why do sockpuppets need to be studied? Do they have harmful effects that need to be mitigated?
Moving forward, the entirety of the paper builds up to a classifier, and though that is not a bad thing, I get the feeling that the work was conducted top down (idea of a classifier for sockpuppets -> features needed to build it) while the writing is bottom up (need to study sockpuppets and then use the generated material to make a classifier). Regardless, the study does raise some follow-up questions, some of which seem pretty interesting to check out:

  • Why do people make sockpuppets? What purpose are the puppets being used for? Are they created for a targeted objective, or are they more troll-like (just for fun, or just because someone can)?
  • How do puppeteers differ from ordinary users?
  • Can a community influence the creation of sockpuppets? I realize the paper already partially answers this question, but I think much more focused attention is needed on the temporal effects of the community on a puppeteer, and on the puppeteer’s behavior, before the creation of the puppets.

Coming to the classifier, I have a few grievances. Like many other papers that we have discussed in class, this paper lacks the details of the classifier model used, e.g., the number of trees in the ensemble, the max tree depth, and the voting strategy: do all trees get the same vote count? However, I will give the authors credit because this is the first paper among the ones we’ve read that has used a powerful classifier as opposed to simple logistic regression. Still, the model has poor predictive power. Since I’m working on an NLP classification problem, I’m wondering whether sequential models might work better.
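These are the kinds of hyperparameters the paper could have reported; the values in this sketch are illustrative guesses (not the paper’s settings), shown with scikit-learn’s random forest:

```python
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(
    n_estimators=200,   # number of trees in the ensemble (assumed value)
    max_depth=None,     # grow each tree until its leaves are pure
    random_state=42,
)
# In scikit-learn's implementation each tree gets an equal (soft) vote:
# the forest averages the per-tree class probabilities.
# clf.fit(X_train, y_train); clf.predict_proba(X_test)
```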

 

Moving on to the second paper: it’s a great idea but executed poorly, so I apologize for the harsh critique in advance. The idea of honeypot profiles is intriguing, but just as social features can be used to sniff out spammer profiles, they can be used to sniff out honeypots, and hence the trap can be avoided. So I think the paper’s approach is naive in the sense that it needed more work on why and how the social honeypots are robust to changes in strategy by the spammers.

Regardless, the good thing about the project is the actual deployment of the honeypots. However, the main promise of being able to classify spammers and non-spammers has not been delivered. The scale of the training dataset is minuscule and not representative, i.e., there are only 627 deceptive and 388 legitimate profiles for the MySpace classification task. Hence, the validity of the reported results becomes questionable.

With a dataset of the scale used here, we could have also fitted a multinomial regression and perhaps gotten similar results. The choice of classifiers has not been motivated, and why have so many been tested? It seems the paper has fallen victim to the approach that “when you have a hammer, everything looks like a nail.” The same story is repeated with the Twitter classification task.

Regardless of my critique, the paper presents the classifier results in more detail than most papers, so that’s a plus. It was quite interesting to see that age in Figure 2 had a large ROC area. So my question is: are spammer profiles younger than legitimate user profiles?

Another question regards how the study holds up over time: will a similar classifier perform well on the MySpace of today (i.e., Facebook)? Since the user base is probably much different and more diverse now, the traits of legitimate users have changed.

Finally, I would like other people’s opinions on the last portion of the paper, i.e., the “in-the-wild” testing. I think this last section is just plain wrong and misleading at best. The authors say that

“… the traditional classification metrics presented in the previous section would be infeasible to apply in this case. Rather than hand label millions of profiles, we adopted the spam precision metric to evaluate the quality of spam predictions. For spam precision, we evaluate only the predicted spammers (i.e., the profiles that the classifier labels as spam). …”

Correct me if I’m wrong, but the proposed spam precision metric only measures, among the profiles the classifier labeled as spam, the fraction that are actually spam; it says nothing about the spammers the classifier never flagged. This is misleading because it ignores the profiles that weren’t detected in the first place, and for all we know the false negatives (spammers that went undetected) may have been orders of magnitude more numerous than the detected spammers. For example, suppose in actuality we had 100,000 spam profiles among 500,000 profiles overall, out of which only 5,000 were flagged. The authors are only reporting how many of the 5,000 were actually spam, not how many of the 100,000 were caught. There is no shortcut to research, and I think the reason cited above in italics is simply a poor excuse to avoid a repetitive and time-consuming task. In the past few years, good data is what has driven machine learning to its current popularity, and so to make a claim using ML, the data needs to be to a great degree unquestionable. It’s for the same reason that most companies don’t mind releasing their deep learning models’ architectures: they know that without the data the company has, no one will be able to reproduce similar results. Therefore, to me, all the results in Section 5 are bogus and irrelevant at best. Again, I apologize for the harsh critique.
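To make the distinction concrete, here is a small worked example using the hypothetical numbers above; the 4,500 true-positive count is an assumed figure for illustration:

```python
# Hypothetical numbers: 100,000 true spam profiles among 500,000 profiles,
# with 5,000 profiles flagged as spam, of which (say) 4,500 really are spam.
flagged = 5_000
true_positives = 4_500          # assumed for illustration
actual_spam = 100_000

spam_precision = true_positives / flagged   # 0.90 -- looks excellent
recall = true_positives / actual_spam       # 0.045 -- 95.5% of spammers missed
print(spam_precision, recall)
```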


Reflection #10 – [03/22] – [Meghendra Singh]

  1. Kumar, Srijan, et al. “An army of me: Sockpuppets in online discussion communities.” Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017.
  2. Lee, Kyumin, James Caverlee, and Steve Webb. “Uncovering social spammers: social honeypots+ machine learning.” Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, 2010.

In the first paper, Kumar et al. study “sockpuppets” in nine discussion communities (a majority of these are news websites).  The study shows that sockpuppets behave differently than ordinary users, and these differences can be used to identify them. Sockpuppet accounts have specific posting behavior: they don’t usually start discussions, and their posts are generally short and contain certain linguistic markers (e.g., greater use of personal pronouns such as “I”).  The authors begin by automatically labeling 3,665 users as sockpuppets in the 9 discussion communities using their IP addresses and user session data. The authors identify two types of sockpuppets: pretenders (pretending to be legitimate users) and non-pretenders (easily identifiable as fake users, i.e. sockpuppets, by the discussion community). The two main results of the paper are classifying sockpuppet user pairs versus ordinary user pairs (ROC AUC = 0.91) and predicting whether an individual user is a sockpuppet (ROC AUC = 0.68), using activity, community, and linguistic (post) features. The paper does a good job of explaining the behavior of sockpuppets in the comments sections of typical news websites and how these behaviors can be used to detect sockpuppets, thereby helping maintain healthy and unbiased online discussion communities. The paper references a lot of prior work, and I really appreciate the fact that most of the decisions about features, parameters, and other assumptions made in the study are grounded in past literature. While reading the paper, a fundamental question came to my mind: if we can already identify sockpuppets using IP addresses and temporal features of their comments, what is the point of using predictive modeling to differentiate sockpuppets from ordinary users? In essence, if we already have a high-precision, rule-based approach to detect sockpuppets, why rely on predictive modeling that performs only a little better than random chance (ROC AUC = 0.68)?

I found the sockpuppet/ordinary-user conversation example at the end of Section 2 really funny, and I feel that the first comment itself is rather suspicious. This example also seems to indicate that the puppetmaster (S2) is the author of the article on which these comments are being posted. This leads to the question: given that a puppetmaster has multiple sockpuppet accounts, will their main account be considered an ordinary user? If not, does this mean that some of the articles themselves are being written by sockpuppets? A research question in this context could be: “detecting news articles written by sockpuppets on popular news websites.” Another question I had was why the authors used cosine similarity between the feature vectors of users, and what the statistics for this metric are (mean and standard deviation of cosine similarities between sockpuppet and ordinary user feature vectors). Additionally, is there a possibility of using a bag-of-words model here, instead of numeric features like LIWC and ARI computed from users’ posts? Moreover, there is potential to experiment with other classification techniques here and see whether they can perform better than Random Forest.
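For clarity on what that similarity measure computes, here is a minimal sketch of cosine similarity over two per-account feature vectors; the toy vectors are made-up illustrations, not values from the paper:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two per-account feature vectors
    (e.g., LIWC fractions, ARI, posting-rate features)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy example: two accounts with similar LIWC-style feature vectors.
print(cosine_similarity([0.076, 0.017, 0.30], [0.074, 0.015, 0.29]))
```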

Lastly, as the authors suggest in discussion and conclusion, it would be interesting to repeat this experiment on big social platforms like Facebook and Twitter. This becomes really important in today’s world, where online social communities are rife with armies of sockpuppets, spambots and astroturfers, hell-bent on manipulating public opinion, scamming innocent users and enforcing censorship.

The second paper, by Lee et al., addresses a related problem of detecting spammers on MySpace and Twitter using social honeypots and classifiers. The study presents an elegant infrastructure for capturing potential spammer profiles, extracting features from these profiles, and training popular classifiers to detect spammers with high accuracy and a low false positive rate. The most interesting findings for me were the most discriminative features (i.e., About Me text and number of URLs per tweet) for separating spammers from legitimate users, and the fact that ensemble classifiers (Decorate, etc.) performed the best. Given that deep learning was not really popular in 2010, it would be interesting to apply state-of-the-art deep learning techniques to the classification problem discussed in this paper. As we have already seen, the discriminative features that separate spammers from regular users vary from one platform/domain to another; it would be interesting to see whether there exist common cross-platform, cross-domain (universal) features that are equally discriminative. Although MySpace may not be dead, it would be interesting to redo this study on Instagram, which is a lot more popular now and has a very real spammer problem. Based on personal experience, I have observed legitimate users on Instagram becoming spammers once they have enough followers. Will a social honeypot based approach work for detecting such users? Another challenge with detecting spam (or spammers) on a platform like Instagram is that most of the spam is in the form of stories (posts which automatically disappear in 24 hours), while the profiles may look completely ordinary.


Reflection #10 – [03/22] – [Vartan Kesiz-Abnousi]

Topic: Bots & Sock puppets

Definitions:
Sockpuppets: “fake persona used to discuss or comment on oneself or one’s work, particularly in an online discussion group or the comments section of a blog” [3]. Paper [1] defines a sockpuppet as “a user account that is controlled by an individual (or puppetmaster) who controls at least one other user account.” They [1] also use the term “sockpuppet group/pair” to refer to all the sockpuppets controlled by a single puppetmaster.
Bots: “An Internet bot, also known as a web robot, WWW robot or simply bot, is a software application that runs automated tasks (scripts) over the Internet.” [4]
Summary [1]
The authors [1] study the behavior of sockpuppets. The research goal is identifying, characterizing, and predicting sockpuppetry. The study [1] spans nine discussion communities. They demonstrate that sockpuppets differ from ordinary users in terms of their posting behavior, linguistic traits, and social network structure. Moreover, they use IP addresses and user session data to identify 3,656 sockpuppets comprising 1,623 sockpuppet groups, where a group of sockpuppets is controlled by a single puppetmaster. For instance, when studying “avclub.com”, the authors find that sockpuppets tend to interact with other sockpuppets and are more central in the network than ordinary users. Their findings suggest a dichotomy in the deceptiveness of sockpuppets: some are pretenders, which masquerade as separate users, while others are non-pretenders, that is, sockpuppets that are overtly visible to other members of the community. Furthermore, they find that deceptiveness is only important when sockpuppets are trying to create an illusion of public consensus. Finally, they create a model to automatically identify sockpuppetry.
Reflections [1]
Of the nine discussion communities that were studied, there is heterogeneity with respect to: a) the “genre”, b) the number of users, and c) the percentage of sockpuppets. While these are interesting cases to study, none of them are “discussion forums”. Their main function as websites, and their business model, is not to be a discussion platform. This has several ramifications. For instance, “ordinary” users, and possibly moderators, who participate in such websites might find it harder to identify “sockpuppetry”, because they cannot observe long-term behavior as they could in a “discussion forum”.
Their analysis focuses on sockpuppet groups that consist of two sockpuppets. However, sockpuppet groups that consist of three or even four sockpuppets are not negligible either. What if these sockpuppets demonstrate a different pattern? What if a group of 3, 4, or more sockpuppets is more likely to engage in systematic propaganda? This is a hypothesis that would be interesting to explore.
I also believe that we can draw some parallels between this paper and another paper that we reviewed in this class, “Antisocial Behavior in Online Discussion Communities” [5]. For instance, their definitions differ, e.g., regarding what counts as a “thread”. As a matter of fact, two of the authors are the same on both papers: Justin Cheng and Jure Leskovec. Furthermore, both papers use “Disqus”, the commenting platform that hosted these discussions. Would the results generalize to something other than “Disqus”? This, I believe, remains a central question.
The “matching” by utilizing the propensity score is questionable. The propensity score is a good matching measure only when we account/control for all the factors, i.e., when we know the “true” propensity score. This does not happen in the real world. It might be a better idea to add “fixed effects” and restrict the matches to a specific time window, i.e., match observations within the same week to control for seasonal effects. The fact that the dataset is “balanced” after the matching does not constitute evidence that the matching was done correctly. It is the features they used for matching (i.e., similar numbers of posts and posting to the same set of discussions) that should be balanced, not the “dataset”. They should have at least shown a QQ plot of the ex-ante and ex-post matching performance. A poor matching procedure will feed bad inputs into their subsequent machine learning model, in this case a random forest. Note that the authors performed the exact same matching procedure in their previous 2015 paper [5]. Apparently nobody pointed this out.
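One simple covariate-balance diagnostic along these lines is the standardized mean difference, computed for each matching feature before and after matching; a hedged, generic sketch (not tied to the paper’s data):

```python
import numpy as np

def standardized_mean_difference(treated, control):
    """SMD for one matching covariate (e.g., number of posts).
    Values near 0 after matching indicate balance on that feature."""
    treated, control = np.asarray(treated, float), np.asarray(control, float)
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treated.mean() - control.mean()) / pooled_sd

# Report the SMD per covariate before and after matching, e.g.:
# standardized_mean_difference(sockpuppet_post_counts, matched_ordinary_post_counts)
```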
Questions [1]
 
[Q1] I am curious as to why the authors decided to take the following action: “We also do not consider user accounts that post from many different IP addresses, since they have high chance of sharing IP address with other accounts“. I am not sure whether I understand their justification. Is there research that backs up this hypothesis? No reference is provided.
In general, removing outliers for the sake of removing outliers is not a good idea. Outliers are usually removed when a researcher believes that a specific portion of the data is an erroneous entry, e.g., a housing price of $0.
[Q2] A possible extension would be to explore the relationship beyond sockpuppet groups that consist of only two sockpuppets.
[Q3] There is no guarantee that the matching was done properly, as I analyze in the reflection.
Summary [2]
The authors propose and evaluate a novel honeypot-based approach for uncovering social spammers in online social systems. The authors define social honeypots as information system resources that monitor spammers’ behaviors and log their information. They propose a method to automatically harvest spam profiles from social networking communities, develop robust statistical user models for distinguishing between social spammers and legitimate users, and filter out unknown (including zero-day) spammers based on these user models. The data are drawn from two communities, MySpace and Twitter.
Reflections [2]
While I was reading the article, I was thinking of IMDB ratings. I have observed that a lot of movies, usually controversial ones, receive ratings only at the extremes of the rating scale, either “1” or “10”. In some other cases, movies are rated even though they have not yet been publicly released. Which fraction of that would be considered “social spam”, though? Is a mobilization of an organized group that is meant to down-vote a movie “social spam” [6]?
Regardless, I think it is very important to make sure ordinary users are not classified as spammers, since this could impose a cost on the social networking site, including damage to its public image. This means that there should be an acceptable “false positive rate”, tied to the trade-off between having spammers and penalizing ordinary users, a concept reminiscent of what is known in mathematical finance as “Value at Risk (VaR)”.
Something that we should stress is that in the MySpace random sample, the profiles have to be public and the “About Me” information has to be valid. I found the authors’ interpretation of the “AboutMe” feature as the best predictor very interesting. As they argue, it is the most difficult feature for a spammer to vary because it contains the actual sales pitch or deceptive content that is meant to target legitimate users.
Questions [2]
[Q1] How would image recognition features perform as predictors?
[Q2] Should an organized group of ordinary people who espouse an agenda be treated as “social spammers”?
References
[1] Kumar, Srijan, et al. “An army of me: Sockpuppets in online discussion communities.” Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017.
[2] Lee, Kyumin, James Caverlee, and Steve Webb. “Uncovering social spammers: social honeypots+ machine learning.” Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, 2010.
[5] Cheng, Justin, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. “Antisocial Behavior in Online Discussion Communities.” Proceedings of the International AAAI Conference on Web and Social Media (ICWSM), 2015. Available at: <https://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/view/10469>.


Reflection #10 – [03/22] – [Hamza Manzoor]

[1]. Kumar, Srijan, et al. “An army of me: Sockpuppets in online discussion communities.” Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017.

Summary:

In this paper, Kumar et al. present a study of sockpuppets in online discussion communities. They perform their study on nine different online discussion communities, and their data consist of around 2.9 million users. The authors use multiple logins from the same IP to identify sockpuppets and evaluate the posting behavior and linguistic features of sockpuppets to build a classifier. The authors find that ordinary users and sockpuppets use language differently, and that sockpuppets tend to write more posts than ordinary users (699 vs. 19). Sockpuppets also use more first-person and second-person singular personal pronouns and are more likely to be down-voted, reported, or deleted. The authors also describe the types of sockpuppets: “pretenders vs. non-pretenders” and “supporters vs. dissenters”.

Reflection:

I really enjoyed reading this paper because the study they performed was very different, especially the label creation using IP addresses and user sessions. The authors received non-anonymized data from Disqus, which makes me question whether it is legal for Disqus to share non-anonymized data.

Some of the findings in the study were very surprising, such as the finding that the email addresses and usernames of sockpuppets are more similar. First, I do not like the approach of classifying sockpuppets based on username, and second, I am having a hard time believing that sockpuppets have similar usernames and emails. Do they mean that all usernames and emails of one puppetmaster are similar to one another? Or are they similar to those of other sockpuppets?

Their findings also show that 30% of sockpuppets are non-supporters, 60% are supporters, and only 10% are dissenters. It would have been interesting to find out on what kinds of topics sockpuppets support each other. Do we see more supporters on political topics? Or are there sockpuppets belonging to one puppetmaster that are supporters on left-leaning topics and dissenters on right-leaning topics, or vice versa? If yes, can we claim that some specific party pays someone to create these sockpuppets?


Reflection #10 – [03/22] – [Jiameng Pu]

An Army of Me: Sockpuppets in Online Discussion Communities

Summary:

People interact with each other on the Internet mainly through the discussion mechanisms provided by social networks such as Facebook and Reddit. However, sockpuppets created by malicious users harm the network environment by engaging in undesired behavior like deceiving others or manipulating discussions. Srijan et al. study sockpuppetry across nine discussion communities. After first identifying sockpuppets using multiple signals indicating that accounts might share the same user, they characterize sockpuppet behavior from several angles. They find that the behavior of sockpuppets differs from that of ordinary users in many ways; e.g., sockpuppets start fewer discussions, write shorter posts, and use more personal pronouns such as “I”. The study contributes towards the automatic detection of sockpuppets by presenting a data-driven view of deception in online communities.

Reflection:

For the process of identifying sockpuppets, the strategy is inspired by Wikipedia administrators, who identify sockpuppets by finding accounts that make similar edits on the same Wikipedia article, at nearly the same time, and from the same IP address, which makes sense. But for the hyperparameter, the top percentage (5%) of most-used IP addresses, is there any better strategy for choosing the percentage in a more principled way rather than intuitively? When measuring the linguistic traits of sockpuppets, LIWC word categories are used to measure the fraction of each type of word across all posts, and VADER is used for the sentiment of posts. So far, I feel that LIWC word categories are powerful and heavily used in social science research, but I have never used VADER before. In the double-life experiment, although they match sockpuppets with ordinary users that have similar posting activity and that participate in similar discussions, I feel there is too much uncertainty in the linguistic features of ordinary users, i.e., different users have different writing styles. That makes the cosine similarity of the feature vectors for each account less convincing.
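For anyone who, like me, has not used VADER: it is an off-the-shelf, rule-based sentiment scorer for (English) social media text. A minimal usage sketch with the vaderSentiment package, showing only how one would score a post, not the authors’ pipeline:

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
# Returns negative/neutral/positive proportions plus a compound score in [-1, 1].
print(analyzer.polarity_scores("This moderator is deleting everything, so unfair!"))
```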

Uncovering Social Spammers: Social Honeypots + Machine Learning

Summary:

Both web-based social networks (e.g., Facebook, MySpace) and online social media sites (e.g., YouTube, Flickr) rely on their users as primary contributors of content, which makes them prime targets for social spammers. Social spammers engage in undesirable behavior like phishing attacks, disseminating malware, and sending commercial spam messages, which seriously impacts the user experience. Kyumin et al. propose a honeypot-based approach for uncovering social spammers in online social systems by harvesting deceptive spam profiles from social networking communities and creating spam classifiers to actively filter out existing and new spammers. The machine-learning-based classifier is able to identify previously unknown spammers with high precision and a low rate of false positives.

Reflection:

The section on the machine-learning-based classifier impressed me a lot, since it shows how to investigate the discriminative power of individual classification features apart from only evaluating the effectiveness of classifiers, and the ROC curve plays an important role in this. Also, AMContent, the text-based features modeling user-contributed content in the “About Me” section, shows how to use more complicated text features beyond simple demographic data like age, marital status, and gender. I had never heard of MySpace before, but there is still the Twitter experiment; otherwise I would think this was a weird choice of experimental dataset. For the Twitter spam classification, we can clearly see the differences in the way they collect account features, i.e., Twitter accounts are noted for their short posts, activity-related features, and limited self-reported user demographics. This is a reminder that feature design varies with the subject of study.
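A quick, hedged sketch of this kind of per-feature discrimination check, scoring a single feature by its ROC AUC with scikit-learn (the toy numbers are made up for illustration):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def single_feature_auc(feature_values, labels):
    """ROC AUC of one feature used alone as a score -- a quick way to
    gauge the discriminative power of an individual feature."""
    auc = roc_auc_score(labels, feature_values)
    return max(auc, 1 - auc)   # the direction of the feature doesn't matter

# Toy example: number of URLs per post for 6 accounts (1 = spammer).
print(single_feature_auc(np.array([0.1, 0.0, 0.2, 0.9, 0.8, 0.7]),
                         np.array([0, 0, 0, 1, 1, 1])))
```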

 


Reflection #10 – [03/22] – [John Wenskovitch]

This pair of papers describes aspects of those who ruin the Internet for the rest of us.  Kumar’s “An Army of Me” paper discusses the characteristics of sockpuppets in online discussion communities (as an aside, the term “sockpuppet” never really clicked for me until seeing its connection with “puppetmaster” in the introduction of this paper).  Looking at nine different discussion communities, the authors evaluate the posting behavior, linguistic features, and social network structure of sockpuppets, eventually using those characteristics to build a classifier which achieved moderate success in identifying sockpuppet accounts.  Lee’s “Uncovering Social Spammers” paper uses a honeypot technique to identify social spammers (spam accounts on social networks).  They deploy their honeypots on both MySpace and Twitter, capturing information about social spammer profiles in order to understand their characteristics, drawing on some of the same features as Kumar’s paper (social network structure and posting behavior).  These authors also build classifiers for both MySpace and Twitter using the features that they uncovered with their honeypots.

Given the discussion that we had previously when reading the Facebook papers, the first thing that jumped out at me when reading through the results of the “Army of Me” paper was the small effect sizes, especially in the linguistic traits subsection.  Again, these included strong p-values of p<0.001 in many cases, but also showed minute differences in the rates of using words like “I” (0.076 vs 0.074) and “you” (0.017 vs 0.015).  Though the authors don’t specifically call out their effect sizes, they do provide the means for each class and should be applauded for that.  (They also reminded me to leave a note in my midterm report to discuss effect sizes.)
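For completeness, one standard way to quantify this is Cohen’s d over the two groups’ per-user rates; a generic sketch, not computed from the paper’s data:

```python
import numpy as np

def cohens_d(group_a, group_b):
    """Cohen's d: standardized difference between two group means.
    With huge samples, tiny mean differences (e.g., 0.076 vs. 0.074)
    can reach p < 0.001 while the effect size stays negligible."""
    a, b = np.asarray(group_a, float), np.asarray(group_b, float)
    pooled_sd = np.sqrt(((len(a) - 1) * a.var(ddof=1) +
                         (len(b) - 1) * b.var(ddof=1)) /
                        (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) / pooled_sd
```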

One limitation of “Army of Me” that was not discussed was the fact that all nine communities that they evaluated use Disqus as a commenting platform.  While this made it easier for the authors to acquire their (anonymized) data for this study, there may be safety checks or other mechanisms built into Disqus that bias the characteristics of sockpuppets that appear on that platform.  Some of their proposed future work, such as studying the Facebook and 4chan communities, might have made their results stronger.

“Army of Me” also reminded me of the drama from several years ago around the reddit user unidan, the “excited biologist,” who was banned from the community for vote manipulation.  He used sockpuppet accounts to upvote his own posts and downvote other responses, thereby inflating his own reputation on the site.

Besides identifying MySpace as a “growing community” in 2010, I thought that the “Uncovering Social Spammers” paper was a mostly solid and concise piece of research.  The use of a human-in-the-loop approach to obtain human validation of spam candidates to improve the SVM classifier appealed to the human-in-the-loop researcher in me.  Some of the findings from their honeypot data acquisition were interesting, such as the fact that Midwesterners are popular spamming targets and that California is a popular profile location.  I’m wondering if the fact that these patterns were seen is indicative of some bias in the data collection (is the social honeypot technique biased towards picking up spammers from California?), or if there actually is a trend in spam accounts to pick California as a profile location.  This wasn’t particularly clear to me; instead, it was just stated and then ignored.

I really liked their use of both MySpace and Twitter, as the two different social networks enabled the collection of different features (e.g., F-F ratio for Twitter, number of friends for MySpace) in order to show that the classifier can work on multiple datasets.  It’s almost midnight and I haven’t slept enough this month, but I’m still puzzled by the confusion matrix that they presented in Table 1.  Did they intend to leave variables in that table?  If so, it doesn’t really add much to the paper, as they’re just describing the standard definitions of precision, recall, and false positive.  They don’t present any other confusion matrices in the paper, so it seems even more out of place.
