Reflection #10 – [03/22] – [Ashish Baghudana]

Kumar, Srijan, et al. “An army of me: Sockpuppets in online discussion communities.” Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017.
Lee, Kyumin, James Caverlee, and Steve Webb. “Uncovering social spammers: social honeypots + machine learning.” Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, 2010.

Summary 1

Kumar et al. present a data-driven view of sockpuppets in online discussion forums and social networks. They identify pairs of sockpuppets in online discussion communities hosted by Disqus. Since the data isn’t labeled, the authors devise an automatic technique for identifying sockpuppets based on posting frequency and multiple logins from the same IP address. They find linguistic differences between the posts of sockpuppets and those of ordinary users. Primarily, sockpuppets use the first or second person more often than ordinary users. They also write poorer English, and their posts are more likely to be downvoted, reported, or deleted. The authors note a dichotomy among sockpuppets: pretenders and non-pretenders. Finally, the authors build a classifier for sockpuppet pairs and determine the predictors of sockpuppetry.
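To make the labeling idea concrete, here is a minimal sketch of pairing accounts that post repeatedly from the same IP address. It is not the authors’ actual procedure: the function name `candidate_sockpuppet_pairs`, the `min_posts` threshold, and the sample data are all hypothetical, chosen only to illustrate the kind of heuristic described above.

```python
from collections import defaultdict
from itertools import combinations


def candidate_sockpuppet_pairs(posts, min_posts=3):
    """Return account pairs that each posted at least `min_posts` times
    from a shared IP address.

    `posts` is an iterable of (user, ip) tuples, one per post.
    The threshold is an illustrative assumption, not the paper's value.
    """
    # ip -> user -> number of posts made from that IP
    counts = defaultdict(lambda: defaultdict(int))
    for user, ip in posts:
        counts[ip][user] += 1

    pairs = set()
    for per_user in counts.values():
        # Keep only accounts that post frequently enough from this IP.
        active = sorted(u for u, n in per_user.items() if n >= min_posts)
        pairs.update(combinations(active, 2))
    return sorted(pairs)


# Made-up example data: two accounts share an IP and both post often.
posts = (
    [("alice", "10.0.0.1")] * 3
    + [("alice2", "10.0.0.1")] * 3
    + [("bob", "10.0.0.2")] * 3
    + [("carol", "10.0.0.1")]  # shares the IP but posts only once
)
print(candidate_sockpuppet_pairs(posts))
```

A heuristic like this only produces candidates. Shared IP addresses also arise from households, offices, and public networks, which is one reason the labels may need outside validation.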

Critique 1

I found the paper very well structured and the motivations clearly explained. The authors received non-anonymized user data from Disqus. Their dataset creation technique using IP addresses, user sessions, and frequency of posting was very interesting. However, it appears that they relied on intuition in choosing these three factors to identify sockpuppets. In my opinion, they should have attempted to validate their ground truth externally, possibly using Mechanical Turk workers. Their results also seem to suggest that there is a primary account and that the remaining accounts are secondary. In today’s world of fake news and propaganda, I wonder whether some accounts are created solely to promote one view. I was equally fascinated by the dichotomy of sockpuppets. In the non-pretenders group, users post different content on different forums. This would mean that the sockpuppets do not interact with each other. Why then would a person create two identities?

Summary 2

Following the theme of today’s class, the second paper attempts to identify social spammers on social networking platforms like MySpace and Facebook. The authors propose a social honeypot framework to lure spammers and record their activity, behavior, and information. A honeypot is a user profile with no activity. If this honeypot receives unsolicited friend requests (MySpace) or followers (Twitter), the sender is likely a social spammer. The authors collect information about such candidate spam profiles and build an SVM classifier to differentiate spammers from genuine profiles.
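The honeypot idea can be sketched in a few lines. This is only an illustration under my own assumptions: the `Honeypot` class, the `candidate_spammers` function, and the `min_honeypots` threshold are hypothetical names and do not come from the paper. The sketch covers only the collection step, not the SVM classifier trained afterwards.

```python
from collections import Counter


class Honeypot:
    """A passive profile: it never initiates contact and only records
    inbound friend requests or follows."""

    def __init__(self, name):
        self.name = name
        self.inbound = []  # list of (sender, message) tuples

    def receive_request(self, sender, message=""):
        self.inbound.append((sender, message))


def candidate_spammers(honeypots, min_honeypots=1):
    """Return senders who contacted at least `min_honeypots` distinct
    honeypots. The threshold is an illustrative assumption."""
    seen = Counter()
    for hp in honeypots:
        # Count each sender once per honeypot, however many requests it sent.
        for sender in {s for s, _ in hp.inbound}:
            seen[sender] += 1
    return sorted(s for s, n in seen.items() if n >= min_honeypots)


# Made-up example: one account contacts both honeypots, another only one.
hp1, hp2 = Honeypot("hp1"), Honeypot("hp2")
hp1.receive_request("pills4u", "cheap pills, click here")
hp2.receive_request("pills4u", "cheap pills, click here")
hp1.receive_request("lost_user")
print(candidate_spammers([hp1, hp2]))
print(candidate_spammers([hp1, hp2], min_honeypots=2))
```

Because the honeypot has no activity of its own, any inbound contact is unsolicited by construction, which is what makes the collected profiles useful as training candidates.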

Critique 2

Unlike a traditional machine learning model, the authors opt for a human-in-the-loop model. A set of profiles selected by the classifier is sent to human validators, and the model is revised based on their feedback. I think this is a good approach to data collection, validation, and model training. As more feedback is incorporated, the model keeps improving and covers a wider range of social spam behaviors. The authors also find an interesting classification of social spammers: more often than not, they attempt to sell pornographic content or enhancement pills, promote their businesses, or steal user details by redirecting people to phishing websites. Since the paper is from 2010, they also use MySpace (a now defunct social network?). It would have been nice to see an analysis of which features stood out in their classification task. However, the authors only presented the results of different models.
