Summary 1
Critique 1
I found the paper well structured, with clearly explained motivations. The authors received non-anonymized user data from Disqus. Their technique for building the dataset from IP addresses, user sessions, and posting frequency was very interesting. However, they appear to rely on intuition when setting these three criteria for identifying sockpuppets. In my opinion, they should have validated their ground truth externally, possibly with Amazon Mechanical Turk workers. Their results also seem to suggest that each group of sockpuppets has one primary account and that the rest are secondary. In today’s world of fake news and propaganda, I wonder whether some accounts are created solely to promote a single view. I was equally fascinated by the dichotomy of sockpuppets into pretenders and non-pretenders. In the non-pretenders group, users post different content on different forums, which would mean that the sockpuppets do not interact. Why, then, would a person create two identities?
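To make the concern about intuition-driven criteria concrete, here is a minimal sketch of how such a heuristic could look. It is not the authors' procedure: the function name `candidate_sockpuppet_pairs`, the `(account, ip, session_id)` record layout, and the `min_shared` cutoff are all assumptions for illustration. The point is that the cutoff is a free parameter someone has to choose.

```python
from collections import defaultdict
from itertools import combinations


def candidate_sockpuppet_pairs(posts, min_shared=3):
    """Return account pairs that share an (ip, session) at least min_shared times.

    posts: iterable of (account, ip, session_id) tuples (hypothetical layout).
    min_shared: hypothetical cutoff; not a value taken from the paper.
    """
    # Group the accounts seen under each (ip, session) combination.
    accounts_by_key = defaultdict(set)
    for account, ip, session_id in posts:
        accounts_by_key[(ip, session_id)].add(account)

    # Count how often each pair of accounts co-occurs.
    shared = defaultdict(int)
    for accounts in accounts_by_key.values():
        for pair in combinations(sorted(accounts), 2):
            shared[pair] += 1

    return {pair for pair, count in shared.items() if count >= min_shared}


posts = [
    ("alice", "10.0.0.1", "s1"), ("alice_alt", "10.0.0.1", "s1"),
    ("alice", "10.0.0.1", "s2"), ("alice_alt", "10.0.0.1", "s2"),
    ("alice", "10.0.0.1", "s3"), ("alice_alt", "10.0.0.1", "s3"),
    ("bob", "10.0.0.1", "s3"),
    ("carol", "10.0.0.2", "s4"),
]
pairs = candidate_sockpuppet_pairs(posts)
```

With the default cutoff only the `alice` / `alice_alt` pair is flagged, but lowering `min_shared` to 1 also flags `bob`, who merely shared one session on the same IP. That sensitivity is why external validation of the ground truth would have been reassuring.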
Summary 2
Following the theme of today’s class, the second paper attempts to identify social spammers on social networking platforms such as MySpace and Facebook. The authors propose a social honeypot framework that lures spammers and records their activity, behavior, and information. A honeypot is a user profile with no activity. If such a profile receives unsolicited friend requests (MySpace) or followers (Twitter), the sender is likely a social spammer. The authors collect information about these candidate spam profiles and build an SVM classifier to distinguish spammers from genuine profiles.
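The honeypot idea can be sketched in a few lines. This is only an illustration under assumed names: `collect_spam_candidates`, the `(sender_id, recipient_id)` event format, and the sample IDs are hypothetical, and the sketch stops at candidate collection rather than reproducing the authors' SVM.

```python
def collect_spam_candidates(contact_events, honeypot_ids):
    """Count unsolicited contacts each sender made to honeypot profiles.

    contact_events: iterable of (sender_id, recipient_id) pairs, e.g. friend
        requests or follows (hypothetical format).
    honeypot_ids: set of profile IDs operated as honeypots.
    """
    candidates = {}
    for sender, recipient in contact_events:
        # Honeypots are inactive, so any contact they receive is unsolicited.
        if recipient in honeypot_ids and sender not in honeypot_ids:
            candidates[sender] = candidates.get(sender, 0) + 1
    return candidates


events = [
    ("u1", "honeypot_a"),
    ("u1", "honeypot_b"),
    ("u2", "u3"),          # ordinary contact between regular users
    ("u4", "honeypot_a"),
]
candidates = collect_spam_candidates(events, {"honeypot_a", "honeypot_b"})
```

Here `u1` and `u4` become candidates and `u2` does not. In the paper's pipeline, the profiles of such candidates would then supply the features for the classifier.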
Critique 2
Unlike a traditional machine learning pipeline, the authors opt for a human-in-the-loop model. A set of profiles selected by the classifier is passed to human validators, and the model is revised based on their feedback. I think this is a good approach to data collection, validation, and model training: as more feedback is incorporated, the model keeps improving and covers more kinds of social spam behavior. The authors also find an interesting categorization of social spammers. More often than not, they try to sell pornographic content or enhancement pills, promote their businesses, or phish user details by redirecting people to phishing websites. Since the paper is from 2010, they also use MySpace (a now defunct social network?). It would have been nice to see an analysis of which features stood out in the classification task; however, the authors only present results for different models.
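The feedback loop described above can be sketched roughly as follows. Everything here is a stand-in: a one-feature threshold model replaces the SVM, the feature is a hypothetical per-profile score, and `ask_validator` simulates the human validators. It only shows the shape of the loop (flag, validate, retrain), not the authors' implementation.

```python
def fit_threshold(labeled):
    """Stand-in model: midpoint between the mean spam and mean genuine scores."""
    spam = [score for score, is_spam in labeled if is_spam]
    genuine = [score for score, is_spam in labeled if not is_spam]
    return (sum(spam) / len(spam) + sum(genuine) / len(genuine)) / 2


def human_in_the_loop(labeled, unlabeled, ask_validator, rounds=3, batch=2):
    """Flag profiles, send them to validators, then retrain on their feedback."""
    labeled = list(labeled)
    pool = list(unlabeled)
    threshold = fit_threshold(labeled)
    for _ in range(rounds):
        # Profiles the current model flags as spam go to the validators.
        flagged = [score for score in pool if score > threshold][:batch]
        if not flagged:
            break
        for score in flagged:
            pool.remove(score)
        # Validator feedback becomes new training data, then the model is refit.
        labeled += [(score, ask_validator(score)) for score in flagged]
        threshold = fit_threshold(labeled)
    return threshold, labeled


seed = [(0.9, True), (0.1, False)]          # hypothetical seed labels
unlabeled = [0.55, 0.6, 0.8, 0.3]           # hypothetical profile scores
threshold, labeled = human_in_the_loop(seed, unlabeled, lambda s: s >= 0.7)
```

In this toy run the first two flagged profiles are rejected by the validator, which pushes the threshold up before the next round. That is the behavior I like about the approach: false positives are corrected by people and feed straight back into training.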