Paper 1: An Army of Me: Sockpuppets in Online Discussion Communities
Paper 2: Uncovering Social Spammers: Social Honeypots + Machine Learning
The first paper is of special interest to me. It deals with sockpuppets and fake accounts on social media and forums; as an active user, I see a lot of this in action. The paper defines sockpuppet accounts as those maintained by a single person, referred to as the puppeteer, who uses them either to promote or denounce a certain viewpoint or to sow general dissent, without suffering the consequences of account bans by moderators.
The authors acknowledge that ground-truth data is very hard to obtain for this problem, so they rely on observational studies to gain insight into sockpuppet behavior. Could the techniques used to establish ground truth for spam messages be applied to sockpuppets? In my opinion, a list of banned users in a social forum would be a good starting point.
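To make the banned-user suggestion concrete, here is a minimal sketch of bootstrapping labels from a moderator ban list. The ban-list format and the reason strings are my assumptions for illustration, not something either paper provides:

```python
# Hypothetical moderator ban list: username -> recorded ban reason.
ban_list = {
    "user42": "spam",
    "troll99": "sockpuppet",
    "alt_troll99": "sockpuppet",
}

def sockpuppet_label(username: str) -> int:
    """Return 1 if the account was banned for sockpuppetry, else 0."""
    return int(ban_list.get(username) == "sockpuppet")

# Weak labels that could seed a supervised study of sockpuppet behavior.
labels = {user: sockpuppet_label(user) for user in ban_list}
```

Of course, such labels are only as reliable as the moderators' ban reasons, so they would be a noisy starting point rather than true ground truth.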
Some of the observed traits are:
1. Sockpuppet accounts are created quite early on by their puppeteers.
2. They usually post at the same time and on the same topics.
3. They rarely start a discussion but usually participate in existing ones.
4. The discussion topics are almost always very controversial.
5. They write very similar content.
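The content-similarity trait (point 5) could be operationalized with a simple lexical measure. A minimal sketch using Jaccard similarity over word sets; the function name and example posts are mine, and this is not the feature set the paper actually uses:

```python
def jaccard_similarity(post_a: str, post_b: str) -> float:
    """Lexical overlap between two posts, in [0, 1]."""
    words_a = set(post_a.lower().split())
    words_b = set(post_b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

# Posts by two accounts suspected of being a sockpuppet pair.
post1 = "this policy is a disaster and everyone knows it"
post2 = "this policy is a total disaster and everyone here knows it"
score = jaccard_similarity(post1, post2)
```

A high score on its own is only a weak signal; the paper combines many such behavioral features rather than relying on any single threshold.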
The authors also observed that sockpuppets are treated harshly by the community. Is this due to their behavior, or just a side effect of the fact that they mostly participate in posts about controversial topics? Not all sockpuppets are malicious, and the question of pretenders vs. non-pretenders was very intriguing. Some people keep a sockpuppet for entirely comical or other benign purposes, and I don't believe the authors' method of classifying them based on username is effective enough: many non-pretenders may keep multiple sockpuppet accounts built around a joke, and the authors' method will fail to classify these as non-pretending accounts.
The authors provide a case where two sockpuppets, run by the same puppetmaster, argue against each other. They explain this behavior as a means of increasing traffic to the given post. I am not sure that is the reason, and they do not provide a way to verify that those sockpuppets are indeed handled by the same person. There is also the possibility of a group of people maintaining a set of sockpuppet accounts; this would make their behavioral patterns ever-changing and would also offer an alternative explanation for the arguing pair described above.
The second paper deals with creating honeypots to learn about the traits of spam accounts and using those traits to build a spam classifier. The authors do a good job of explaining how social spam differs from email spam: it carries a touch of personalized messaging, which is a more effective strategy for luring users. Though the paper doesn't go into detail about how the honeypots were set up, it shares observations from analyzing the spammers who fell into them. The honeypots were created on MySpace and Twitter, and spammer behavior varied considerably between the two. The authors note that MySpace is more of a long-form social communication platform, so they identify the "About Me" section as the most important part of a spammer profile for classification. They assume it won't change radically because it functions as a sales pitch, and that spam classifiers will therefore continue to detect it. I believe this is a limitation of the technique: "About Me" can be changed as easily as any other section. It is indeed important, but replacing it would simply be swapping one sales pitch for another, so that justification doesn't hold up.
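To illustrate what profile-based features might look like, here is a small sketch that extracts signals from an "About Me" section. The specific keywords and feature names are my own assumptions in the spirit of the paper's approach, not its actual feature set:

```python
import re

# Illustrative promotional keywords; a real classifier would learn
# these weights from labeled honeypot data rather than hard-code them.
SPAM_KEYWORDS = {"free", "click", "offer", "deal", "cam"}

def about_me_features(text: str) -> dict:
    """Turn an 'About Me' section into simple numeric features."""
    lowered = text.lower()
    urls = re.findall(r"https?://\S+", lowered)
    tokens = re.findall(r"[a-z']+", lowered)
    return {
        "num_urls": len(urls),
        "spam_keyword_count": sum(t in SPAM_KEYWORDS for t in tokens),
        "length": len(tokens),
    }

profile = "Click here for a FREE offer! http://example.com/deal"
feats = about_me_features(profile)
```

Note that all of these features evaporate the moment the spammer rewrites the section, which is exactly the limitation raised above.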
The paper details that the authors created MySpace profiles with geographic locations covering every state in the USA. What was the reasoning behind this? Do varied geographic locations provide a level of genuineness that these honeypot profiles require?
Lastly, could a reverse technique be used by spammers to identify honeypot profiles and take safeguards against them?