Reflection #4 – [1/30] – Pratik Anand

Paper 1 : The Promise and Peril of Real-time Corrections to Political Misconceptions
Paper 2 : A Parsimonious Language Model of Social Media Credibility Across Disparate Events

Both papers deal with how users evaluate information they access on the Internet, especially news articles. The first paper shows that individuals exposed to real-time corrections to the news articles they are reading are less likely to be influenced by a correction if it goes against their belief system; on the other hand, corrections are effective when they align with the reader’s beliefs. The paper explains in detail that the more emotionally invested a user is in the topic of an article, the less likely he/she is to be swayed by a correction, whether it is real-time or delayed. Users get more defensive in such cases, especially if the counterargument is provided in real time; acceptance of counterarguments is a little better when they are presented to the user after a delay. I can understand that a delay gives the reader time to introspect on his/her beliefs about the topic and may raise curiosity about counterarguments. However, will all kinds of delays have that effect? Human attention spans are short, and by the time the correction is introduced the user may not remember the original article, only a summarized judgment of it, so the correction may not have a strong effect. I liked the part of the paper that discusses how users keep only summary judgments of attitudes and beliefs and discard the evidence. When presented with counterarguments, they decide whether the new idea is persuasive enough to update their old beliefs, paying little attention to the facts provided by the article or its source. I think readers will trust a counterargument to deeply held beliefs more if it comes from sources they already trust. A similar observation has been made in the paper, which adds that the framing of the counterargument is also very important: a counterargument that is respectful of the reader’s beliefs will have a greater impact.

The second paper introduces language as a factor in the perceived credibility of news articles: certain words reduce credibility while others increase it. Such research is quite important; however, it does not take into account external factors like the reader’s personal beliefs and attitudes, his/her political and social context, and so on. Even though the paper makes the case that certain assertive words increase credibility, this cannot be generalized. For some people, assertive opinions that go against their beliefs might appear less credible than opinions that acknowledge ambiguity and reason through both sides of the argument. A good direction for research following the second paper would be to include diversity factors in the credibility analysis; factors like gender, race, economic and social status, and age could vary the results.

An example of a different result comes from research on a similar topic conducted on the reddit community changemyview, where people ask to be challenged on their deeply held beliefs. That paper concluded that in most cases the language and tone of the counterargument played a bigger role than the quantity or quality of the facts. A counterargument that causes the reader to introspect and arrive at the same conclusion as the counterargument is highly influential in changing the reader’s view. It also makes the point that only people who are ready to listen to counterarguments participate in that community; hence, people who decide to stay in their echo chamber are not evaluated in the study.


Reflection #4 – [1/30] – Jamal A. Khan

In this paper the questions posed by the authors are quite thought-provoking and are ones which, I believe, most of us in computer science would simply overlook, perhaps mostly due to a lack of knowledge of psychology. The main question is: what sort of corrective approach would lead people to let go of political misconceptions? Is it an immediate correction? A correction provided after some delay? Or are no corrections the way to go? Given that this is a computational social science course, I found this paper, though interesting, a bit out of place; the methodology, the questions, and the way the results and discussion are built up make it read like a psychology paper.

Nevertheless, coming back to the paper, I felt that two aspects completely ignored during analysis were gender and age:

  • How do men and women react to same stimuli of correction?
  • Are younger people more open to differing opinions and/or are young people more objective at judging the evidence countering their misperceptions?

It would’ve been great to see how the results extrapolate across different racial groups, though I guess that’s a bit unreasonable to expect given that ~87% of the sample population comprises one racial group. This highlighted snippet from the paper made me chuckle:

There’s nothing wrong in making claims, but one should have the numbers to back them up, which doesn’t seem to be the case here.

The second question that arose in my mind, which comes from personal experience, is whether the results of this study would hold true in, say, Pakistan or India. The reason I ask is that politics there is driven by different factors, such as religion, so people’s behavior and their tendency to stick to certain views regardless of evidence refuting them would be different.

The third point concerns the relationship between the aforementioned issues and the subject’s level of education:

  • Do more educated people have the ability to push aside their preconceived notions and points of view when presented with corrective evidence?
  • How is level of education correlated with the notion of whether a media/news outlet is biased, and with the ability to set that notion aside?

Before moving on to the results and discussion, I have a few concerns about how some of the data was collected from the participants. In particular:

  • They use a 1-7 scale for some questions, 1 being the worst case and 7 being the best. How do people even place themselves on such a scale when it has no reference point? Given that there was no reference point, or at least none that the authors mention, any answer given by the participants to such questions will be ambiguous and highly biased by what they individually consider a 1 or a 7 on the scale. Hence, results drawn from such questions would be misleading at best.
  • The second concern has to do with the time allotted for reading. Why 1 minute, or even 2? Why was it not mandatory for participants to read the entire document/piece of text? What motivated this method, and what merits does it have, if any?
  • MTurk was launched publicly on November 2, 2005, and the paper was published in 2013. Was it not possible to gather more data using remote participants?

The results section managed to pull all sorts of triggers for me, so I’m not going to get into the details and will just pose three questions:

  • Graphs with unlabelled y-axes? I don’t doubt the authenticity or intentions of the authors, but this makes the results much less credible for me.
  • Supposing the y-axes are in the 0-1 range, why are all the thresholds at 0.5?
  • Why linear regression? Won’t that force the results to be artifacts of the fit rather than actual trends? Logistic regression or regression trees, I believe, would have been a better choice without sacrificing interpretability (see the sketch after this list).
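As a rough illustration of this concern (synthetic data, not the paper’s actual analysis), the sketch below fits a plain linear regression and a logistic regression to a made-up binary “belief accuracy” outcome. The hypothetical predictor and all numbers are fabricated; the point is only that the linear fit can produce predictions outside [0, 1] while the logistic fit cannot.

    # Synthetic illustration: linear vs. logistic regression on a binary outcome.
    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 1))                   # e.g., a hypothetical prior-attitude score
    p = 1 / (1 + np.exp(-3 * X[:, 0]))              # true probability of holding the accurate belief
    y = rng.binomial(1, p)                          # observed binary outcome

    lin = LinearRegression().fit(X, y)
    log = LogisticRegression().fit(X, y)

    grid = np.linspace(-3, 3, 7).reshape(-1, 1)
    print("linear predictions:  ", np.round(lin.predict(grid), 2))              # can leave [0, 1]
    print("logistic predictions:", np.round(log.predict_proba(grid)[:, 1], 2))  # stays in [0, 1]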

Now, the results drawn are quite interesting. One of the findings I didn’t expect was that real-time corrections don’t actually provoke heightened counterarguing; rather, the problem comes into play via biases stemming from prior attitudes when comparing the credibility of the claims. So the question arises: how do we correct people when they’re wrong about ideas they feel strongly about and when the strength of their belief might dominate their ability to reason? In this regard, I like the authors’ first recommendation of presenting news from sources that the users trust, which these days can easily be inferred from their browsing history or even input from the users themselves. Given that extraction of such information from users is already commonplace, i.e., Google and Facebook use it to place ads, I think we needn’t worry about privacy. What we do need to worry about is, as the authors mention, its tendency to backfire and reinforce the misperceptions. So the question then becomes: how do we keep this customization tool from becoming a double-edged sword? One idea is that we could show users a scale of how left- or right-leaning a source is when presenting the information, and tailor the list of sources to include more neutral ones, or tailor the ranking so that the overall set of sources appears neutral, occasionally sprinkling in opposite-leaning sources as well to spice things up.
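To make that customization idea more concrete, here is a toy sketch of how such a re-ranking might look. Everything in it is hypothetical: the source names, the -1 (left) to +1 (right) leaning scores, and the mixing probability are illustrative choices of mine, not anything from the paper.

    # Toy re-ranking: prefer near-neutral sources, occasionally promote an opposing one.
    import random

    sources = [
        {"name": "SourceA", "leaning": -0.8},
        {"name": "SourceB", "leaning": -0.2},
        {"name": "SourceC", "leaning": 0.1},
        {"name": "SourceD", "leaning": 0.7},
    ]

    def rerank(sources, user_leaning, mix_prob=0.2, seed=None):
        """Rank sources by closeness to neutral; sometimes promote an opposing view."""
        rng = random.Random(seed)
        ranked = sorted(sources, key=lambda s: abs(s["leaning"]))
        if rng.random() < mix_prob:
            # promote the source most opposed to the user's own leaning
            opposite = max(sources, key=lambda s: abs(s["leaning"] - user_leaning))
            ranked.remove(opposite)
            ranked.insert(1, opposite)
        return ranked

    print([s["name"] for s in rerank(sources, user_leaning=-0.6, seed=42)])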

I would like to close by saying that we as computer scientists have the ability, far above many, to build tools that shape society, and it falls upon us to understand the populace, human behavior, and people’s psychology much more deeply than we already do. If we don’t, then we run the danger of our tools generating results contrary to what they were designed for. As Uncle Ben put it, “With great power comes great responsibility.”


Reflection #4 – [01/30] – [Jiameng Pu]

  • A parsimonious language model of social media credibility across disparate events

Summary:

With social media dominating how people acquire news and learn about events, the credibility of content on these platforms tends to be less rigorously vetted than that of traditional journalism. In this paper, the authors analyze the credibility of content on social media and propose a parsimonious model that maps language cues to perceived levels of credibility. The model is built by examining a credibility corpus of Twitter messages along 15 theoretically grounded linguistic dimensions. It turns out that the language people use carries considerable indicators of an event’s credibility.

Reflection:

The paper inspires me a lot by walking me through a whole research process, from idea creation to implementation. In the data preparation phase, it is a very common task to annotate the content under study with ordinal values; the proportion-based ordinal measure (Pca) used in this paper helps handle extreme cases, which is a trick I can take away and try in other studies. The authors use logistic regression as the classifier; I think a neural network would also be a good choice for this classification task. Specifically, a neural network used for classification typically has as many inputs as there are features and as many outputs as there are classes. Potentially, a neural network might yield better classification performance, which would in turn help the analysis of feature contributions.
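To make the neural-network suggestion concrete, a minimal sketch using scikit-learn’s MLPClassifier is shown below. It assumes 15 linguistic feature dimensions and three perceived-credibility classes (0 = Low, 1 = Medium, 2 = High) and runs on synthetic data, so it only illustrates the modeling setup, not the paper’s actual experiment.

    # Minimal MLP classifier sketch on synthetic "linguistic cue" features.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 15))                         # 15 linguistic cue scores per event
    y = np.digitize(X[:, 0] + 0.5 * X[:, 1], [-0.5, 0.5])   # fake credibility labels 0/1/2

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)
    clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=1)
    clf.fit(X_tr, y_tr)
    print("held-out accuracy:", round(clf.score(X_te, y_te), 3))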

  • The Promise and Peril of Real-Time Corrections to Political Misperceptions

Summary:

Computer scientists have built real-time correction systems that label inaccurate political information with the aim of warning users about it. However, the authors are skeptical about the effectiveness of the real-time correction strategy and run an experiment comparing the effects of real-time corrections with delayed corrections. In the design phase of this comparative experiment, the researchers control the relevant variables and assess participants’ perceptions of the facts through questionnaires. They then fit a linear regression model to analyze the relationship between participants’ belief accuracy and the correction strategy. The paper concludes that real-time corrections are modestly more effective only among individuals predisposed to reject the false claim.

Reflections:

This paper conducts a comparative study of the effects of immediate and delayed corrections on readers’ belief accuracy. Generally, one of the most important parts of comparative research is designing a reasonable and feasible experimental scheme. Although much big-data research collects ready-made data for preprocessing, some research requires the researchers themselves to “produce” the data, so the scheme for collecting data has a significant impact on the subsequent experiments and analysis.
In the first, survey-based step of data collection, the choice of survey samples, the setting of control variables, and the evaluation methods are the main points that can greatly affect the experimental results, e.g., the diversity of participants (race, gender, age), the design of the delayed correction, and the design of the questionnaire.

In particular, in order to implement the delayed correction, the authors employed a distraction task: participants were asked to complete a three-minute image-comparison task. Although this task achieves the desired purpose, it is not the only possible strategy. For example, the duration of the distraction task may affect participants’ recollection of the news facts differently, so researchers could try multiple durations and observe whether the impact changes. In the analysis, linear regression is one of the most commonly used models. However, for complex problems that do not follow a strict rule, the error of a linear regression model is potentially larger than that of a nonlinear one; nonlinear regression with appropriate regularization is also an option, as sketched below.
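As a small illustration of that alternative (on synthetic data, not the study’s), the sketch below compares a plain linear regression with a regularized nonlinear one built from polynomial features and a ridge penalty.

    # Linear vs. regularized nonlinear regression on a synthetic nonlinear response.
    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(2)
    X = rng.uniform(-2, 2, size=(300, 1))                        # e.g., strength of prior attitude
    y = np.sin(2 * X[:, 0]) + rng.normal(scale=0.2, size=300)    # a nonlinear response

    linear = LinearRegression().fit(X, y)
    nonlinear = make_pipeline(PolynomialFeatures(degree=5), Ridge(alpha=1.0)).fit(X, y)

    print("linear R^2:   ", round(linear.score(X, y), 3))
    print("nonlinear R^2:", round(nonlinear.score(X, y), 3))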

Question:

  1. As discussed in the limitations section, although the authors try to make the best possible experimental design, there are still many design decisions that affect the experimental results. How can we minimize this source of error?
  2. Intuitively, wouldn’t it be more appropriate to study the mainstream online reading population? The average age of the participants in this study seems too high for that.


Reflection #4 – [1/30] – [Meghendra Singh]

  1. Garrett, R. Kelly, and Brian E. Weeks. “The promise and peril of real-time corrections to political misperceptions.” Proceedings of the 2013 conference on Computer supported cooperative work. ACM, 2013.
  2. Mitra, Tanushree, Graham P. Wright, and Eric Gilbert. “A parsimonious language model of social media credibility across disparate events.” Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. ACM, 2017.

Both papers focus on issues surrounding the credibility of information available on the world wide web and provide directions for future research on the subject. Garrett and Weeks focus on the implications of correcting inaccurate information in real time versus presenting the corrections after a delay (following a distractor task). The authors used a between-participants experiment to compare participants’ beliefs on the issue of electronic health records (EHRs) when a news article about EHRs is presented with immediate corrections as opposed to delayed corrections. The study was conducted on 574 demographically diverse U.S.-based participants. On the other hand, Mitra et al. present a model for assessing the credibility of social media events, trained using linguistic and control features from the 1377 event streams (66M Twitter posts) of the CREDBANK corpus. The authors first use Mechanical Turkers to score the credibility of individual event streams and subsequently train a penalized logistic regression (using LASSO regularization) to predict the ordinal credibility level (Low, Medium, or High) of the event streams.
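As a rough sketch of what such a penalized model looks like in code, the snippet below fits an L1-penalized (LASSO-style) multinomial logistic regression on synthetic data. The paper’s model is a penalized ordinal regression, so this is only an approximation of the setup, with made-up features and labels.

    # L1-penalized multinomial logistic regression on synthetic event features.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)
    X = rng.normal(size=(1377, 20))                     # linguistic + control features per event
    logits = X[:, :3] @ np.array([2.0, -1.5, 1.0])      # only 3 features actually matter here
    y = np.digitize(logits, [-1.0, 1.0])                # 0 = Low, 1 = Medium, 2 = High credibility

    clf = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000)
    clf.fit(X, y)
    print("nonzero coefficients per class:", (clf.coef_ != 0).sum(axis=1))  # LASSO-style sparsity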

In their paper, Garrett and Weeks explore a subtle yet interesting issue: real-time correction of information can lead individuals to reject carefully documented evidence and to distrust the source. The paper seems to suggest that people who are predisposed to a certain ideology are more likely to question and distrust real-time corrections to information on the internet (articles, blog posts, etc.) that go against their ideology, whereas people who already have doubts about the information are more likely to agree with the corrections. Upon reading the initial sections of the paper I felt that delayed corrections to information available online might not be very useful. I say this because people rarely revisit an article they have read in the past; if the corrections are not presented as readers go through the information, how will they ever receive the corrected information? It is highly unlikely that they will revisit the same article in the future. I also feel that the study might be prone to sample bias, since the attitudes, biases, and predispositions of people in the U.S. may not reflect those of another geography. Additionally, as the authors also mention in the limitations of the study, we might get different results if the particular issue being analyzed were changed (e.g., would the results differ if the issue were anti-vaccination?).

In the second paper, Mitra et al. focus on predicting the “perceived” credibility of social media event reportage using linguistic and non-linguistic features. Although the approach is interesting, I feel that there can be a difference between the perceived and the actual credibility of an event. For example, given that Mitra et al. have published that subjectivity in the original tweets is a good predictor of credibility, malicious Twitter users wanting to spread misinformation might artificially incorporate language features that increase the subjectivity of their tweets so that they seem more credible. A system based on the model presented in the paper would likely assign high perceived credibility to such misinformation-spreading tweets. One research question might be to come up with a model that can detect and compensate for such malicious cases. Another interesting question might be to devise a system that measures and presents users with an event’s “actual” credibility (perhaps using crowdsourcing or dependable journalistic channels) instead of the “perceived” credibility based on language markers in the tweets about the event.

Another question I have is why the authors use the specific form of Pca (i.e., why were the +1 or “Maybe Accurate” ratings not used for computing Pca?). Also, there are 66M tweets in CREDBANK; given that these are clustered into 1377 event streams, there should be roughly 47K tweets in each event stream (assuming an even distribution). Did each of the 30 Turkers who rated an event read through the 47K tweets, or were these divided between the Turkers? Although I do agree with the authors that this study circumvents the problem of sampling bias by analyzing a comprehensive collection of social media events, I feel there is a fair chance of “Turker bias” creeping into the model (in Table 2 we generally see a majority of Turkers rating events as [+2], i.e., Certainly Accurate). I am curious whether there was a group of Turkers who always rated any event stream presented to them as “Certainly Accurate”.
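For reference, here is how I read the Pca measure: the share of a stream’s Turker ratings that are “Certainly Accurate” (+2) on the -2 to +2 scale. The sketch below computes it that way and also shows the alternative hinted at above, where “Maybe Accurate” (+1) ratings are counted as well; the ratings listed are hypothetical, and this is my interpretation rather than the authors’ code.

    # Hypothetical ratings on the -2 ("Certainly Inaccurate") to +2 ("Certainly
    # Accurate") scale for one event, rated by 10 Turkers.
    turker_ratings = [2, 2, 1, 2, 0, -1, 2, 1, 2, 2]

    def p_ca(ratings, include_maybe=False):
        """Share of ratings at +2 (optionally also +1) out of all ratings given."""
        threshold = 1 if include_maybe else 2
        return sum(1 for r in ratings if r >= threshold) / len(ratings)

    print(p_ca(turker_ratings))                        # 0.6  (only "Certainly Accurate")
    print(p_ca(turker_ratings, include_maybe=True))    # 0.8  (also counting "Maybe Accurate")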




Reflection #4 – [1/29] – Ashish Baghudana

Garrett, R. Kelly, and Brian E. Weeks. “The promise and peril of real-time corrections to political misperceptions.” Proceedings of the 2013 conference on Computer supported cooperative work. ACM, 2013.

Reflection

In this paper, the authors pose two interesting research questions: can political misperceptions and inaccuracies be corrected by a fact-checker, and if so, does real-time correction work better on users than delayed correction? The authors run an experiment where participants first read an accurate post about EHRs followed by an inaccurate post from a political blog. The readers were divided into three groups that were presented with corrections to the inaccurate post:

  • immediately after reading a false post
  • after a distraction activity
  • never

Garrett et al. report that corrections can be effective, even on politically charged topics. Based on the questionnaire at the end of the experiment, the authors concluded that users who were presented with corrections had more accurate knowledge about EHRs in general. Specifically, immediate-correction users were more accurate than delayed-correction users. However, immediate correction also accentuated the attitudinal bias of these users: people who viewed the issue negatively showed an increase in resistance to correction.

This paper is unlike any of the papers we have read in this class till now. In many senses, I feel this paper deals entirely with psychology. While it is applicable to computer scientists in designing fact-checking tools, it has more far-reaching effects. The authors created separate material for each group in their experiment and physically administered the experiment for each of their users. This research paper is a demonstration of meticulous planning and execution.

An immediate question from the paper is: would this experiment be possible using Amazon Mechanical Turk (MTurk)? That would have made it easier to collect more data. It would also have enabled the authors to run multiple experiments with different cases, i.e., more contentious issues than EHRs. The authors mention that the second (factually incorrect) article was associated with a popular political blog. If the political blog was right-leaning or left-leaning and this was known to the users, did it affect their ratings in the questionnaire? The authors could have included an intermediate survey (after stage 1) to gauge participants’ prior biases.

A limitation that the authors mention is that of reinforcement of corrections. Unfortunately, running experiments involving humans is a massive exercise, and it would be difficult to repeat this several times. Another issue with such experiments is that users are likely to treat the questionnaire as a memory test and answer accordingly, rather than based on their true beliefs. I also take issue with the racial diversity of the sample population, which is predominantly white (~86%).

This study could be extended to examine how party affiliation and political views correlate with a user’s willingness to accept corrections. Are certain groups of people more prone to incorrect beliefs?


Reflection #3 – [1/23] – [Nuo Ma]

Cheng, Justin, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. “Antisocial Behavior in Online Discussion Communities.” ICWSM. 2015.

Summary:

In this paper, Cheng et al. present a study of users banned for antisocial behavior, sampled from three online communities (CNN, IGN, and Breitbart). The authors characterize antisocial behavior by studying specific groups in these communities: FBUs (Future-Banned Users) and NBUs (Never-Banned Users). They also present an “evolution over time” analysis indicating that FBUs write worse than other users over time and that community tolerance tends to decline over time. Finally, the authors propose an approach for extracting features to predict antisocial behavior, potentially automating and standardizing this process.

Reflection:

I think several points in this paper are noteworthy. First, these are three communities with different characteristics: Breitbart is far-right according to Google, IGN doesn’t have a clear tendency, and personally I consider CNN to lean left. The nature of a community will attract a certain user group and might result in different user behaviors, and the specific topics can also lead to different results. But in the analysis I only see “measuring undesired behavior”, which is a rather blurry description, though the term antisocial itself is hard to define clearly. This makes me curious because different communities have different banning rules, and how those rules are enforced can vary accordingly, yet in this article users are simply categorized as banned and non-banned. The banning rules differ across communities, but some of the data is treated as a single entity. To me this may not be completely solvable given the nature of the question, but it can definitely be further elaborated or discussed. Also, the numbers of data samples are not consistent (18758 for CNN, 1164 for IGN, 1138 for Breitbart).

As for the proposed features and classifier for predicting antisocial behavior, I like the idea. Using bag-of-words features can capture literal trolling and abuse; however, a lot of antisocial behavior online goes one step further and is not limited to literal words, e.g., sarcasm. When sarcasm goes to an extreme, it can be antisocial. Identifying such behavior can be easy within an interest group, and when there is agreement within the group, the post is likely to get reported or deleted. Still, deleted/reported posts are subjective signals and should not be the only metric for measuring antisocial behavior. More objective features, such as down votes, might reduce the influence of subjective moderator decisions, but they need further clarification: in some communities, down-voting presents options for the reason (disagree, irrelevant, or trolling), which would give a classifier a clearer signal about why a post was down-voted (a rough sketch of combining such features follows below).
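A rough sketch of that suggestion: concatenate bag-of-words text features with structured down-vote-reason counts before training a classifier. The posts, counts, and labels below are all made up for illustration.

    # Combine text features with down-vote reason counts (disagree/irrelevant/trolling).
    import numpy as np
    from scipy.sparse import csr_matrix, hstack
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    posts = ["you are all idiots", "interesting point, thanks", "this thread is garbage"]
    downvote_reasons = np.array([[0, 1, 4],    # counts of [disagree, irrelevant, trolling]
                                 [1, 0, 0],
                                 [0, 2, 3]])
    labels = [1, 0, 1]                         # 1 = later banned, 0 = never banned

    text_features = CountVectorizer().fit_transform(posts)
    X = hstack([text_features, csr_matrix(downvote_reasons)])
    clf = LogisticRegression().fit(X, labels)
    print(clf.predict(X))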

Questions:

This paper studies banned users’ posts from 3 large communities, but different communities have different guidelines; what kind of guideline can be generalized across all communities?

Is antisocial behavior/language as the main banning criterion consistent for all the cases discussed here? How can it be verified / pre-processed?

For CNN, I have the impression that users tend to visit the website based on their political background. We also see a higher percentage of posts reported compared to websites like IGN, whose users are less “categorized”. Will the nature of the website have an influence on how users behave? (Surely it does, but this might be something noteworthy.)


Reflection #3 – [1/25] – Meghendra Singh

Cheng, J., Danescu-Niculescu-Mizil, C., & Leskovec, J. (2015, April). Antisocial Behavior in Online Discussion Communities. In ICWSM (pp. 61-70).

The paper presents an interesting analysis of users on news communities. The objective is to identify users who engage in antisocial behavior, such as trolling, flaming, bullying, and harassment, in these communities. Through this paper, the authors reveal compelling insights into the behavior of users who were banned: banned users post irrelevantly, garner more replies, focus on a small number of discussion threads, and post heavily on those threads. Additionally, posts by such users are less readable, lack positive emotion, and more than half of them are deleted. Further, the decline in the text quality of their posts and the probability of their posts being deleted increase over time. The authors also suggest that certain user features can be used to detect users who will potentially be banned, and a few techniques for identifying such “bannable” users are discussed towards the end of the paper.

First, I would like to quote from the Wikipedia article about Breitbart News:

Breitbart News Network (known commonly as Breitbart News, Breitbart or Breitbart.com) is a far-right American news, opinion and commentary website founded in 2007 by conservative commentator Andrew Breitbart. The site has published a number of falsehoods and conspiracy theories, as well as intentionally misleading stories. Its journalists are ideologically driven, and some of its content has been called misogynist, xenophobic and racist.

My thought after looking through Breitbart.com was: isn’t this community itself somewhat antisocial? One can easily imagine a lot of liberals getting banned in this forum for contesting the posted articles. And this is what the homepage of Breitbart.com looked like in the morning:

While the paper itself presents a stimulating discussion of antisocial behavior in online discussion forums, I feel there is a presumption that a user’s antisocial behavior always results in them being banned. The authors discuss that communities are initially tolerant of antisocial posts and users, and this bias could easily be used to evade getting banned. For example, a troll may initially post antisocial content, switch to the usual positive discussions for a substantial period of time, and then return to posting antisocial content. Also, what’s to stop a banned user from creating a new account and returning to the community? All you need is a new e-mail account for Disqus. This is important because most of these news communities don’t attach a notion of reputation to posting comments on their articles. On the other hand, I feel that the “gamified” reputation system on communities like Stack Exchange would act as a deterrent against antisocial behavior. Hence, it would be interesting to find out who gets banned in such “better designed” communities, and whether the markers of antisocial behavior are similar to those in news communities. An interesting post here.

Another question to ask is whether there are deeper tie-ins to antisocial behavior on online discussion forums. Are these behaviors predictors of some pathological condition of the human posting the content? The authors briefly mention these issues in the related work. It would also be interesting to discover whether a troll on one community is also a troll on another. The authors mention that this research can lead to new methods for identifying undesirable users in online communities. I feel that detecting undesirable users beforehand is a bit like finding criminals before they have committed the crime, and there may be ethical issues involved. A better approach might be to look for linguistic markers that suggest antisocial themes in the content of a post and warn the user of the consequences of submitting it, instead of recommending users to the moderator for banning after the damage has already been done. This also leads to the question of which events/news/articles generally lead to antisocial behavior. Are there certain contentious topics that lead regular users to bully and troll others? Another question to ask here is: can we detect debates in the comments on a post? This might be a relevant feature for predicting antisocial behavior. Additionally, establishing a causal link between the pattern of replies in a thread and the content of the replies may help identify “potentially” antisocial posts. A naïve approach might be to simply restrict the maximum number of comments a user can submit to a thread. Another interesting question may be to find out whether FBUs start contentious debates, i.e., do they generally start a thread or do they prefer replying to existing threads? The authors provide some indication on this question in the section “How do FBUs generate activity around themselves?”.

Lastly, I feel that a classifier precision of 0.8 is not good enough for detecting FBUs. I say this because the objective is to recommend potentially antisocial users to human moderators for banning so as to reduce their manual labor, and having a lot of false positives would defeat this purpose to some extent (a small illustration of the precision/threshold trade-off follows below). Also, I don’t quite agree with the claim that the classifiers are cross-domain: I feel there will be a huge overlap between CNN and Breitbart.com in the area of political news. Moreover, the dataset is derived primarily from news websites where people discuss and comment on articles written by journalists and editors. These findings might not apply to Q&A websites (e.g., Quora, StackOverflow), places where users can submit articles (e.g., Medium), or more technically inclined communities (e.g., TechCrunch).
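To illustrate the precision concern with a toy example (the scores and labels below are fabricated, not the paper’s), raising the decision threshold of a probabilistic classifier trades recall for precision, which matters when every flagged user creates review work for a moderator:

    # Fabricated scores/labels: precision rises and recall falls as the threshold rises.
    import numpy as np
    from sklearn.metrics import precision_score, recall_score

    rng = np.random.default_rng(4)
    y_true = rng.binomial(1, 0.3, size=1000)                                  # 1 = actually an FBU
    scores = np.clip(0.6 * y_true + rng.normal(0.2, 0.25, size=1000), 0, 1)   # fake classifier scores

    for threshold in (0.5, 0.7, 0.9):
        y_pred = (scores >= threshold).astype(int)
        print(f"threshold {threshold}:",
              "precision", round(precision_score(y_true, y_pred), 2),
              "recall", round(recall_score(y_true, y_pred), 2))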


Reflection #3 – [1/25] – [Md Momen Bhuiyan]

Paper: Antisocial Behavior in Online Discussion Communities

Summary:
Although antisocial behavior in online communities is very common, most recent research on the subject has focused on qualitative analysis of small groups of users. This paper uses data from three online communities (CNN.com, breitbart.com, and IGN.com) for a quantitative analysis aimed at a general understanding of users’ antisocial behavior. For the sake of comparison, the paper discusses two types of users: Future-Banned Users (FBUs) and Never-Banned Users (NBUs). By analyzing posts over users’ entire activity spans on the forums, the authors find that posts by FBUs are very different from other posts in the same threads and are harder to read, and that their post quality worsens over time. Censorship also plays a role in shaping the writing style of FBUs. FBUs post a lot and get a higher number of responses. Another finding of the study is that two types of FBUs exist in an online community: one with a higher post-deletion rate and one with a lower rate. Finally, the authors use four types of features to build a classifier for predicting antisocial behavior: post features, activity features, community features, and moderator features.

Reflection:
The paper explains the process of analyzing antisocial behavior, from data preparation to analysis, in great detail. One interesting aspect of the process was using crowdsourcing for the initial classification. The authors’ analysis of the final classifier provides some interesting insights. For example, the classifier’s performance peaks after seeing the attributes of a user’s first 10 posts, which fits the idea that other community members judge FBUs in a similar fashion. The classifier’s performance on Hi-FBUs suggests that it learns the post-deletion ratio as one of the primary indicators, which helps explain why prediction performance peaks at the first 10 posts. The authors’ analysis of the cross-platform performance of the algorithm was also intuitive. Although the prediction quality of the classifier is reasonably good, there remains the question of how such a tool should be applied: this paper deals with the sensitive issue of antisocial behavior, and even with good performance, a human in the loop is still necessary when acting on its predictions.

Read More

Reflection #3 – [01/25] – [Patrick Sullivan]

Title: Antisocial Behavior in Online Discussion Communities. Cheng et al. explore the dynamics of large online communities whose members behave undesirably.

The authors only looked at three news websites to learn about antisocial behavior. While each site has its own target audience to set it apart, there may be little other variation among them. Antisocial behavior is very present in other forms of media, so the paper’s conclusions could be more generalizable if it covered online communities on Facebook, Reddit, and YouTube. Perhaps this would show that there are several more variants of antisocial behavior and of its evolution in an online community, depending on the constraints and features of the platform. It could also help protect the results if one community were found to be dysfunctional or especially unique. I do not fault the authors for this oversight, for two reasons: it is more difficult to structure and follow up a longitudinal study across several more rapidly changing social media platforms, and it is quite difficult to cross-compare all of the platforms I listed, whereas Cheng’s platforms each have the same overall structure. I do wish there was more discussion of this idea than the brief mention in the paper’s conclusion.

Cheng et al. also look into the changes that happen over time to an online community and to the users who are eventually banned. While this can be used to predict growing antisocial behavior from within a community, it could be affected by the overall climate of the platform. I imagine these platforms undergo rapid changes to both moderation and antisocial activity when an event occurs. A sudden rise in concern over ethics and harassment surrounding video game journalism would surely have a profound effect on a gaming news website’s community. Likewise, approval or disapproval of a news network’s articles by a high-profile political leader would likely attract more attention, moderation action, or antisocial behavior. A longitudinal study might therefore be measuring how an event changed a community (either temporarily or permanently), and not how the members of a community typically moderate and react to moderation. A simple investigation I did using Google Trends shows a possibly significant spike in ‘gaming journalism’ interest in the latter half of 2014, which could very well have been captured in the 18-month window of this longitudinal study. In addition, Google’s related topics during that time include terms like ‘ethics’, ‘corruption’, and ‘controversy’ (see image source). These terms have special meaning and connection to the idea of online community moderation and should not be taken lightly. The authors’ omission of even mentioning this event makes me question whether they were so focused on the antisocial behavior that they did not monitor the communities for events that could devalue their data.

Google Trends: 'Gaming Journalism' in 2014

Source: https://trends.google.com/trends/explore?date=2014-01-01%202014-12-31&q=gaming%20journalism


Reflection #3 – [1/23] – [Jiameng Pu]

Cheng, Justin, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. “Antisocial Behavior in Online Discussion Communities.” ICWSM. 2015.

Summary:

User contributions are an important part of all kinds of social platforms, e.g., posts, comments, and votes. While most users are civil, a few antisocial users greatly contaminate the environment of the internet. By studying users who were banned from specific communities and comparing two user groups, FBUs (Future-Banned Users) and NBUs (Never-Banned Users), the authors try to characterize antisocial behavior, e.g., how FBUs write and how FBUs generate activity around themselves. The “evolution over time” analysis shows that FBUs write worse than other users over time and tend to exacerbate their antisocial behavior when they receive stronger criticism from the community. By designing features based on these observations and then categorizing them, the work can potentially help relieve community moderators of heavy manual labor. It also proposes a typology of antisocial users based on post deletion rates. Finally, a system is introduced to identify undesired users early in their community life.

Reflection:

The paper offers extensive discussion and analysis of antisocial behavior; I will highlight the points that impressed or inspired me most. First, the discussion of how to measure undesired behavior in the data preparation section is useful. It reminds me that down-voting activity cannot simply be interpreted as marking “antisocial behavior”, which is a much narrower concept. Personally, I don’t use the down-vote functionality much when I browse Q&A websites like Quora, Zhihu, and StackOverflow, and it turns out many people share the same habit, which is a good example of a case where relying on fewer features/data, i.e., report records and post deletion rates, makes more sense. Second, instead of predicting whether a particular post or comment is malicious, the authors focus on individual users and their whole community life, which is harder to analyze but more useful to community moderators, since it lets them act like real community police rather than simply cleaners. Third, the four feature categories properly cover the feature classes, but the authors do not mention some potentially important features in Table 3, e.g., comments on a post, which could be categorized under post features, and a user’s followings and followers, which could be categorized under community features. Intuitively, these two features are strong indicators of a user’s properties: people of one mind fall into the same group, and harsh criticism would show up in the comment area of malicious posts.

I notice that the authors perform the above task on a balanced dataset of FBUs and NBUs (N=18758 for CNN, 1164 for IGN, 1138 for Breitbart), suggesting that the learned models generalize to multiple communities. Although the numbers of FBUs and NBUs are balanced, would the very different numbers of user samples from the three platforms influence the generalization of the resulting classifier? In my view, it would be more rigorous for the authors to address the lopsided sample sizes or add more discussion about how the data could be properly sampled.

Questions & thoughts:

  1. Where is the proper line between antisocial and non-antisocial behavior? We should avoid confusing merely unpleasant users with antisocial users.
  2. Compared to the last paper, there is less description of the implementation tools used throughout the different phases of the research. I’m quite curious about how to carry out specific procedures in practice, e.g., data collection, feature categorization, and the investigation of the evolution of user behavior and of community response.
  3. I think the choice of classifier probably makes a difference in prediction accuracy, so it might be better to compare the performance of several classifiers to find the most suitable one for this task.
  4. Although we can roughly see the contribution of each feature category from Table 4, I think a more extensive and quantitative analysis would round out the research.
