Reflection #4 – [01/30] – [Vartan Kesiz-Abnousi]

Zhe Zhao, Paul Resnick, and Qiaozhu Mei. 2015. Enquiring Minds: Early Detection of Rumors in Social Media from Enquiry Posts. In Proceedings of the 24th International Conference on World Wide Web (WWW ’15). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 1395-1405. DOI: https://doi.org/10.1145/2736277.2741637

 

Summary

The authors aim to identify trending rumors in social media, including topics that are not pre-defined. They use Twitter as their source of data; specifically, they analyze 10,417 tweets related to five rumors. They present a technique to identify trending rumors, which they define as topics that include disputed factual claims, and they argue that identifying such rumors as early as possible is important. Because it is difficult to assess the factual claims in individual posts, the authors reframe the problem: they look for clusters of posts whose common topic is a disputed factual claim. When a rumor spreads, there are usually posts that raise questions using signature enquiry phrases such as “Is this true?”, and the authors search for these phrases. They find that many rumor diffusion processes contain posts with such enquiry phrases quite early in the diffusion. The authors therefore develop a rumor detection method built around these enquiry phrases. It follows five steps: identify signal tweets, identify signal clusters, detect statements, capture non-signal tweets, and rank candidate rumor clusters. The method clusters similar posts together and then gathers the related posts that do not contain the enquiry phrases. Finally, it ranks the clusters of posts by their likelihood of really containing a disputed factual claim. The authors find that the method performs well: about a third of the top 50 ranked clusters were judged to be rumors, which they consider high enough precision.
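As a concrete illustration of the signal-tweet step described above, here is a minimal sketch of enquiry-phrase matching. The phrase patterns are placeholders chosen for illustration; the paper uses its own curated set of enquiry and verification phrases.

```python
import re

# Illustrative enquiry patterns (assumed, not the authors' exact list).
ENQUIRY_PATTERNS = [
    r"\bis (this|that|it) true\b",
    r"\breally\?",
    r"\bunconfirmed\b",
    r"\brumou?r\b",
    r"\bdebunk(ed)?\b",
]
ENQUIRY_RE = re.compile("|".join(ENQUIRY_PATTERNS), re.IGNORECASE)

def is_signal_tweet(text: str) -> bool:
    """Flag a tweet as a 'signal tweet' if it contains an enquiry phrase."""
    return bool(ENQUIRY_RE.search(text))

tweets = [
    "Explosions at the White House? Is this true?",
    "Just had a great lunch downtown.",
    "Unconfirmed reports of two explosions, anyone have a source?",
]
print([t for t in tweets if is_signal_tweet(t)])  # first and third tweets are flagged
```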

 

Reflections

 

The broad success of online social media has created fertile soil for the emergence and fast spread of rumors. A notable example is that one week after the Boston bombing, the official Twitter account of the Associated Press (AP) was hacked. The hacked account sent out a tweet about two explosions in the White House and the President being injured. Against this backdrop, the authors have an ambitious goal: they propose that instead of relying solely on human observers to identify trending rumors, it would be helpful to have an automated tool to identify potential rumors. I find the idea of identifying rumors in real time, instead of retrieving all the tweets related to them afterwards, very novel and intelligent. To their credit, the authors acknowledge that identifying the truth value of an arbitrary statement is very difficult, probably as difficult as any natural language processing problem. They stress that their method makes no attempt to assess whether rumors are true or not, or to classify or rank them based on the probability that they are true. They rank the clusters based on the probability that they contain a disputed claim, not that they contain a false claim.

 

I am particularly concerned about the adverse effects of automated rumor detection, in particular its use in damage control or disinformation campaigns. The authors write: “People who are exposed to a rumor, before deciding whether to believe it or not, will take a step of information enquiry to seek more information or to express skepticism without asserting specifically that it is false”. However, this statement is not self-evident. For instance, what if the flagging mechanism of a rumor, the “disputed claim”, does not work for all cases? Government officials’ statements would probably not be flagged as “rumors”. A classic example is the existence, or lack thereof, of WMDs in Iraq: most of the media corroborated the government’s (dis)information. To put things in more technical terms, what if the Twitter posts do not contain any of the enquiry phrases (e.g., “Is this true?”)? They would then never be detected as “signal tweets”, and the automated algorithm would never find a “rumor” to begin with. The algorithm would do what it was programmed to do, but it would have failed to detect rumors.

 

Perhaps the greatest controversy surrounds how “rumor” is defined. According to the authors, “A rumor is a controversial and fact-checkable statement”. By “fact-checkable” they mean that, in principle, the statement has a truth value that could be determined right now by an observer who had access to all relevant evidence. By “controversial (or disputed)” they mean that, at some point in the life cycle of the statement, some people express skepticism. I think the “controversial” part might be the weakest part of the definition. Would the statement “the earth is round” be controversial because at “some point in the life cycle of the statement, some people express skepticism”? The authors try to capture such skeptical posts in a category they label “signal tweets”.

Regardless, I particularly liked the rigorous definitions provided in the “Computational Problem” section, which leave no room for misinterpretation. There is room for further research in automated rumor detection, especially in broadening the “definition” of a rumor and somehow embedding it in the detection method.

Questions

  1. What if the human annotators are biased in manually labeling rumors?
  2. What is the logic behind the length of the time interval? Is it ad hoc? How sensitive are the results to the choice of interval?
  3. Why was the Jaccard similarity threshold set to 0.6? Is this the standard in this type of research? (A quick sketch of the similarity computation follows below.)
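Regarding question 3, here is a minimal sketch of a Jaccard-based near-duplicate check with a 0.6 cutoff. The threshold value comes from the question above; the tokenization and the pairwise check are my own simplifying assumptions, not the paper’s exact procedure.

```python
import re

def tokens(text: str) -> set:
    # Lowercase word tokens with punctuation stripped (a simplifying assumption;
    # the paper's exact tokenization is not specified here).
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0

def same_cluster(tweet_a: str, tweet_b: str, threshold: float = 0.6) -> bool:
    # Treat two tweets as candidates for the same cluster if their token-set
    # Jaccard similarity meets the 0.6 threshold mentioned above.
    return jaccard(tokens(tweet_a), tokens(tweet_b)) >= threshold

print(same_cluster("Explosion reported at the finish line",
                   "Is this true? Explosion reported at the finish line"))  # True (6/9 ≈ 0.67)
```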

 

 

Mitra, Tanushree, Graham P. Wright, and Eric Gilbert. “A parsimonious language model of social media credibility across disparate events.” Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. ACM, 2017.

 

Summary

The main goal of this article is to examine whether the language captured in unfolding Twitter events provides information about the events’ credibility. The data is a corpus of 66M public Twitter messages corresponding to 1,377 real-world events over a span of three months, October 2014 to February 2015. The authors identify 15 theoretically grounded linguistic dimensions and present a parsimonious model that maps language cues to perceived levels of credibility. The results demonstrate that certain linguistic categories and their associated phrases are strong predictors of perceived credibility across disparate social media events. The language used by millions of people on Twitter carries considerable information about an event’s credibility.

Reflections

With ever-increasing doubt about the credibility of information found on social media, it is important for both citizens and platforms to identify non-credible information. My intuition, even before finishing the paper, was that the type of language used in Twitter posts could serve as an indicator of the credibility of an event. Furthermore, even though not all non-credible events can be captured by language alone, we should still be able to capture a subset. Interestingly enough, the authors indeed verify this hypothesis. This is important in the sense that we can capture non-credible posts with a parsimonious model acting as a first “screening” step; after discarding these posts, we could proceed to more complex models that add additional “filters” for detecting non-credible posts. One of the dangers I see is eliminating credible posts, a false-positive error, with “positive” here meaning non-credible. The second important contribution is that instead of retrospectively identifying whether the information is credible or not, they use CREDBANK in order to overcome dependent-variable bias. The choice of the Pca measure, derived from the Likert-scale ratings, renders the results interpretable. To make sure the results are sensible, the authors compare this index with hierarchical agglomerative clustering and find high agreement between the Pca-based and HAC-based clustering approaches.

Questions

As the authors discuss, there is no broad consensus on the meaning of “credibility”. In this case, credibility is the accuracy of the information, and the accuracy of information is in turn assessed by instructed raters. The authors thus use an objective definition of credibility that depends on the instructed raters. Are there other ways to assess “credibility” based on “information quality”? Would they yield different results?

 

Garrett, R. Kelly, and Brian E. Weeks. “The promise and peril of real-time corrections to political misperceptions.” Proceedings of the 2013 conference on Computer supported cooperative work. ACM, 2013.

Summary

This paper presents an experiment comparing the effects of real-time corrections to corrections that are presented after a short distractor task. Real-time corrections turn out to be modestly more effective overall, but closer inspection reveals that this is only true among individuals predisposed to reject the false claim. The authors find that individuals whose attitudes are supported by the inaccurate information distrust the source more when corrections are presented in real time, yielding beliefs comparable to those of individuals never exposed to a correction.

Reflections

I find it interesting that providing factual information is a necessary, but not sufficient, condition for facilitating learning, especially around contentious issues and disputed facts. Furthermore, the authors claim that individuals are affected by a variety of biases that can lead them to reject carefully documented evidence, and that correcting misinformation at its source can actually augment the effects of these biases. In behavioral economics there is a term related to such biases: “bounded rationality”. Economic models used to assume that humans make rational choices; this “rationality” was formalized mathematically, and economists built optimization problems around it to account for human behavior. Newer economic models, however, incorporate the concept of bounded rationality in various ways. Perhaps it could be useful for the authors to draw on this literature.

Question

1. Would embedding the concept of “Bounded Rationality” provide a theoretical framework for a possible extension of this study?


Reflection #4 – [01-30] – [Patrick Sullivan]

“The Promise and Peril of Real-Time Corrections to Political Misperceptions”
Garrett and Weeks are finding ways to respond to inaccurate political claims online.

“A Parsimonious Language Model of Social Media Credibility Across Disparate Events”
Mitra, Wright, and Gilbert are using language analysis to predict the credibility of Twitter posts.

Garrett and Weeks rightfully point out that a longer-term study is a priority for future work. People are naturally inclined to defend their worldview, so they resist changing their opinions within a short amount of time. But repeated corrections over a period of time might have more influence on a person. Participants might need more time to build trust in the corrections before accepting them. The added insight from the corrections might also lead them to consider that there is more nuance to many of their other views, and that these are worth looking into. There are many psychological elements to consider here in terms of persuasion, trust, participants’ backgrounds, and social media.

I think the truth might be more aligned with Garrett and Weeks’s hypotheses than the results show. Self-reporting likely keeps some participants from reporting an actual change in their opinion. The study notes how participants are defensive of their original position before the experiment and resist change. If a correction does change a participant’s view, they could be quite embarrassed at having been manipulated with misinformation and at not being as open-minded or unbiased as they believed. This is a version of the well-known psychological reaction called cognitive dissonance. People usually resolve cognitive dissonance over time, tuning their opinions slowly until they are supported by their experiences. Again, this could be investigated in a longer-term study of the corrections.

Mitra, Wright, and Gilbert consider credibility to have a direct connection to language and vocabulary. I don’t know if they can correctly account for context and complexities such as sarcasm. The CREDBANK corpus may be quite useful for training on labeled social media posts about events, but real-world data could still have these complications to overcome. Perhaps other studies offer ways of measuring the intent or underlying message of social media posts. Otherwise, humor or sarcasm in social media could produce errors, since they are not measured as such among the language variables.

With both of these papers, we know we can identify dubious claims made online and how to present corrections to users in a non-harmful way. But I believe that computers are likely not adept at crafting the corrections themselves. This would be an opportune time for human-computer collaboration, where the computer routes claims to an expert user, who checks each claim and crafts a correction, which the computer then distributes widely to others who make the same claim. This type of system both adapts to new misinformation being reported and can be tuned to fit each expert’s area uniquely.


Reflection #4 – [1/30] – Hamza Manzoor

[1]. Garrett, R. Kelly, and Brian E. Weeks. “The promise and peril of real-time corrections to political misperceptions.”

[2]. Mitra, Tanushree, Graham P. Wright, and Eric Gilbert. “A parsimonious language model of social media credibility across disparate events.”

 

These papers are very relevant in this digital age where everyone has a voice and, as a result, there is a plethora of misinformation around the web. In [1], the authors compare the effects of real-time corrections to corrections that are presented after a distraction. To study the implications of correcting incorrect information, they conducted a between-participants experiment on electronic health records (EHRs) to examine how effective real-time corrections are compared to corrections presented later. Their experiment consisted of a demographically diverse sample of 574 participants. In [2], Mitra et al. present a study to assess the credibility of social media events. They present a model that captures the language used in Twitter messages of 1,377 real-world events (66M messages) from the CREDBANK corpus. The CREDBANK corpus used Mechanical Turk workers to obtain credibility annotations; the authors then trained a penalized logistic regression using 15 linguistic features and other control features to predict the credibility level (Low, Medium, or High) of the event streams.
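For readers unfamiliar with penalized regression, the following is a minimal sketch (not the authors’ code) of an L1-penalized logistic regression over 15 features predicting three credibility levels. The data is random placeholder data, scikit-learn is assumed, and the ordinal levels are treated as plain classes for simplicity.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1377, 15))      # one row per event, 15 features (synthetic)
y = rng.integers(0, 3, size=1377)    # 0=Low, 1=Medium, 2=High (synthetic labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000)
clf.fit(X_tr, y_tr)

print("held-out accuracy:", clf.score(X_te, y_te))
print("features kept per class:", (clf.coef_ != 0).sum(axis=1))  # sparsity from the L1 penalty
```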

Garrett et al. claim that real-time correction, even though it is more effective than delayed correction, can have adverse implications, especially for people who are predisposed to a certain ideology. First of all, their sample was US-based, which makes me question whether these results would hold in other societies. Is the sample diverse enough to generalize? Can we even generalize it to the US as a whole? The sample is 86% white, whereas immigrants alone make up over 14% of the US population.

The experiment also does not explain what factors contribute to people sticking to their preconceived notions. Is it education or age? Are educated people more open to corrections? Are older people less likely to change their opinions?

Also, one experiment on EHRs is inconclusive. Can results from a single topic be generalized? Can we repeat these experiments with more controversial topics using Mechanical Turk?

Finally, throughout the paper I felt that delayed correction was not thoroughly discussed. The paper focused so much on the psychological aspects of preconceived notions that the authors neglected (or forgot) to discuss delayed correction. How much delay is suitable? How and when should a delayed correction be shown? What if the reader closes the article right after reading it? These are a few key questions that should have been answered regarding delayed corrections.

In the second paper, Mitra et al. present a study to assess the credibility of social media events. They use penalized logistic regression, which in my opinion was the correct choice, because linguistic features introduce multicollinearity and penalizing features seems the right way to handle it. But since they use the CREDBANK corpus, which relied on Mechanical Turk, it raises the same questions we discuss in every lecture: did Turkers thoroughly go through every tweet? Can we neglect Turker bias? Secondly, can we generalize that the Pca-based credibility classification technique will always be better than data-driven classification approaches?

The creation of features, though, raises a few questions. The authors make a lot of assumptions in the linguistic features; for example, they hypothesize that a coherent narrative is associated with a higher level of credibility, which does make sense, but can we hypothesize something and not verify it later? This makes me question the feature space: were these the right features? Finally, can we extend this study to other social media? Will a corpus generated from Twitter events work for other platforms?

 


Reflection #4 – [1/30] – Aparna Gupta

Reflection 4:

  1. Garrett, R. Kelly, and Brian E. Weeks. “The promise and peril of real-time corrections to political misperceptions.” Proceedings of the 2013 conference on Computer supported cooperative work. ACM, 2013.
  2. Mitra, Tanushree, Graham P. Wright, and Eric Gilbert. “A parsimonious language model of social media credibility across disparate events.” Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. ACM, 2017.

Summary:

Both papers talk about the credibility of content posted on social media websites like Twitter. Mitra et al. present a parsimonious model that maps language cues to perceived levels of credibility, and their results show that certain linguistic categories and their associated phrases are strong predictors of perceived credibility across disparate social media events. Their dataset contains 1,377 real-world events. Garrett et al., in turn, present a study that compares the effects of real-time corrections to corrections presented after a short distractor task.

Reflection:

Both papers present interesting findings on how the credibility of information across the world wide web can be interpreted.

In the first paper, Garrett et al. show how political facts and information can be misstated. According to them, real-time corrections are better than corrections made after a delay. I feel this is true to a certain extent, since a user hardly ever revisits an already-read post. If corrections are made in real time, the reader understands that a mistake has been corrected and that credible information is now posted. However, I feel that the experiment about what users perceive – (1) when provided with an inaccurate statement and no correction, (2) when provided a correction after a delay, and (3) when provided with messages in which disputed information is highlighted and accompanied by a correction – can be biased by the user’s interest in the content.

An interesting part of this paper was the listing of various tools (Truthy, Videolyzer, etc.) which can be used to identify and highlight inaccurate phrases.

In the second paper, Mitra et al. try to map language cues to perceived levels of credibility. They target a problem that is now quite prevalent: since the world wide web is open to everyone, people have the freedom to post any content without caring about the credibility of the information being posted. For example, there are times when I have come across the same information (with the exact same words) being posted by multiple users. This makes me wonder about the authenticity of the content and raises doubts about its credibility. I really liked the approach adopted by the authors to identify expressions that lead to low or high credibility of the content. However, the authors focus on perceived credibility in this paper. Can “perceived” credibility be considered the same as the “actual” credibility of the information? How can any bias be eliminated? I feel these are more psychology- and theory-based questions and extremely difficult to quantify.

In conclusion, I found both papers very intriguing. They present a fine amalgamation of human psychology and the problems at hand, and of how these can be addressed using statistical models.

 

 

 

 


Reflection #4 – [1/30] – Pratik Anand

Paper 1: The Promise and Peril of Real-time Corrections to Political Misperceptions
Paper 2 : A Parsimonious Language Model of Social Media Credibility Across Disparate Events

Both papers deal with user opinions about information accessed on the Internet, especially news articles. The first paper shows that individuals exposed to real-time corrections of the news articles they are reading are less likely to be influenced by the corrections if they go against their belief system. On the other hand, corrections are effective if they go along with their beliefs. The paper explains in detail that the more emotionally invested a user is in the topic of an article, the less likely he/she is to be swayed by the correction, whether it is real-time or delayed. Users get more defensive in such cases, especially if the counterargument is provided in real time. The acceptance of counterarguments is a little better if they are presented to the user after a delay. I can understand that a delay gives the reader time to introspect about his/her beliefs on the topic and may raise curiosity about counterarguments. However, will all kinds of delays have that effect? Human attention spans are short, and by the time the correction is introduced, the user may not remember the original article but only a summarized judgment of it, so the correction may not have a strong effect. I liked the part of the paper that discusses how users keep only summary judgments of attitudes and beliefs and discard the evidence. When presented with counterarguments, they decide whether the new idea is persuasive enough to update old beliefs, paying little attention to the facts provided by the article or its source. I think readers will trust a counterargument to deeply held beliefs more if it comes from sources they already trust. A similar observation is made in the paper, which adds that the framing of the counterargument is also very important: a counterargument that is respectful of the reader’s beliefs will have a greater impact.

The second paper introduces the language of a news article as a factor in its credibility: certain words reduce credibility and others increase it. Such research is quite important; however, it does not take into account external factors like the personal beliefs and attitude of the reader, his/her political and social context, etc. Even though the paper makes a case that certain assertive words increase credibility, this cannot be generalized. For some people, assertive opinions that go against their beliefs might appear less credible than opinions that show ambiguity and reason through both sides of the argument. A good direction for research following the second paper would be the inclusion of diversity factors in the credibility analysis. Factors like gender, race, economic and social status, and age could change the results.

An example of a different result is research on a similar topic done on the reddit community changemyview, where people ask to be provided with views that challenge their deeply held beliefs. That paper concluded that, in most cases, the language and tone of the counterargument played a bigger role than the quantity or quality of the facts. A counterargument that causes the reader to introspect and arrive at the same conclusion as the counterargument will be highly influential in changing the reader’s view. It also makes the point that only people who are ready to listen to counterarguments participate in that community; hence, people who decide to stay in their echo chamber are not evaluated in the study.


Reflection #4 – [1/30] – Jamal A. Khan

In this paper the questions posed by the authors are quite thought-provoking and are ones which, I believe, most of us in computer science would simply overlook, perhaps mostly due to a lack of knowledge of psychology. The main question is what sort of corrective approach would lead people to let go of political misperceptions: is it an immediate correction, a correction provided after some delay, or perhaps no correction at all? Given that this is a computational social science course, I found this paper, though interesting, a bit out of place. The methodology, the questions, and the way the results and discussion are built up make it read very much like a psychology paper.

Nevertheless, coming back to the paper, I felt that two aspects completely ignored during analysis were gender and age:

  • How do men and women react to same stimuli of correction?
  • Are younger people more open to differing opinions and/or are young people more objective at judging the evidence countering their misperceptions?

It would’ve been great to see how the results extrapolate among different racial groups. Though, i guess that’s a bit unreasonable to expect given that ~87% of the sample population comprises of one racial group. This highlighted snippet from the paper made a chuckle:

There’s nothing wrong  in making claims but one should have the numbers to back it up, which doesn’t seem to be the case here.

The second question, which arose from personal experience, is whether the results of this study would hold true in, say, Pakistan or India. The reason I ask is that politics there is driven by different factors, such as religion and so on, so people’s behavior and their tendency to stick to certain views regardless of evidence refuting them would be different.

The third point would be the relationship between the aforementioned concerns and the subject’s level of education:

  • Do more educated people have the ability to push aside their preconceived notions and points of view when presented with corrective evidence?
  • How is level of education correlated with the notion that a media/news outlet is biased, and with the ability to set that notion aside?

Before moving on to the results and discussion, I have a few concerns about how some of the data was collected from the participants. In particular:

  • They have a 1–7 scale for some questions, 1 being the worst case and 7 the best. How do people even place themselves on a scale that has no reference point? Given that there was no reference point, or at least none that the authors mention, any answer the participants give to such questions will be ambiguous and highly biased relative to what they consider a 1 or a 7 on the scale. Hence, results drawn from such questions could be misleading.
  • The second concern has more to do with the timing assigned to the reading: why 1 minute, or even 2? Why was it not mandatory for the participants to read the entire document/piece of text? What motivated this method, and what merits does it have, if any?
  • MTurk launched publicly on November 2, 2005, and the paper was published in 2013. Was it not possible to gather more data using remote participants?

Now, the results section managed to pull all sorts of triggers for me, so I’m not going to get into the details and will just pose three questions:

  • Graphs with unlabelled y-axes? Though I don’t doubt the authenticity or intentions of the authors, this makes the results much less credible for me.
  • Supposing the y-axes are in the 0–1 range, why are all the thresholds at 0.5?
  • Why linear regression? Won’t that force all results to be artifacts of the fit rather than actual trends? Logistic regression or regression trees, I believe, would have been a better choice without sacrificing interpretability.

Now, the results drawn are quite interesting. One of the findings I didn’t expect was that real-time corrections don’t actually provoke heightened counterarguing; rather, the problem comes into play via biases stemming from prior attitudes when comparing the credibility of the claims. So the question arises: how do we correct people when they’re wrong about ideas they feel strongly about, and when the strength of their belief might dominate their ability to reason? In this regard, I like the authors’ first recommendation of presenting news from sources that the users trust, which these days can easily be extracted from their browsing history or even input from the users themselves. Given that extraction of such information from users is already commonplace (e.g., Google and Facebook use it to place ads), I think we needn’t worry about privacy. What we do need to worry about is, as the authors mention, its tendency to backfire and reinforce the misperceptions. So the question then becomes: how do we keep this customization tool from becoming a double-edged sword? One idea is that we could show users a scale of how left- or right-leaning a source is when presenting the information, tailor the list of sources to include more neutral ones, or tailor the ranking to favor neutral sources and occasionally sprinkle in opposite-leaning sources as well to spice things up.

I would like to close by saying that we as computer scientists have the ability, far above many, to build tools that shape society, and it falls upon us to understand the populace, human behavior, and people’s psychology much more deeply than we already do. If we don’t, we run the danger of our tools generating results contrary to what they were designed for. As Uncle Ben put it, “With great power comes great responsibility.”


Reflection #4 – [01/30] – [Jiameng Pu]

  • A parsimonious language model of social media credibility across disparate events

Summary:

With social media’s dominance of people’s acquisition of news and events, the credibility of content on different platforms tend to be less rigorous than that of content on traditional journalists. In this paper, the author conducts research on analyzing the credibility of content on social media and proposes a parsimonious model that can map language cues to perceived levels of credibility. The model is presented based on examining the credibility corpus of Twitter messages corresponding to 15 theoretically grounded linguistic dimensions. It turns out that there are considerable indicators of events’ credibility in the language people use.

Reflection:

The paper inspires me a lot by leading me through a whole research process from idea creation to implementation. In the data preparation phase, it is very common to annotate the content under study with ordinal values; the proportion-based ordinal measure Pca used in this paper helps handle extreme conditions, which is a trick I can take away and try in other studies. The authors use logistic regression as the classifier; I think a neural network would also be a good choice. A neural-network classifier typically takes the feature vector as input and produces one output per class. Potentially, a neural network might deliver better classification performance, which in turn could help the analysis of feature contributions.
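If one wanted to try the neural-network alternative suggested above, a minimal sketch could look like the following. The layer sizes, the synthetic data, and the use of scikit-learn’s MLPClassifier are all assumptions for illustration, not anything from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 15))    # placeholder feature vectors (15 linguistic features)
y = rng.integers(0, 3, size=1000)  # placeholder Low/Medium/High labels

# Feed-forward classifier: input size = number of features, output = 3 classes.
net = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=1)
net.fit(X, y)
print(net.predict(X[:5]))          # predicted credibility class per event
```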

  • The Promise and Peril of Real-Time Corrections to Political Misperceptions

Summary:

Computer scientists have created real-time correction systems that label inaccurate political information in order to warn users about it. However, the authors are skeptical about the effectiveness of the real-time correction strategy and run an experiment comparing the effects of real-time and delayed correction. In the design phase of the comparative experiment, the researchers control variables across conditions and assess participants’ perceptions of the facts through questionnaires. They then construct a linear regression model to analyze the relationship between participants’ belief accuracy and the correction strategy. The paper concludes that real-time corrections are modestly more effective only among individuals predisposed to reject the false claim.

Reflections:

This paper conducts a comparative study of the effects of immediate and delayed corrections on readers’ belief accuracy. Generally, one of the most important parts of comparative research is designing a reasonable and feasible experimental scheme. Although a lot of big-data research collects ready-made data for preprocessing, some research requires the researchers themselves to “produce” data. Thus the design of the data-collection scheme has a significant impact on the subsequent experiments and analysis.
In the first, survey-based step of data collection, the choice of survey sample, the setting of control variables, and the evaluation methods are the main points that can greatly affect the experimental results: for example, the diversity of participants (race, gender, age), the design of the delayed correction, and the design of the questionnaire.

In particular, in order to implement the delayed correction, the authors employed a distraction task: participants were asked to complete a three-minute image-comparison task. Although this task achieves the desired purpose, it is not the only possible strategy. For example, the duration of the distraction task may affect participants’ cognition of the news facts differently, so researchers could try multiple durations to observe whether the impact differs. In the analysis section, linear regression is one of the most common models used. However, for complex issues without a strict underlying rule, the error of a linear regression model is potentially larger than that of a nonlinear model; nonlinear regression with appropriate regularization is also an option.

Question:

  1. As analyzed in the limitations section, although the authors try to make the best possible experimental design, there are still many design decisions that affect the experimental results. How can we make these decisions so as to minimize the error?
  2. Intuitively, it would be more appropriate to study the mainstream reading population on the Internet; the average age of the study participants seems too high for that.


Reflection #4 – [1/30] – [Meghendra Singh]

  1. Garrett, R. Kelly, and Brian E. Weeks. “The promise and peril of real-time corrections to political misperceptions.” Proceedings of the 2013 conference on Computer supported cooperative work. ACM, 2013.
  2. Mitra, Tanushree, Graham P. Wright, and Eric Gilbert. “A parsimonious language model of social media credibility across disparate events.” Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. ACM, 2017.

Both papers focus on issues surrounding the credibility of information available on the world wide web and provide directions for future research on the subject. Garrett and Weeks focus on the implications of correcting inaccurate information in real time versus presenting the corrected information after a delay (after a distractor task). The authors used a between-participants experiment to compare participant beliefs on the issue of electronic health records (EHRs) when a news article about EHRs is presented with immediate corrections as opposed to delayed corrections. The study was conducted on 574 demographically diverse U.S.-based participants. Mitra et al., on the other hand, present a model for assessing the credibility of social media events, trained using linguistic and control features present in the 1,377 event streams (66M Twitter posts) of the CREDBANK corpus. The authors first use Mechanical Turkers to score the credibility of individual event streams and subsequently train a penalized logistic regression (using LASSO regularization) to predict the ordinal credibility level (Low, Medium, or High) of the event streams.

In their paper, Garrett and Weeks explore the subtle yet interesting issue of real-time correction leading individuals to reject carefully documented evidence and to distrust the source. The paper seems to suggest that people who are predisposed to a certain ideology are more likely to question and distrust real-time corrections to information on the internet (articles, blog posts, etc.) that go against their ideology, whereas people who already have doubts about the information are more likely to accept the corrections. Upon reading the initial sections of the paper, I felt that delayed corrections to information available online might not be very useful. I say this because people rarely revisit an article they have read in the past. If the corrections are not presented while readers are going through the information, how will they ultimately receive the corrected information? It is highly unlikely that they will revisit the same article in the future. I also feel that the study might be prone to sample bias, since the attitudes, biases, and predispositions of people in the U.S. may not reflect those of another geography. Additionally, as the authors mention in the limitations of the study, we might get different results if the particular issue being analyzed were changed (e.g., if the issue were anti-vaccination).

In the second paper, Mitra et al. focus on predicting the “perceived” credibility of social media event reportage using linguistic and non-linguistic features. Although the approach is interesting, I feel that there can be a difference between the perceived and actual credibility of an event. For example, given that Mitra et al. report that subjectivity in the original tweets is a good predictor of credibility, malicious Twitter users wanting to spread misinformation might artificially incorporate language features that increase the subjectivity of their tweets so that they seem more credible. A system based on the model presented in the paper would then likely assign high perceived credibility to tweets spreading misinformation. One research question might be to come up with a model that can detect and compensate for such malicious cases. Another interesting question might be to devise a system that measures and presents users with an event’s “actual” credibility (maybe using crowdsourcing or dependable journalistic channels) instead of the “perceived” credibility based on language markers in the tweets about the event.

Another question I have is why the authors use the specific form of Pca (i.e., why were the +1 or “Maybe Accurate” ratings not used for computing Pca?). Also, there are 66M tweets in CREDBANK; given that these are clustered into 1,377 event streams, there should be roughly 47K tweets per event stream (assuming an even distribution). Did each of the 30 Turkers rating an event read through all 47K tweets, or were these divided among the Turkers? Although I agree with the authors that this study circumvents the problem of sampling bias, as it analyzes a comprehensive collection of a large set of social media events, I feel there is a fair chance of “Turker bias” creeping into the model (in Table 2 we generally see a majority of Turkers rating events as [+2], i.e., Certainly Accurate). I am curious whether there was a group of Turkers who always rated any event stream presented to them as “Certainly Accurate”.
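To make the Pca question concrete, here is a minimal sketch of how a proportion-of-“Certainly Accurate” score could be computed from the 30 Turker ratings. The rating scale and the exclusion of +1 ratings follow the description above, but the code itself is only an illustration, not the authors’ implementation.

```python
def pca_score(ratings):
    """Proportion of 'Certainly Accurate' (+2) ratings for one event stream.
    Assumes the 5-point [-2..+2] scale described above; note that +1
    ('Maybe Accurate') ratings do not count toward the score."""
    return sum(1 for r in ratings if r == 2) / len(ratings)

# Hypothetical ratings from 30 Turkers for a single event stream.
ratings = [2] * 24 + [1] * 4 + [0, -1]
print(round(pca_score(ratings), 2))  # 0.8 -> would land in a high-credibility bucket
```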

 

 


Reflection #4 – [1/29] – Ashish Baghudana

Garrett, R. Kelly, and Brian E. Weeks. “The promise and peril of real-time corrections to political misperceptions.” Proceedings of the 2013 conference on Computer supported cooperative work. ACM, 2013.

Reflection

In this paper, the authors pose two interesting research questions: can political misperceptions and inaccuracies be corrected by a fact-checker, and if so, does real-time correction work better on users than delayed correction? The authors run an experiment in which participants first read an accurate post about EHRs followed by an inaccurate post from a political blog. The readers were divided into three groups, which were presented with corrections to the inaccurate post:

  • immediately after reading a false post
  • after a distraction activity
  • never

Garrett et al. report that corrections can be effective, even on politically charged topics. Based on the questionnaire at the end of the experiment, the authors concluded that users who were presented with corrections were more accurate in their knowledge of EHRs in general. Specifically, immediate-correction users were more accurate than delayed-correction users. However, immediate correction also accentuated the attitudinal bias of these users: people who viewed the issue negatively showed increased resistance to correction.

This paper is unlike any of the papers we have read in this class till now. In many senses, I feel this paper deals entirely with psychology. While it is applicable to computer scientists in designing fact-checking tools, it has more far-reaching effects. The authors created separate material for each group in their experiment and physically administered the experiment for each of their users. This research paper is a demonstration of meticulous planning and execution.

An immediate question from the paper is: would this experiment be possible using Amazon Mechanical Turk (MTurk)? That would have made it easier to collect more data. It would also enable the authors to run multiple experiments with different cases, i.e., more contentious issues than EHRs. The authors mention that the second (factually incorrect) article was associated with a popular political blog. If the political blog was right-leaning or left-leaning and this was known to the users, did it affect their ratings in the questionnaire? The authors could have included an intermediate survey (after stage 1) to understand participants’ prior biases.

A limitation that the authors mention is that of reinforcement of corrections. Unfortunately, running experiments involving humans is a massive exercise, and it would be difficult to repeat this several times. Another issue with these experiments is that users are likely to treat the questionnaire as a memory test and answer on that basis rather than according to their true beliefs. I also take issue with the racial diversity of the sample population, which is majority white (~86%).

This study could be extended to examine the correlation of party affiliation and political views with users’ willingness to accept correction. Are certain groups of people more prone to incorrect beliefs?


Reflection #3 – [1/23] – [Nuo Ma]

Cheng, Justin, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. “Antisocial Behavior in Online Discussion Communities.” ICWSM. 2015.

Summary:

In this paper, Cheng et al. present a study of users banned for antisocial behavior, sampled from three online communities (CNN, IGN, and Breitbart). The authors characterize antisocial behavior by studying specific groups of users from these communities: FBUs (Future-Banned Users) and NBUs (Never-Banned Users). The authors also present an analysis of “evolution over time”, indicating that FBUs write worse than other users over time and that community tolerance tends to decline over time. Finally, the authors propose an approach that extracts features for predicting antisocial behavior, potentially automating and standardizing this process.

Reflection:

I think there are several noteworthy points in this paper. First, these are three communities with different characteristics: Breitbart is far-right according to Google, IGN has no particular leaning, and personally I consider CNN to lean left. The nature of a community attracts a certain user group and might lead to different user behavior; the specific topics can also lead to different results. But in the analysis I only see “measuring undesired behavior”. This is a rather blurry description, though the term “antisocial” itself is hard to define clearly. This makes me curious, because different communities have different banning rules, and how those rules are enforced can vary accordingly. In this article users are simply categorized as banned or non-banned; moreover, the banning rules differ across communities, yet some of the data is treated as a single entity. To me this may not be completely solvable given the nature of the question, but it could definitely be further elaborated or discussed. Also, the number of data samples is not consistent across communities (18,758 for CNN, 1,164 for IGN, 1,138 for Breitbart).

As for the proposed features and classifier for predicting antisocial behavior, I like the idea. Using a bag of words can capture literal trolling and abuse. However, a lot of antisocial behavior online goes one step further and is not limited to literal words, e.g., sarcasm. When sarcasm goes to extremes, it can be antisocial. Identifying such behavior can be easy within an interest group, and when there is agreement within the group, the post is likely to get reported or deleted. Subjectively deleted/reported posts should not be the only metric for measuring antisocial behavior. Objective features, such as downvotes, might reduce the influence of such subjective moderator behavior, but this needs further clarification. In some communities, when you downvote, you are given options to choose the reason for the vote: disagree, irrelevant, or trolling. This would give the classifier a clearer signal about downvote reasons (a rough sketch of such a feature combination follows).
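As a rough illustration of combining bag-of-words text features with a downvote-reason feature, here is a sketch using scikit-learn. The example posts, labels, reason categories, and column names are all invented, and this is not the authors’ pipeline.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data: post text, the stated downvote reason, and a made-up antisocial label.
data = pd.DataFrame({
    "text": ["you are an idiot", "thanks for the source",
             "sure, 'great' argument...", "interesting read"],
    "downvote_reason": ["trolling", "none", "trolling", "none"],
    "antisocial": [1, 0, 1, 0],
})

model = Pipeline([
    ("features", ColumnTransformer([
        ("bow", CountVectorizer(), "text"),                                  # bag-of-words text features
        ("reason", OneHotEncoder(handle_unknown="ignore"), ["downvote_reason"]),  # categorical downvote reason
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(data[["text", "downvote_reason"]], data["antisocial"])
print(model.predict(pd.DataFrame({"text": ["what a 'brilliant' take"],
                                  "downvote_reason": ["trolling"]})))
```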

Questions:

This paper studies banned users’ posts from three large communities, but different communities have different guidelines; what kind of guideline would generalize across all communities?

Is antisocial behavior/language, as the main criterion for banning users, consistent across all the cases discussed here? How can it be verified/pre-processed?

For CNN, I have the impression that users tend to visit the website based on their political background. We also see a higher percentage of reported posts compared to websites like IGN, whose users are less “categorized”. Does the nature of the website influence how users behave? (I am sure it does, but this might be something noteworthy.)
