Digilantism: An analysis of crowdsourcing and the Boston marathon bombings

Paper:

Nhan, J., Huey, L., & Broll, R. (2017). Digilantism: An analysis of crowdsourcing and the Boston marathon bombings. The British Journal of Criminology, 57(2), 341-361.

Discussion leader: Leanna

Summary:

The article explores digilantism, or crowdsourced web-sleuthing, in the wake of the Boston Marathon bombing. The authors focus on police-citizen collaboration, highlighting various crowdsourcing efforts by the police and some successes and “failures” of online vigilantism.

The authors theoretically frame their paper around nodal governance – a theoretical marriage between security studies and social network analysis – combining various prior works. Following the logic of network analysis, the theory treats organizations or security actors as nodes in a decentralized structure. The nodes (actors) can have associations and work with each other (edges) within the network, such as police corresponding with private security forces or a Reddit community sharing information with the police. Each node can also hold varying degrees (weights) of capital (i.e., economic, political, social, cultural, or symbolic) that can be shared between the nodes.

The authors apply thematic analysis to threaded discussions, examining various Reddit threads about the Boston Marathon bombing and coming up with 20 thematic categories. For this paper, the authors are mainly interested in their theme “investigation-related information” (pg. 346). In the results, the authors note that most comments were general in nature. Sub-themes within the investigation category included 1) public security assets, 2) civilian investigations online, 3) mishandling of clues, and 4) police work online.

The first subcategory—public security assets—discusses the varied professional backgrounds of users on Reddit and their ability to contribute based on this experience and knowledge (e.g., military forensics). In this section, the authors raise the issue of parallel investigations and a general lack of communication between the police and web-sleuths (mainly on the part of the police). They speculate this disconnect could stem from police subculture or from legal concerns about incorporating web-sleuths into investigations.

In the next sub-theme—civilian investigations—the authors take note of the unofficial role that Reddit users played in the investigation of the Boston Marathon bombing. This included identifying photographs of suspects and blast areas, as well as conducting background checks on suspects. Nhan and colleagues referred to this as “virtual crime scene investigation” (pg. 350). In this section, the authors expanded upon the silo effect of parallel investigations, noting that the relationship between the police and web-sleuths was uni-directional, with users encouraging each other to report information to the police.

In the third sub-theme—mishandling of clues—the authors focus on two consequences of web-sleuthing: 1) being suspicious of innocent acts; and 2) misidentifying potential suspects. In particular, the authors highlight the fixation of users on people carrying backpacks and the misidentification of Sunil Tripathi as a potential suspect in the bombing.

In the final sub-theme—police work online—the authors highlight police efforts to harness web-sleuths either by providing correct information or by asking people to provide police with videos from the event. The authors noted that this integration of police into the Reddit community was a way to regain control of the situation and the information being spread.

The authors conclude with various policy recommendations, such as assigning police officers to be moderators on sites such as Reddit or 4chan. In addition, they acknowledge the geographical and potential cultural differences between their two examples of police crowdsourcing (Boston vs. Vancouver). Lastly, the authors again note that the police have not used the expertise of the crowd.

Reflection:

When reading the paper, numerous things came to my mind. Below is a list of some of them:

  1. In the background section, the authors mention an article by Ericson and Haggerty (1997) that classifies the four eras of policing: political, reform, community, and information. Other authors have defined this fourth era as the national security era (Willard, 2006) or the militarized era (Hawdon, 2016). Hawdon (2016) argues in an ASC conference presentation, for example, that a pattern recurs across the eras (see the first five rows of the table below): the organizational structure, approach to citizenry, functional purpose, and tactical approach of law enforcement flip-flop between eras. Thinking forward, I foresee a coming era of crowdsourced policing as a continuation of the pattern Hawdon identifies (see the last row). This style would be decentralized (dispersed among the various actors), clearly integrated into the community, focused on more than law enforcement, and would intervene informally in community members’ lives (via open communication online), fitting neatly into the cyclical pattern we see in policing (Hawdon, 2016).

 

Era | Organizational structure | Approach to Citizenry | Functional Purpose | Tactical Approach
Political (1860–1940) | Decentralized | Integrated into community | Broad | Service
Reform (1940–1980) | Centralized | Distant from community | Narrow | Legalistic
Community (1980–2000) | Decentralized | Integrated into community | Broad | Service
Militarized (1990–today) | Centralized | Distant from community | Narrow | Legalistic
Crowdsourced (??–??) | Decentralized | Integrated into community | Broad | Service

Note: Functional purpose refers to whether policing serves “a narrow function (i.e., law enforcement) or serving the community by fulfilling numerous and broad functions” (Hawdon, 2016, pg. 5). Tactical approach: legalistic policing “stresses the law-enforcement function of policing” (pg. 5), while service policing “intervenes frequently in the lives of residents, but officers do so informally” (pg. 5).

  2. Nhan and colleagues highlight various “police-citizen collaborations” (pg. 344) with regard to social media, such as crowdsourcing the identification of faces from the 2011 Stanley Cup riots and disseminating information via Twitter. But, in many ways, this police engagement with social media appears to lack innovation. The former is like posting wanted photos on telephone poles, and the latter like disseminating information via a popular newspaper. Yes, the medium has changed and therefore the scale of the impact has shifted, but the traditional structure hasn’t changed. The other “police-citizen collaboration” (pg. 344) that was mentioned was collecting information. This is not collaboration. In the example of Facebook, this is simply using the largest repository of available biometric data that people are willing to give away for free. It’s becoming the new and improved governmental surveillance dataset, but there is nothing formally collaborative about citizen use of Facebook (even if Facebook might collaborate with law enforcement at times).
  3. The paper is missing crucial details needed to fully understand the authors’ numerical figures. For example, the authors note that only a small number of individuals (n=16) appear to be experts. It would have been great to put this figure into context: how many distinct users posted among the posts that were analyzed? Without a larger sense of the total n the authors are dealing with, assessments of the external validity (generalizability) of the findings become difficult.
  4. The authors frame their analysis around nodal governance and the large-scale nodal security network. The guiding theory itself needs to be expanded; the authors allude to this need but do not make the full connection. In the paper, the police and Reddit are simply treated as the nodes. Instead, the network needs to acknowledge both the organizations (e.g., the police or Reddit) and the individual users. This, if my memory serves me correctly, is called a multilevel network: users (nodes in one group) are connected to organizations (nodes in another group), and relationships (edges) exist both within and between the groups (see the sketch following this list). The authors allude to this when mentioning the wide breadth of knowledge and expertise that posters bring to web-sleuthing on Reddit, but stop there. Reddit users can be connected to the military (as mentioned) and have access to the capital that that institution brings; these individual users are then connected to two organizational structures within the security network.
  5. Lastly, it was not surprising that the authors noted a “mislabelling of innocent actions as suspicious activities” (pg. 353); however, it was surprising that this fell under the label of “the mishandling of clues” (pg. 353). In addition, the mislabelling of activities is not unique to web-sleuths. I was expecting a conversation about mislabelling and its connection to a fearful/risk society. This mislabelling is all around us: it happens in schools, for example, when nursery staff think a 4-year-old boy’s drawing of a cucumber is a cooker bomb, when the police think tourists taking photos are terrorists, or when police arrest a man thinking his kitty litter was meth.
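
To make the multilevel idea in point 4 concrete, here is a small sketch using the networkx library. The actors, ties, and weights are hypothetical illustrations of the structure being described, not data from Nhan et al.

```python
# A toy multilevel security network: organizations and individual users are different
# node types, with ties within and between levels. All actors, edges, and weights here
# are hypothetical illustrations, not data from the paper.
import networkx as nx

G = nx.Graph()

# Organization-level nodes, each tagged with a (simplified) form of capital.
G.add_node("Boston PD", level="organization", capital="political")
G.add_node("Reddit", level="organization", capital="social")
G.add_node("US military", level="organization", capital="symbolic")

# Individual-level nodes (web-sleuths) with their professional expertise.
G.add_node("user_A", level="user", expertise="forensics")
G.add_node("user_B", level="user", expertise="photo analysis")

# Ties within and between levels; weights stand in for relationship strength.
G.add_edge("user_A", "user_B", weight=2)       # users collaborating on a thread
G.add_edge("user_A", "Reddit", weight=3)       # user posts on Reddit
G.add_edge("user_A", "US military", weight=1)  # user's professional background
G.add_edge("user_B", "Reddit", weight=3)
G.add_edge("Reddit", "Boston PD", weight=1)    # weak, largely one-way tip flow

# Which organizations (and their capital) can each user reach through the network?
orgs = [n for n, d in G.nodes(data=True) if d["level"] == "organization"]
for user in ("user_A", "user_B"):
    reachable = [o for o in orgs if nx.has_path(G, user, o)]
    print(user, "->", reachable)
```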

Questions

  1. Is crowdsourcing the next policing era?
  2. What drives police hesitation for police-citizen collaboration?
  3. Is police reluctance to engage in crowdsourcing harming future innovative methods of crime-fighting or police-community engagement?
  4. What are some ways police can better integrate into the community for investigation?
  5. Does the nodal governance theory fit with the crowdsourcing analysis?


Galaxy Zoo – A Citizen Science Application

Technology:

Citizen Science Application “Galaxy Zoo”

Demo leader: Lee Lisle

Summary:

Galaxy Zoo is a citizen science application in the Zooniverse collection of projects. Citizen science is a special category of applications that uses the power of crowds to solve complex science problems that cannot easily be solved by algorithms or computers. There are many different citizen science apps you can try out on Zooniverse if you want to learn more about this field.

Galaxy Zoo asks its users to classify pictures of galaxies from the Sloan Digital Sky Survey, the Cerro Tololo Inter-American Observatory, and the VLT Survey Telescope. Starting in 2007, this project has been so successful that it actually spurred the creation of the entire Zooniverse site. In fact, the Galaxy Zoo team has written 55 different papers from the data gathered through the project.

As an example of what they have discovered using crowd-generated data, the team created a new classification of galaxy based on the observations of the citizen scientists. After these volunteers found a pattern of pea-like entities in many galaxy pictures, the team looked closer at those formations. They found that the formations were essentially young “star factory” galaxies that create new stars much more quickly than older, more established galaxies.

Also, it’s interesting to note that the project started because a professor assigned a grad student to classify one million pictures of galaxies. After a grueling 50,000 classifications by one person, the student and professor came up with a solution: leverage the crowd to get the data set organized.

You can also create your own project on Zooniverse to take advantage of its user base of over one million “zooites.” This works best for massive datasets that need to be worked on manually. The platform also draws on both intrinsic and extrinsic motivations: users contribute to science and receive a “score” reflecting how many classifications they have performed.

Demo:

  1. Go to the Zooniverse website.
  2. Register a new account.
  3. Click on “Projects” on the top menu bar to see all of the citizen science apps available. Note that you can also search by category, which is useful if you want to work on a particular field.
  4. To work specifically on Galaxy Zoo, start typing “galaxy zoo” in the name input box on the right side of the screen (under the categories scroll bar).
  5. Click on “Galaxy Zoo” in the auto-complete drop down.
  6. Click on “Begin Classifying.”
  7. Perform classifications! This involves answering the question about the galaxy in the box next to the picture. It may also be helpful at this step to click on “Examples” to get more information about these galaxies.
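
If you would rather script your interactions with Zooniverse than click through the web interface, the platform also exposes a public API with an official Python client (panoptes-client). The sketch below only connects and looks up the Galaxy Zoo project; the project slug and the attributes printed at the end are assumptions for illustration, so check the client documentation before relying on them.

```python
# Minimal sketch using the panoptes-client package (pip install panoptes-client).
# The project slug and attribute names are assumptions for illustration; consult the
# Zooniverse/Panoptes API documentation for the authoritative interface.
from panoptes_client import Panoptes, Project

# Credentials are needed for authenticated actions (e.g., managing your own project);
# replace the placeholders with your Zooniverse account details.
Panoptes.connect(username="your-username", password="your-password")

project = Project.find(slug="zooniverse/galaxy-zoo")  # slug assumed here
print(project.display_name)           # e.g., "Galaxy Zoo"
print(project.classifications_count)  # total classifications so far (assumed attribute)
```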


Motivation Factors in Crowdsourced Journalism: Social Impact, Social Change, and Peer Learning

Paper:

Aitamurto, T. (2015). Motivation Factors in Crowdsourced Journalism: Social Impact, Social Change, and Peer Learning. International Journal of Communication, 9(0), 21. http://ijoc.org/index.php/ijoc/article/view/3481/1502

Discussion Leader:  Rifat Sabbir Mansur


Summary:

Crowdsourced journalism has recently become a more common knowledge-search method among professional journalists, in which participants contribute to the journalistic process by sharing their individual knowledge. Crowdsourcing can be used for both participatory journalism, where the crowd contributes raw material to a process run by a journalist, and citizen journalism, where ordinary people adopt the role of journalist. Unlike outsourcing, where a task is given to a few known experts or sources, crowdsourcing opens up the task for anybody to participate, voluntarily or for monetary gain. This research paper tries to explain the crowd’s motivation factors for crowdsourced journalism, based on social psychological theories, by asking the two following questions:

  • Why do people contribute to crowdsourced journalism?
  • What manifestations of knowledge do the motivation factors depict?

The author seeks the motivation factors in crowdsourcing, commons-based peer production, and citizen journalism using self-determination theory from social psychology. According to the theory, human motivations are either intrinsic (done for enjoyment or out of community-based obligation) or extrinsic (done for direct rewards such as money). The author reviews various papers in which she found both forms of motivation for crowdsourcing, commons-based peer production, and citizen journalism. In the latter part of the paper, the author introduces four journalistic processes that used crowdsourcing, conducts in-depth interviews with many of the participants, and bases her findings on those interviews.

The cases the author uses are as follows:

  1. Inaccuracies in physics schoolbooks
  2. Quality problems in Finnish products and services
  3. Gender inequalities in Math and Science Education
  4. Home Loan Interest Map

The first three stories were published in various magazines, and the last was conducted by Sweden’s leading daily newspaper. The first three stories were grouped into Case A since the same journalists worked on all of them. The fourth story (Case B) used a crowdmap where the crowd submitted information about their mortgages and interest rates online. The information was then geographically located and visualized, and the map became very popular, breaking online traffic records for the newspaper.

The author conducted semi-structured interviews with 22 participants in Case A and 5 online participants in Case B. The interview data were then analyzed using Strauss and Corbin’s analytical coding approach, and with these analyzed data the author presents her findings.

The author posits that based on her findings the main motivation factors for participating in crowdsourced journalism are as follows:

  • Possibility of having an impact
  • Ensuring accuracy and adding diversity for a balanced view
  • Decreasing knowledge and power asymmetries
  • Peer learning
  • Deliberation

The author’s findings show that the motivations above are mostly intrinsic. Only peer learning has an extrinsic element: participants expressed a desire to learn from others’ knowledge and to practice their skills, alongside the intrinsic reward of better understanding others. None of the participants expected any financial compensation for their participation; rather, they felt rewarded when their voices were heard. The participants also believed monetary compensation could lead to false information. Participation in crowdsourced journalism is thus mainly voluntary and altruistic in nature, with the intrinsic motivations deriving largely from participants’ ideology, values, and social and self-enhancement drives.

Crowdsourced journalism differs to some extent from commons-based peer production, citizen journalism, and other crowdsourcing contexts in that it offers neither career advancement nor reputation enhancement. Rather, participants perceive contributing to journalism as being part of social change.

The author brings up theories of digital labor abuse and pushes back on them, showing results which suggest that the argument does not fit the empirical reality of this kind of online participation.

The author finally discusses several limitations of the research and the scope for future work using larger samples, more cases, and empirical contexts in several countries, including both active and passive participants in the crowd.

 

Reflection:

The study offers a thorough social psychological analysis of participants’ motivations in crowdsourcing. Unlike prior research, the paper concerns itself with motivation factors in voluntary crowdsourcing, i.e., crowdsourcing without pecuniary rewards. The author also addresses the motivations in crowdsourcing, commons-based peer production, and citizen journalism separately, which allows her to dig deeper into the intrinsic and extrinsic drives behind them. She further classifies intrinsic motivations into two factors: enjoyment-based and obligation/community-based.

The study revealed some very interesting points. As the author mentions, the possibility of having an impact drives participation and is one of the main motivations among crowd participants. One specific comment stands out:

“I hope my comment will end up in the story, because we have to change the conditions. Maybe I should go back to the site, to make sure my voice is heard, and comment again?”

I find this comment very interesting because it shows the nature of the intrinsic motivation and the unwritten moral obligation participants feel towards their topic of interest. Here, the participant’s motivation is clearly to bring about social change.

Another interesting factor, in my opinion, is that volunteering involves sharing one’s fortune (e.g., products, knowledge, and skills) to protect oneself from feeling guilty about being fortunate. The author mentions this as the protective function that drives volunteer work.

In my opinion, one of the clear upsides of crowd participation is developing a more accurate picture of a topic and offering multiple perspectives. Filling gaps in the journalists’ understanding of a particular topic helps build a more accurate and complete picture. It also provides a check against yellow journalism, and it allows participants to contribute multiple perspectives, creating diverse views on controversial topics.

What I found interesting is that the participants did not expect financial compensation for their participation in crowdsourcing. On the contrary, they believed that if this effort were monetarily compensated, it could actually be dangerous and skew participation. However, pecuniary rewards may draw a different crowd who are more aware of their responsibilities; this might actually encourage people to be more attentive participants and more careful about their comments and remarks.

Another interesting notion in the paper is that the participants in this study did not expect reciprocity in the form of knowledge exchange. This characteristic, in my opinion, could lead to situations where people firmly hold onto false beliefs. And because participants want to be part of a social change, they can be disheartened if their volunteer efforts are not appropriately acknowledged in the journalistic process.

I liked the author’s endeavor to address the differences and similarities between motivations in crowdsourced journalism and related forms of participation. In crowdsourced journalism, the crowd contributes only small pieces of raw material for a journalist to consider in the story process; cumulatively, this can produce a bigger picture of an issue. In this way, participants in crowdsourcing can be a contributing part of a social change through their respective atomic inputs.

The limitations of the study, however, have great significance. The author mentions that it is possible that only those participants who had a positive experience with crowdsourcing accepted the interview request for the study. This might have caused the motivations found in the study to appear more intrinsic and altruistic in nature. With a different and more widespread sample, the study might reveal some more interesting factors of human psychology.

 

Questions:

  1. What do you think about the differences between voluntary and reward-based crowdsourcing in terms of social impact?
  2. What do you think about the effects of citizen journalism on professional media journalism?
  3. Given the limitations, do you think the case studies had adequate data to back up their findings?
  4. What do you think the future holds about the moderation of crowdsourcing?
  5. The study suggests a wide variety of crowd contributions, such as crowd-mapping, citizen journalism, commons-based peer production, etc. How do you think we can develop systems to better handle the crowd’s contributions?


Emerging Journalistic Verification Practices Concerning Social Media

Paper:
Brandtzaeg, P. B., Lüders, M., Spangenberg, J., Rath-Wiggins, L., & Følstad, A. (2016). Emerging Journalistic Verification Practices Concerning Social Media. Journalism Practice, 10(3), 323–342.
https://doi.org/10.1080/17512786.2015.1020331

Discussion Leader: Md Momen Bhuiyan

Summary:
Social media content has recently come to be used widely as a primary source of news. In the United States, 49 percent of people get breaking news from social media, and one study found that 96 percent of UK journalists use social media every day. This paper tries to characterize journalistic values, needs, and practices concerning the verification of social media content and sources. The paper’s major contribution is a requirements analysis, from a user perspective, for verifying social media content.

The authors use a qualitative approach to answer several questions: how journalists identify contributors, how they verify content, and what the obstacles to verification are. Based on interviews with 24 journalists working with social media in major European news organizations, they divide verification practices into several categories. First, if content is published by a trusted source (a well-known news organization, the police or fire department, a politician, a celebrity, etc.), it is usually considered reliable. Second, journalists use social media to get in touch with eyewitnesses; the reliability of an eyewitness is checked by seeing whether a trusted organization follows them and by reviewing their previous record, as well as by looking for conflicting stories. Even so, journalists prefer traditional methods such as direct contact with people. Furthermore, for multimodal content (text, pictures, audio, video), they use a range of tools such as Google, NameChecker, Google Reverse Image Search, TinEye, Google Maps, and Street View, though there are large gaps in their knowledge of these tools. Finally, if they cannot verify a piece of content, they use workarounds such as disclaimers.

By looking into the characteristics of journalists as a user group and their work context, the authors identify several potential user requirements for verification tools. Journalists need an efficient, easy-to-use tool to verify content; it has to organize huge amounts of data and make sense of them; and it needs to be integrated into their current workflow. The tool must offer high-speed verification and publication and be accessible from different types of devices. Another requirement is transparency: journalists need to understand how the verification takes place. Finally, it needs to support verification of multimodal content.

Finally, the authors discuss the limitations of both the study sample and the findings. In spite of these limitations, the study provides a valuable basis for the requirements of a verification process for social media content.

Reflection:
Although the study makes a good contribution regarding the requirements of verification tools for news organizations, it has several shortcomings. The study sample was drawn from several countries and organizations, but it does not include any of the biggest organizations, which raises the question: how do major organizations like the BBC, CNN, AP, and Reuters verify social media content? How do they define trusted sources? How do they follow private citizens?

The study also does not compare younger and older journalists or examine how their verification processes differ. It was noted that young and female journalists have more experience with these technologies, but the study does not look at whether their respective verification processes differ. All in all, further research is necessary to address these questions.

Questions:
1. Can verification tools help gain public trust in news media?
2. What are the limitations of verification tools for multimodal content?
3. Can AI automate the verification process?
4. Can journalism be replaced by AI?


The Verification Handbook

Paper:

Chapters 1, 3, 6, and 7 of:
Silverman, C. (Ed.). (2014). The Verification Handbook: A Definitive Guide to Verifying Digital Content for Emergency Coverage. Retrieved from The Verification Handbook
Discussion Leader: Lawrence Warren
 
Summary:
“This book is a guide to help everyone gain the skills and knowledge necessary to work together during critical events to separate news from noise, and ultimately to improve the quality of information available in our society, when it matters most.”

Chapter 1: When Emergency News Breaks

This section of the book deals with the perpetuation of rumors whenever a disaster strikes. According to the 8 1/2 Laws of Rumor Spread, it is easy to get a good rumor going when we are already anxious about a situation. This problem existed long before the current world of high-speed networks and social media, and it has become a serious thorn in the side of those who verify information. People at times intentionally spread false rumors to be a part of the hot topic and to draw attention to a social media account or cause, which adds yet another layer of problems for information verification. The problem is intensified during actual times of crisis, when lives hang in the balance of having the correct information. One would think the easiest way to verify data is for professionals to be the ones to disseminate information, but many times an eyewitness will observe a situation long before an actual journalist, and at times a journalist may not have access to the things which are seen first hand. People rely on official sources to provide accurate information in a timely fashion, while simultaneously those agencies rely on ordinary people to help source information as well as bring it into context.

Chapter 3: Verifying User Generated Content

The art of gathering news has been transformed by two significant developments: mobile technology and the ever-developing social network. In 2013 it was reported that over half of the phones sold were smartphones, which meant many ordinary people had the capability of recording incidents and taking them to any number of media outlets to be shared with the world. People normally post things to social media because many do not understand the process of handing something off to a news station, and they feel more comfortable within their own network of chosen friends. It is this same feeling of security that leads people to tune into social media during breaking news, which is where some are fed fake reports by malicious users who intentionally create fake pages and sites to build a buzz around false facts. Then there are people who find content and claim it as their own, which makes it harder to find the original sources at the time of inspection. Verification is a skill which all professionals must have in order to help prevent fake news from circulating, and it involves four items to check and confirm:

  1. Provenance: Is this the original piece of content?
  2. Source: Who uploaded the content?
  3. Date: When was the content created?
  4. Location: Where was the content created?

Chapter 6: Putting the Human Crowd to Work

Crowdsourcing is by no means a new concept and has always been a part of information gathering, but with the rise of social media dynamos, we can now do it on a much larger scale than before. This section of the book lists a few of the best practices for crowdsourced verification.

Chapter 7: Adding the computer Crowd to the Human Crowd

This section of the book is about the possibility of automating the verification of information. Advanced computing (human computing and machine computing) is on the rise as machine learning becomes more capable. Human computing has not yet been widely used for verifying social media information, but given the direction of technology, that is not far away. Machine computing could be used to create verification plug-ins that help assess whether an entry is likely to be credible.
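
To make the plug-in idea a bit more concrete, here is a minimal sketch of the kind of heuristic credibility scoring a machine-computing plug-in might perform on a social media post. Nothing in it comes from the Verification Handbook: the features, weights, and thresholds are illustrative assumptions, and a real system would learn them from labeled data rather than hard-code them.

```python
# Illustrative toy credibility score over a few hand-picked features of a post.
# Features, weights, and thresholds are assumptions, not the handbook's method.
from dataclasses import dataclass

@dataclass
class Post:
    account_age_days: int    # older accounts are treated as weakly more credible
    follower_count: int
    has_geotag: bool         # location metadata helps corroboration
    links_to_original: bool  # original content vs. scraped/reposted material
    all_caps_headline: bool  # sensational formatting as a weak negative signal

def credibility_score(p: Post) -> float:
    """Return a rough 0-1 score; higher means more likely credible."""
    score = 0.0
    score += 0.25 if p.account_age_days > 365 else 0.05
    score += 0.20 if p.follower_count > 1000 else 0.05
    score += 0.20 if p.has_geotag else 0.0
    score += 0.25 if p.links_to_original else 0.0
    score -= 0.10 if p.all_caps_headline else 0.0
    return max(0.0, min(1.0, score))

print(round(credibility_score(Post(800, 5000, True, True, False)), 2))  # -> 0.9
```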

Reflections:

The book does a good job of trying to be a centralized guideline for information verification in all aspects of the professional world. If all people and agencies used these guidelines, I believe it would remove a great deal of misinformation and would save time for any emergency efforts trying to assist. Decreasing the number of fake reports would help increase the productivity of people who are actually trying to help.

This collection of ideals and practices runs under the assumption that professionals do not purposely spread false rumors because they are ethically not supposed to do so. Yet we have seen very extreme views from several news anchors and show hosts, mostly built on personal opinion, with no backlash or repercussions for what they say. It is my belief that as long as there are people involved in information distribution, there is no real way to stop misinformation from being spread. Ultimately, as long as there is a person with an opinion behind information gathering or distribution, it will be impossible to eradicate fake news reports, or even embellished stories.

 Questions:
  • What can we do as individuals to prevent the spread of false reports within our social networks?
  • There is a debate on the effectiveness of algorithms and automated searches against the human element. Will machines ever completely replace humans?
  • Should there be a standard punishment for creating false reports or are the culprits protected by their 1st amendment rights? Are there any exceptions to your position on that idea?
  • Verification is a difficult job in which many people work together to get accurate information. Can you imagine a way (other than automation) to streamline how information is verified?


Integrating On-demand Fact-checking with Public Dialogue

Paper:

Kriplean, T., Bonnar, C., Borning, A., Kinney, B., & Gill, B. (2014). Integrating On-demand Fact-checking with Public Dialogue. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing (pp. 1188–1199). New York, NY, USA: ACM. https://doi.org/10.1145/2531602.2531677

Discussion Leader: Sukrit V

Summary:

This article aims to understand the design space for inserting accurate information into public discourse in a non-confrontational manner. The authors integrate a request-based fact-checking service with an existing communication interface, ConsiderIt – a crowd-sourced voters guide. This integration involves the reintroduction of professionals and institutions – namely, librarians and public library systems – that have up to now been largely ignored, into crowdsourcing systems.

The authors note that existing communication interfaces for public discourse often fail to aid participants in identifying which claims are factual. The article first delves into different sources of factually correct information and the format in which it should be conveyed to participants. They then discuss who performs the work of identifying, collating and presenting this information: either professionals or via crowdsourcing. Lastly, where this information is presented by these people is crucial: through single function entities such as Snopes or Politifact, embedded responses, or overlays in chat interfaces.

Their system was deployed in the field during the course of a real election with voluntary users – the Living Voters Guide (LVG) – and utilized librarians from the Seattle Public Library (SPL) as the fact-checkers. Initial results indicated that participants were not opposed to the role played by these librarians. One key point to note is the labeling of points post verification: accurate, unverifiable and questionable. The term “questionable” was specifically chosen since it is more considerate of users’ feelings – as opposed to the negative connotation associated with “wrong” or a red X.

The rest of the article discusses balancing the task of informing LVG users of which pro/con points were factual but in a non-confrontational manner. The decision to prompt a fact-check was in the hands of the LVG participants and the fact-check was performed only on the factual component of claims and presented in an easy-to-assess manner. From the perspective of the SPL librarians, they played a crucial role in determining the underlying features of the fact-checking mechanism.

In the results, the authors were successfully able to determine that there was a demand for a request-based fact-checking service, and that the SPL librarians were viewed and welcomed as trustworthy participants which simultaneously helped improve the credibility of the LVG interface. Based on Monte Carlo simulations, the authors demonstrate that there was an observable decrease in commenting rates before and after fact-checking, having taken into account temporal effects.
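
As a side note on that analysis: a common way to check whether commenting rates genuinely dropped after a fact-check, rather than by chance, is a Monte Carlo permutation test. The sketch below is a generic illustration of that idea with made-up numbers; it is not the authors' actual procedure or data.

```python
# Generic permutation-style Monte Carlo test: is the observed drop in comment rate
# after a fact-check larger than what random relabeling would produce?
# The counts below are invented for illustration; this is not the paper's data.
import random

before = [12, 9, 15, 11, 13, 10]  # hypothetical comments per day before fact-checking
after = [7, 8, 6, 9, 5, 7]        # hypothetical comments per day after fact-checking

observed_drop = sum(before) / len(before) - sum(after) / len(after)

pooled = before + after
n_iter, n_extreme = 10_000, 0
for _ in range(n_iter):
    random.shuffle(pooled)
    fake_before, fake_after = pooled[:len(before)], pooled[len(before):]
    drop = sum(fake_before) / len(fake_before) - sum(fake_after) / len(fake_after)
    if drop >= observed_drop:
        n_extreme += 1

print(f"observed drop: {observed_drop:.2f}, p ~ {n_extreme / n_iter:.3f}")
```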

In conclusion, the authors note that the journalistic fact-checking framework did not interface well with librarian referencing methods. In their implementation, there was also no facility for enhanced communication between the librarians, the user whose point was being checked, and the requester. The way the fact-checks were displayed tended to dominate the discussion section and possibly caused a drop in comment rates. Some librarians were of the opinion that they were exceeding their professional boundaries when determining the authenticity of certain claims – especially those pertaining to legal matters.

Reflections:

The article made good headway in creating an interface that nudges people towards finding common ground. This was done by bringing unbiased professionals and institutions, namely librarians and the Seattle Public Library, into a communication interface.

The involvement of librarians – who are still highly trusted and respected by the public – is notable. These librarians help LVG participants find verified information on claims amidst a deluge of conflicting information presented to them by other users and on the internet. One caveat – which can only be rectified through changes in existing laws – is that librarians cannot perform legal research; they are only allowed to provide links to related information.

On one hand, I commend the efforts of the authors to introduce a professional, unbiased fact-checker into a communication system filled with (possibly) misinformed and uninformed participants. On the other, I question the scalability of such efforts. The librarians set a 48-hour deadline on responding to requests, and in some cases it took up to two hours of research to verify a claim. Perhaps this system would benefit from a slightly tweaked learnersourcing approach utilizing response aggregation and subsequent expert evaluation.

Their Monte Carlo analysis was particularly useful in determining whether the fact-checking had any effect on comment frequency, versus temporal effects alone. I also appreciate the Value Sensitive Design approach the authors use to evaluate the fact-checking service from the viewpoint of the main and indirect stakeholders. The five-point Likert scale utilized by the authors also allows for some degree of flexibility in gauging stakeholder opinion, as opposed to binary responses.

Of particular mention was how ConsiderIt, their communication interface, utilized a PointRank algorithm which highlights points that were more highly scrutinized. Additionally, the system’s structure inherently disincentivizes gaming of the fact-checking service. The authors mention triaging requests to handle malicious users/pranksters. Perhaps this initial triage could be automated, instead of having to rely on human input.

I believe that this on-demand fact-checking system shows promise, but it will only truly be functional at a large scale if certain processes are automated and handled by software mechanisms. Further, a messaging interface wherein the librarian, the requester of the fact-check, and the original poster can converse directly with each other would be useful. Then again, that might defeat the purpose of a transparent fact-checking system and undermine the whole point of a public dialogue system. Additionally, the authors note that there is little evidence that participants’ short-term opinions changed, and I am unsure of how to evaluate whether or not these opinions change in the long term.

Overall, ConsiderIt’s new fact-checking feature has considerably augmented the LVG user experience in a positive manner and integrated the work of professionals and institutions into a “commons-based peer production.”

Questions:

  • How, if possible, would one evaluate long-term change in opinion?
  • Would it be possible to introduce experts in the field of legal studies to aid librarians in the area of legal research? How would they interact with the librarians? What responsibilities do they have to the public to provide “accurate, unbiased, and courteous responses to all requests”?
  • How could this system be scaled to accommodate a much larger user base, while still allowing for accurate and timely fact-checking?
  • Are there certain types of public dialogues in which professionals/institutions should not/are unable to lend a hand?


Amazon Mechanical Turk

Technology: Amazon Mechanical Turk (MTurk)

Demo Leader: Leanna Ireland

Summary:

Amazon Mechanical Turk (MTurk) is a crowdsourcing platform which connects requesters (researchers, etc.) to a human workforce that completes tasks in exchange for money. Requesters as well as workers can be located all over the world.

Requesters provide tasks and compensation for the workers. The tasks, or human intelligence tasks (HITs), can range from identifying photos and transcribing interviews to writing reviews and taking surveys. When creating a task, requesters can specify worker requirements, such as the number of HITs a worker has undertaken, the percentage of HITs approved, or a worker’s location. Other qualifications can be specified for a fee; these options include US political affiliation, education status, gender, and even left-handedness.

Requesters set the monetary reward. Many HITs on MTurk are set to a relatively low reward (e.g., US $0.10). Some workers choose to pass over low-paying work; however, others will complete low-paying HITs to increase their HIT approval rates. Requesters pay workers based on the quality of their work: they approve or reject the work completed by workers, and if a worker’s submission is rejected, the reward is not paid.

Overall, MTurk is an inexpensive, rapid form of data collection, often yielding participants more representative of the general population than other Internet and student samples (Buhrmester et al., 2011). However, MTurk participants can differ from the general population; Goodman and colleagues (2013) found, for example, that compared to community samples MTurk participants pay less attention to experimental materials. In addition, MTurk raises some ethical issues given the often low rewards for workers. Completing three twenty-minute tasks for $1.50 apiece, for example, does not allow workers to meet many mandated hourly minimum wages.

While MTurk is a great source for data collection, it can also be used in nefarious ways. This could include being asked to take a geotagged photo of the front counter of your local pharmacy: an innocent-enough task that could help determine local tobacco prices, or could reveal the location and front-counter security measures of a store. In addition, requesters could crowdsource paid work for lower value, or even crowdsource class assignments to the US population, such as the demo below…

Research about MTurk:

Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6(1), 3-5.

Goodman, J. K., Cryder, C. E., & Cheema, A. (2013). Data collection in a flat world: The strengths and weaknesses of Mechanical Turk samples. Journal of Behavioral Decision Making, 26(3), 213-224.

Demo:

  1. Go to the MTurk website.
  2. You will be presented with two options: to either become a worker or a requester. Click ‘Get Started’ under the ‘Get Results from Mechanical Turk Workers’ to become a requester.
  3. To become a requester, for the purposes of the demo, click the ‘get started’ button under the ‘Work Distribution Made Easy’ option.
  4. You will be asked to sign-in to create a project on this page. Browse through the various available options on the left column. Requesters can launch surveys, ask workers to choose an image they prefer, or ask them their sentiments of a tweet. Simply click ‘sign in to create project’.
  5. You will then be asked to sign in or set up an amazon account.
  6. After signing into your account, you will be brought to a page with the following tabs: home, create, manage, developer, and help tabs.
  7. To create a new project, click ‘create’ and then ‘new project’ directly below. You will now need to select the type of project you wish to design from the left column. I chose ‘survey link’ as I will be embedding a link from Qualtrics (a web-based survey tool) for the purposes of this demonstration, so the following instructions are for the survey link option. The survey asks a question from our previous week’s discussion: “What do you think is a bigger problem — incorrect analysis by amateurs, or amplifying of false information by professionals?”
  8. After you have indicated your choice of project, click the ‘create project’ button.
  9. Under the ‘Enter properties’ tab, provide a project name as well as a description of your HIT for the Workers. You will also need to set up your HITs. This includes indicating the reward per assignment, the number of assignments per HIT (how many people you want to complete your task), the time allotted per assignment, the HIT expiration, and the time window before payments are auto-approved. Lastly, you need to indicate worker requirements (e.g., location, HIT approval rate, number of HITs approved).
  10. Under the design layout, you can use the HTML editor to layout the HIT (e.g., write up the survey instructions as well as provide the survey link).
  11. You then can preview the instructions. After you have completed setting up your HIT, you will be taken back to the create tab where your new HIT is listed. To publish the batch, simply click ‘Publish Batch’. You then need to confirm payment and publish.
  12. To view the results and to allocate payment, click ‘Manage’ and download the CSV. To approve payment, place an X under the column ‘Approve’. To reject payment, place an X under the column ‘Reject’. This CSV file is then uploaded to MTurk, where approvals and rejections are processed and payment is disbursed to the workers.
  13. To download results from MTurk, under the ‘manage’ tab, click ‘Results’ and download the CSV. Or, you can download the results from the platform you are using (e.g., Qualtrics).
  14. Lastly, there is an entire community forum for MTurk workers entitled Turker Nation. Workers share tips and techniques and discuss all things MTurk and more (e.g., what HITs to complete but also which HITs or requesters to avoid). This can be a useful site to further advertise your HITs.
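
For requesters who prefer to script the workflow above rather than click through the web interface, MTurk also exposes an API. Below is a minimal sketch using the boto3 MTurk client against the requester sandbox (so no real money is spent); the survey URL, reward, and other parameter values are placeholders for illustration only.

```python
# Minimal sketch: create a HIT and approve submitted work via the boto3 MTurk client.
# Points at the requester sandbox; all titles, URLs, and amounts are placeholders.
import boto3

mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# An ExternalQuestion wraps an external survey (e.g., a Qualtrics link) in an iframe.
question_xml = """<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.qualtrics.com/your-survey</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>"""

hit = mturk.create_hit(
    Title="Short survey on crowdsourced investigations",
    Description="Answer one open-ended question (about 5 minutes).",
    Reward="0.50",
    MaxAssignments=20,
    LifetimeInSeconds=3 * 24 * 3600,      # HIT expiration
    AssignmentDurationInSeconds=30 * 60,  # time allotted per assignment
    Question=question_xml,
)
hit_id = hit["HIT"]["HITId"]

# Later: fetch submitted assignments and approve them (the CSV approve/reject step, scripted).
submitted = mturk.list_assignments_for_hit(HITId=hit_id, AssignmentStatuses=["Submitted"])
for assignment in submitted["Assignments"]:
    mturk.approve_assignment(AssignmentId=assignment["AssignmentId"])
```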


TinEye

Technology:

TinEye Reverse Image Search

Demo leader: Kurt Luther

Summary:

TinEye is a web-based tool for performing reverse image searches. This means you can start with an image (instead of text keywords) and search for websites that include that image, or ones similar to it. TinEye provides a web interface for quick searches, but it also provides an API for programmatic use of the tool, so that developers can integrate its functionality into their own software. Google and other major search engines provide reverse image functionality, but to my knowledge, TinEye is unique in also providing a powerful API.

Reverse image search is useful for tasks like determining where and when an image was first posted, where it has spread to, etc. This tool can help investigators determine if an image has been modified from the original, if it is being presented in an incorrect context, or if it is being used without proper permissions, among other possibilities.

Reverse image search can also provide extra context or detail when it isn’t desired. For example, people have used the tool to reveal the private identities of profile pictures on online dating sites.
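
Since the summary above mentions TinEye's API, here is a rough sketch of what a programmatic reverse image search might look like. Be warned that the endpoint path, parameter names, authentication header, and response fields shown are assumptions made for illustration; consult TinEye's current API documentation (the commercial API requires a paid key) for the actual interface.

```python
# Hedged sketch only: endpoint, parameters, auth header, and response structure below
# are assumptions for illustration; check TinEye's API docs before using any of this.
import requests

API_KEY = "your-tineye-api-key"  # issued with a commercial TinEye API account
IMAGE_URL = "https://example.com/shark-on-highway.jpg"  # hypothetical image to trace

resp = requests.get(
    "https://api.tineye.com/rest/search/",  # assumed REST search endpoint
    params={"image_url": IMAGE_URL, "sort": "crawl_date", "order": "asc"},
    headers={"x-api-key": API_KEY},
    timeout=30,
)
resp.raise_for_status()

# Print where and when matches were crawled, oldest first, to help trace the original source.
for match in resp.json().get("results", {}).get("matches", []):
    for backlink in match.get("backlinks", []):
        print(backlink.get("crawl_date"), backlink.get("url"))
```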

Demo:

  1. Find an image you’d like to search. I picked a photo of a shark that is widely circulated during natural disasters. Recently, some people claimed this photo showed a flooded highway in Houston during Hurricane Harvey.
  2. Go to the TinEye website.
  3. In the search box, you have three main choices. You can 1) upload an image saved on your computer, 2) paste a URL that directly links to the image you’re searching, or 3) paste a URL of the web page containing the image. If the latter, the next page will ask you to pick which image from that page you want to search.
  4. Here are my search results. As of this writing, TinEye found 318 similar images after searching over 21 billion images across the web.
  5. The drop-down menu on the left lets you change how the results are sorted.
    • By default it’s “best match” (I think this means most visually similar).
    • “Oldest” is useful for finding the original source of the image. The oldest version TinEye found is from reallyfunnystuff.org in 2012.
    • “Most changed” shows some of the ways the image has been modified. For example, sometimes it’s cropped, or text is superimposed on it.
    • “Biggest image” is good for finding a high-quality version.
  6. The “filter” textbox lets you filter the sources of image results. When you click this textbox, it will auto-suggest some popular domains. For example, this particular image appeared in seven different BuzzFeed articles, some going back to 2014.
  7. You can also filter results to “collections”. These seem to be popular sources of online images like Flickr or Wikipedia that might give you more information about the image or allow you to license it for your own use.
  8. You can easily compare your image to any in the search results. Click “Compare Match” under the thumbnail of that image. Click “Switch” in the popup that appears and you can quickly toggle between both versions of the image.


#BostonBombing: The Anatomy of a Misinformation Disaster

Paper:

Madrigal, A. C. (2013, April 19). #BostonBombing: The Anatomy of a Misinformation Disaster. The Atlantic. Retrieved from http://www.theatlantic.com/technology/archive/2013/04/-bostonbombing-the-anatomy-of-a-misinformation-disaster/275155/

Discussion leader: Kurt Luther

Summary:

This news article in The Atlantic seeks to better understand how two innocent people were widely circulated in social media as being suspects in the Boston Marathon Bombing. When the bombing took place, law enforcement took several days to identify the correct suspects. Meanwhile, online communities on Reddit, Twitter, 4chan, and elsewhere attempted to follow the investigation and even contribute to it. Ultimately, the correct suspects turned out to be two entirely different people.

The author primarily uses Twitter archives, audio recordings of police scanners, searches of the relevant Reddit forum, some informal interviews, and knowledge of the broader context surrounding the event, as the data sources for the article. The author traces one suspect, Mulugeta, to a name mentioned by law enforcement on a police scanner. This was incorrectly transcribed as “Mike Mulugeta” (“Mike” was just a spelling aid) by an amateur tweet that the author tracked down. The author tracks down the origin of the other suspect, Sunil Tripathi, to a tweet posted by another amateur, Greg Hughes. Hughes claimed to have heard the name on a police scanner, but no evidence of that has been found in any police scanner recordings that day. Despite this, many sleuths the author interviewed claimed to have heard it.

According to the author, Hughes’ tweet, which mentioned both suspects, appeared to be the source of an information cascade that led to these suspects’ names being widely reported. One key factor seems to have been several members of the news media, such as a CBS cameraman and a Buzzfeed reporter, that quickly retweeted the original tweet. This led to more mainstream media broadcasting as well as Anonymous further spreading the misinformation.

Only the identification of a different set of suspects (the correct ones) by NBC reporter Brian Williams led to an end of the propagation of misinformation. The author concludes by pondering how the Sunil Tripathi name entered the conversation, since there is no evidence of it mentioned by officials on police scanners or elsewhere. The author speculates Hughes could be mistaken or the scanner recordings could be incomplete. The author concludes by noting that many different parties, including amateurs and professionals, were partly responsible for the mistake.

Reflections:

This article does a nice job of digging deeper into a key question surrounding the failure of online sleuthing in the Boston Marathon Bombing. That question is, how did two innocent people get named as suspects in this attack?

Ultimately, the author is only able to get a partial answer to that question. One name was indeed mentioned by police, though misheard and transcribed incorrectly, and ultimately that person wasn’t involved. The author is able to track down both the recording of the police scanner and the tweet showing the incorrect transcription.

The other name is harder to explain. The author tracks down a tweet that he believes is responsible for promulgating the false information, and does a convincing job of showing that it originated the claims that eventually made it to the mainstream news. However, it’s still not clear where the author of that tweet got his misinformation, whether it was a mistake (in good faith or not), or whether we’re still missing key evidence. The author acknowledges this is frustrating, and I agree.

This article is effective in illustrating some of the strengths and limitations of tracing the path of crowdsourced investigations. Sometimes the information is readily available online, by searching Twitter archives or digging up recordings of police scanners. Sometimes the researcher has to dig deeper, interviewing potential sources (some of whom don’t respond), browsing forum threads, and scrubbing through long audio recordings. Sometimes the data simply is not available, or may never have existed.

As a journalist, the author has a different methodological approach than an academic researcher, and perhaps more flexibility to try a lot of different techniques and see what sticks. I think it’s interesting to think about whether an academic’s more systematic, but maybe constrained, methods might lead to different answers. I think the question the author originally poses is important and deserves an answer, if it’s possible with the surviving data.

Related to this, a minor methodological question I had was how the author could be sure he identified the very first tweets to contain the misinformation. I haven’t done large scale data analysis of Twitter, but my understanding is the amount of data researchers have access to has changed over time. In order to definitively say which tweets were earliest, the researcher would need to have access to all the tweets from that time period. I wonder if this was, or still is, possible.

Questions:

  • How could we prevent the spread of misinformation as described here from happening in the future?
  • What do you think is a bigger problem — incorrect analysis by amateurs, or amplifying of false information by professionals?
  • The author notes that some people apologized for blaming innocent people, and others deleted their tweets. What is an appropriate response from responsible parties when amateur sleuthing goes wrong?
  • Suspects in official investigations are often found to be innocent, with few repercussions. Why do you think this crowdsourced investigation led to much more outrage?
  • The mystery of where Tripathi’s name came from remains unsolved. What other approaches could we try to solve it?


What’s the deal with ‘websleuthing’? News media representations of amateur detectives in networked spaces

Paper:

Yardley, E., Lynes, A. G. T., Wilson, D., & Kelly, E. (2016). What’s the deal with ‘websleuthing’? News media representations of amateur detectives in networked spaces. Crime, Media, Culture, 1741659016674045. https://doi.org/10.1177/1741659016674045

Discussion leader: Kurt Luther

Summary:

This article explores media representations (i.e. news coverage) of online amateur sleuths or websleuths. The article is written by criminologists and published in the Crime Media Culture journal, and its focus is specifically on websleuths seeking to solve crimes, as opposed to other types of investigations. The authors assert that this type of online activity is important but has seen insufficient research attention from the criminology research community.

The authors review related work in two main areas. First they review studies of amateur sleuths with respect to concepts like vigilantism and “digilantism” (online vigilantes). They acknowledge this research is fairly sparse. The second body of literature the authors review focuses on a perspective from cultural criminology. This work considers how websleuth activities intersect with broader notions of infotainment and participatory culture, and the authors make connections to popular crime television shows and radio programs.

The bulk of the article focuses on a content analysis of news articles on websleuthing. The authors employ a method called Ethnographic Content Analysis (ECA). They begin by gathering a corpus of 97 articles by searching keywords like “web sleuths” and “cyber detectives.” They read through the articles to identify key phrases regarding content and context, cluster them, and then finally perform quantitative analysis to illustrate frequency and proportion.
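
To make the quantitative step of ECA concrete, here is a small, purely illustrative sketch of tallying frequencies and proportions of coded categories across a corpus of articles. The category labels and counts are invented for the example; they are not the authors' data.

```python
# Illustrative only: frequency and proportion of analyst-assigned codes across a corpus.
# The codes and counts below are invented, not taken from Yardley et al.
from collections import Counter

# One entry per article: the crime-type code an analyst assigned to it.
coded_articles = (
    ["homicide"] * 35 + ["property offences"] * 20 +
    ["terrorism"] * 15 + ["fraud"] * 10 + ["missing persons"] * 8
)

counts = Counter(coded_articles)
total = sum(counts.values())
for code, n in counts.most_common():
    print(f"{code:18s} {n:3d}  {n / total:5.1%}")
```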

In the results, the authors provide a big-picture view of media coverage of websleuthing. They describe how coverage has increased over time and how most publications are non-US but cover activities that occur in the US. They characterize 17 types of crimes investigated by websleuths (homicide, property offences, and terrorism most common). They note a wide variety of 44 online spaces where websleuthing happens, with the popular social media platforms Facebook, Twitter, and Reddit foremost. They describe a rich variety of websleuthing practices and differentiate group and solo sleuths, as well as organized vs. spontaneous. They also discuss some motivations for websleuthing and some drawbacks, and note that little data was available on sleuth identities, though their professions seem to be diverse. Finally, they characterize websleuth interactions with law enforcement, noting that professional investigators often hesitate to acknowledge sleuth contributions.

In the discussion and conclusion, the authors note the tension between amateur and professional detectives despite increasing interaction between them. They also note that technology has broken down traditional boundaries, allowing the public to participate more actively in crime solving. Finally, they note a similar blurring of boundaries between crime and entertainment, enabled by popular media that increasingly invites its audience to participate actively in the experience.

Reflections:

This article has some notable strengths. It provides a nice overview of websleuthing focused on the domain of crime solving. As the authors note, this is a broader area than I expected. The authors provide many good examples, both in the literature review and in reporting the results, of websleuthing events and communities that I wasn’t previously familiar with.

Not being very familiar with criminology research, I thought it was interesting that the authors found this topic had not yet received enough research attention in that community. Although my own interest in web sleuthing is broader than crime, I appreciated the clear focus of this article. The choice to focus the analysis on news coverage provided a convenient way to give the reader a sense of how this phenomenon has impacted broader society and what about it is considered newsworthy. This was a helpful perspective for me, as I am approaching this phenomenon as a crowdsourcing researcher so my interests may be different from what others (i.e. the public) care about.

I admired the methods the authors used to conduct their analysis. I wasn’t previously familiar with ECA, though I’ve employed similar methods like grounded theory analysis and trace ethnography. ECA seems to offer a nice mixed-methods balance, providing flexibility to present both qualitative and quantitative results, which gives the reader a sense of overall patterns and trends as well as rich details.

I found many of the results interesting, but a few stood out. First, I thought the distinctions between organized and spontaneous web sleuthing, as well as solo vs. group investigations, were quite helpful in broadly differentiating lots of different examples. At least in terms of news coverage, I was surprised how common the solo investigations were compared to group ones. Second, I was fascinated by the variety of sleuthing activities identified by the authors. The large number and variety were interesting per se, but I also saw these as promising stepping stones for potential technological support. For almost all of these activities, I could imagine ways that we might design technology to help.

The article provided some tantalizing details here and there, but overall it provided more of a bird’s eye view of websleuthing. I would have appreciated a few more examples for many of the analyses performed by the authors. For example, I’d like to know more about exactly how websleuths interacted with law enforcement, and examples of each of the sleuthing activities.

I also wondered how often websleuths’ activities met with success. The authors discuss positive and negative portrayals of websleuthing in the media, but this seems different from whether or not sleuths appeared to have made a valuable contribution to an investigation. From this data it seems possible to give the reader a sense of how often this phenomenon achieves its goals, at least from the perspective of the sleuths themselves.

Questions:

  • What are some of the advantages and disadvantages of linking websleuthing to infotainment?
  • What websleuthing activities do you think are best suited for amateurs? Which do you think they might have the most trouble with?
  • Why do you think professional investigators like law enforcement might minimize or avoid referring to contributions made by websleuths?
  • Why do you think media portrayals of websleuthing were more positive with respect to property crimes than homicide?
  • The article notes a huge variety of online platforms that support websleuthing. What do you think are some essential features of these platforms that enable it?
