TinEye

Technology:

TinEye Reverse Image Search

Demo leader: Kurt Luther

Summary:

TinEye is a web-based tool for performing reverse image searches. This means you can start with an image (instead of text keywords) and search for websites that include that image, or ones similar to it. TinEye provides a web interface for quick searches, but it also provides an API for programmatic use, so developers can integrate its functionality into their own software. Google and other major search engines offer reverse image search functionality, but to my knowledge, TinEye is unique in also providing a powerful API.
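
To give a feel for the API side, below is a minimal Python sketch of a reverse image search request. The endpoint path, parameter names, and authentication header are my assumptions for illustration; the actual interface is documented on TinEye’s site, and API access requires a paid account.

```python
# A minimal sketch of a TinEye reverse image search by URL.
# Assumptions (not the documented API): the endpoint path, the
# "image_url" parameter, and the "x-api-key" auth header.
import requests

API_KEY = "your-tineye-api-key"                    # placeholder
ENDPOINT = "https://api.tineye.com/rest/search/"   # assumed endpoint

def search_by_url(image_url: str) -> dict:
    """Search for matches to an image hosted at a public URL."""
    response = requests.get(
        ENDPOINT,
        params={"image_url": image_url},           # assumed parameter
        headers={"x-api-key": API_KEY},            # assumed auth scheme
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    # Hypothetical image URL for illustration.
    print(search_by_url("https://example.com/shark-highway.jpg"))
```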

Reverse image search is useful for tasks like determining where and when an image was first posted, where it has spread to, etc. This tool can help investigators determine if an image has been modified from the original, if it is being presented in an incorrect context, or if it is being used without proper permissions, among other possibilities.

Reverse image search can also surface extra context or detail when it isn’t wanted. For example, people have used the tool to uncover the real identities behind profile pictures on online dating sites.

Demo:

  1. Find an image you’d like to search. I picked a photo of a shark that is widely circulated during natural disasters. Recently, some people claimed this photo showed a flooded highway in Houston during Hurricane Harvey.
  2. Go to the TinEye website.
  3. In the search box, you have three main choices. You can 1) upload an image saved on your computer, 2) paste a URL that directly links to the image you’re searching, or 3) paste a URL of the web page containing the image. If the latter, the next page will ask you to pick which image from that page you want to search.
  4. Here are my search results. As of this writing, TinEye found 318 similar images after searching over 21 billion images across the web.
  5. The drop-down menu on the left lets you change how the results are sorted.
    • By default it’s “best match” (I think this means most visually similar).
    • “Oldest” is useful for finding the original source of the image. The oldest version TinEye found is from reallyfunnystuff.org in 2012. (See the sketch after this list for finding the oldest match programmatically.)
    • “Most changed” shows some of the ways the image has been modified. For example, sometimes it’s cropped, or text is superimposed on it.
    • “Biggest image” is good for finding a high-quality version.
  6. The “filter” textbox lets you filter the sources of image results. When you click this textbox, it will auto-suggest some popular domains. For example, this particular image appeared in seven different BuzzFeed articles, some going back to 2014.
  7. You can also filter results by “collections”. These seem to be popular sources of online images, like Flickr or Wikipedia, that might give you more information about the image or allow you to license it for your own use.
  8. You can easily compare your image to any in the search results. Click “Compare Match” under the thumbnail of that image. Click “Switch” in the popup that appears and you can quickly toggle between both versions of the image.
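
For investigations at scale, a sort like “Oldest” in step 5 could also be done programmatically over API results. This sketch assumes a response shape (a “matches” list whose entries carry a “crawl_date”) that I have invented for illustration; the real schema is in TinEye’s API documentation.

```python
# A sketch of replicating the web UI's "Oldest" sort over API results,
# to find the earliest known appearance of an image. The "matches" and
# "crawl_date" fields are assumed for illustration, not TinEye's
# documented schema.
from datetime import datetime

def earliest_match(api_response: dict):
    """Return the match with the earliest crawl date, or None."""
    dated = [m for m in api_response.get("matches", [])
             if m.get("crawl_date")]  # skip matches with no date
    if not dated:
        return None
    return min(dated, key=lambda m: datetime.fromisoformat(m["crawl_date"]))

# Example with invented data: the 2012 match comes back first.
example = {"matches": [{"url": "a.com", "crawl_date": "2014-06-01"},
                       {"url": "b.org", "crawl_date": "2012-03-15"}]}
print(earliest_match(example))
```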


#BostonBombing: The Anatomy of a Misinformation Disaster

Paper:

Madrigal, A. C. (2013, April 19). #BostonBombing: The Anatomy of a Misinformation Disaster. The Atlantic. Retrieved from http://www.theatlantic.com/technology/archive/2013/04/-bostonbombing-the-anatomy-of-a-misinformation-disaster/275155/

Discussion leader: Kurt Luther

Summary:

This news article in The Atlantic seeks to better understand how two innocent people came to be widely circulated on social media as suspects in the Boston Marathon Bombing. When the bombing took place, law enforcement took several days to identify the correct suspects. Meanwhile, online communities on Reddit, Twitter, 4chan, and elsewhere attempted to follow the investigation and even contribute to it. Ultimately, law enforcement named two entirely different people as the actual suspects.

The author’s primary data sources are Twitter archives, audio recordings of police scanners, searches of the relevant Reddit forum, some informal interviews, and knowledge of the broader context surrounding the event. The author traces one suspect, Mulugeta, to a name mentioned by law enforcement on a police scanner. This was incorrectly transcribed as “Mike Mulugeta” (“Mike” was just a spelling aid) in an amateur’s tweet that the author tracked down. The author traces the origin of the other suspect’s name, Sunil Tripathi, to a tweet posted by another amateur, Greg Hughes. Hughes claimed to have heard the name on a police scanner, but no evidence of this has been found in any police scanner recordings from that day. Despite this, many sleuths the author interviewed claimed to have heard it.

According to the author, Hughes’ tweet, which mentioned both suspects, appears to have been the source of an information cascade that led to the suspects’ names being widely reported. One key factor seems to have been several members of the news media, such as a CBS cameraman and a BuzzFeed reporter, who quickly retweeted the original tweet. This led to broader mainstream media coverage, as well as Anonymous further spreading the misinformation.

Only the identification of a different set of suspects (the correct ones) by NBC reporter Pete Williams ended the propagation of the misinformation. The author ponders how Sunil Tripathi’s name entered the conversation at all, since there is no evidence of it being mentioned by officials on police scanners or elsewhere. The author speculates that Hughes could be mistaken or that the scanner recordings could be incomplete, and concludes by noting that many different parties, including amateurs and professionals, were partly responsible for the mistake.

Reflections:

This article does a nice job of digging deeper into a key question surrounding the failure of online sleuthing in the Boston Marathon Bombing. That question is, how did two innocent people get named as suspects in this attack?

Ultimately, the author is only able to get a partial answer to that question. One name was indeed mentioned by police, though misheard and transcribed incorrectly, and ultimately that person wasn’t involved. The author is able to track down both the recording of the police scanner and the tweet showing the incorrect transcription.

The other name is harder to explain. The author tracks down a tweet that he believes is responsible for promulgating the false information, and does a convincing job of showing that it originated the claims that eventually made it to the mainstream news. However, it’s still not clear where the author of that tweet got his misinformation, whether it was a mistake (in good faith or not), or whether we’re still missing key evidence. The author acknowledges this is frustrating, and I agree.

This article is effective in illustrating some of the strengths and limitations of tracing the path of crowdsourced investigations. Sometimes the information is readily available online, by searching Twitter archives or digging up recordings of police scanners. Sometimes the researcher has to dig deeper, interviewing potential sources (some of whom don’t respond), browsing forum threads, and scrubbing through long audio recordings. Sometimes the data simply is not available, or may never have existed.

As a journalist, the author has a different methodological approach than an academic researcher, and perhaps more flexibility to try a lot of different techniques and see what sticks. I think it’s interesting to think about whether an academic’s more systematic, but maybe constrained, methods might lead to different answers. I think the question the author originally poses is important and deserves an answer, if it’s possible with the surviving data.

Related to this, a minor methodological question I had was how the author could be sure he identified the very first tweets to contain the misinformation. I haven’t done large-scale data analysis of Twitter, but my understanding is that the amount of data researchers have access to has changed over time. To definitively say which tweets were earliest, a researcher would need access to all the tweets from that time period. I wonder if this was, or still is, possible.
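
For what it’s worth, here is a rough sketch of how one might pose that question today using Twitter’s v2 full-archive search endpoint. The endpoint is real, but it has been limited to approved access tiers (for a time, academic research), and whether its archive is truly complete for 2013 is precisely the open question.

```python
# A sketch of checking for the earliest tweets containing a phrase in a
# time window, via Twitter's v2 full-archive search. Access to this
# endpoint has been restricted to approved tiers, and completeness of
# the archive is the very question raised above.
import requests

BEARER_TOKEN = "your-bearer-token"  # placeholder credential
URL = "https://api.twitter.com/2/tweets/search/all"

def earliest_tweets(phrase: str, start: str, end: str) -> list:
    """Return matching tweets in [start, end), sorted oldest first."""
    response = requests.get(
        URL,
        params={
            "query": f'"{phrase}"',
            "start_time": start,   # e.g. "2013-04-18T00:00:00Z"
            "end_time": end,
            "max_results": 100,
            "tweet.fields": "created_at,author_id",
        },
        headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()
    tweets = response.json().get("data", [])
    return sorted(tweets, key=lambda t: t["created_at"])
```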

Questions:

  • How could we prevent the spread of misinformation as described here from happening in the future?
  • What do you think is a bigger problem — incorrect analysis by amateurs, or amplification of false information by professionals?
  • The author notes that some people apologized for blaming innocent people, and others deleted their tweets. What is an appropriate response from responsible parties when amateur sleuthing goes wrong?
  • Suspects in official investigations are often found to be innocent, with few repercussions. Why do you think this crowdsourced investigation led to much more outrage?
  • The mystery of where Tripathi’s name came from remains unsolved. What other approaches could we try to solve it?


What’s the deal with ‘websleuthing’? News media representations of amateur detectives in networked spaces

Paper:

Yardley, E., Lynes, A. G. T., Wilson, D., & Kelly, E. (2016). What’s the deal with ‘websleuthing’? News media representations of amateur detectives in networked spaces. Crime, Media, Culture, 1741659016674045. https://doi.org/10.1177/1741659016674045

Discussion leader: Kurt Luther

Summary:

This article explores media representations (i.e., news coverage) of online amateur sleuths, or “websleuths.” The article is written by criminologists and published in the journal Crime, Media, Culture, and its focus is specifically on websleuths seeking to solve crimes, as opposed to other types of investigations. The authors assert that this type of online activity is important but has received insufficient attention from the criminology research community.

The authors review related work in two main areas. First, they review studies of amateur sleuths with respect to concepts like vigilantism and “digilantism” (online vigilantism). They acknowledge this research is fairly sparse. The second body of literature focuses on a cultural criminology perspective. This work considers how websleuth activities intersect with broader notions of infotainment and participatory culture, and the authors make connections to popular crime television shows and radio programs.

The bulk of the article focuses on a content analysis of news articles on websleuthing. The authors employ a method called Ethnographic Content Analysis (ECA). They begin by gathering a corpus of 97 articles by searching keywords like “web sleuths” and “cyber detectives.” They read through the articles to identify key phrases regarding content and context, cluster them, and then finally perform quantitative analysis to illustrate frequency and proportion.
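
The qualitative coding in ECA is necessarily manual, but the final quantitative step the authors describe, tallying the frequency and proportion of coded categories across the corpus, is simple enough to sketch. The codes and data below are invented for illustration.

```python
# A sketch of the quantitative step in an ECA-style analysis: counting
# how often each human-assigned code appears across a coded corpus and
# reporting proportions. The codes and articles are invented examples.
from collections import Counter

# One entry per article: the set of codes a researcher assigned to it.
coded_articles = [
    {"homicide", "group_sleuthing"},
    {"property_offence", "solo_sleuthing"},
    {"homicide", "solo_sleuthing"},
]

counts = Counter(code for codes in coded_articles for code in codes)
total = len(coded_articles)

for code, n in counts.most_common():
    print(f"{code}: {n}/{total} articles ({n / total:.0%})")
```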

In the results, the authors provide a big-picture view of media coverage of websleuthing. They describe how coverage has increased over time and how most publications are non-US but cover activities that occur in the US. They characterize 17 types of crimes investigated by websleuths, with homicide, property offences, and terrorism the most common. They note a wide variety of 44 online spaces where websleuthing happens, with the popular social media platforms Facebook, Twitter, and Reddit foremost. They describe a rich variety of websleuthing practices, differentiating group from solo sleuths and organized from spontaneous investigations. They also discuss some motivations for websleuthing and some drawbacks, and note that little data was available on sleuth identities, though their professions seem diverse. Finally, they characterize websleuth interactions with law enforcement, noting that professional investigators often hesitate to acknowledge sleuth contributions.

In the discussion and conclusion, the authors note the tension between amateur and professional detectives despite increasing interaction between them. They also note that technology has broken down traditional boundaries, allowing the public to participate more actively in crime solving. Finally, they note a similar blurring of boundaries between crime and entertainment enabled by popular media that increasingly invites its audience to participate actively in the experience.

Reflections:

This article has some notable strengths. It provides a good overview of websleuthing focused on the domain of crime solving, which is a broader area than I expected. The authors provide many good examples, both in the literature review and in reporting results, of websleuthing events and communities that I wasn’t previously familiar with.

Not being very familiar with criminology research, I thought it was interesting that the authors found this topic had not yet received enough research attention in that community. Although my own interest in websleuthing is broader than crime, I appreciated the clear focus of this article. The choice to focus the analysis on news coverage provided a convenient way to give the reader a sense of how this phenomenon has impacted broader society and what about it is considered newsworthy. This was a helpful perspective for me, as I am approaching this phenomenon as a crowdsourcing researcher, so my interests may differ from what others (i.e., the public) care about.

I admired the methods the authors used to conduct their analysis. I wasn’t previously familiar with ECA, though I’ve employed similar methods like grounded theory analysis and trace ethnography. ECA seems to offer a nice mixed-methods balance, providing flexibility to present both qualitative and quantitative results, which gives the reader a sense of overall patterns and trends as well as rich details.

I found many of the results interesting, but a few stood out. First, I thought the distinctions between organized and spontaneous web sleuthing, as well as solo vs. group investigations, were quite helpful in broadly differentiating lots of different examples. At least in terms of news coverage, I was surprised how common the solo investigations were compared to group ones. Second, I was fascinated by the variety of sleuthing activities identified by the authors. The large number and variety were interesting per se, but I also saw these as promising stepping stones for potential technological support. For almost all of these activities, I could imagine ways that we might design technology to help.

The article offers some tantalizing details here and there, but overall it provides more of a bird’s-eye view of websleuthing. I would have appreciated a few more examples for many of the analyses the authors performed. For example, I’d like to know more about exactly how websleuths interacted with law enforcement, and to see examples of each of the sleuthing activities.

I also wondered how often websleuths’ activities met with success. The authors discuss positive and negative portrayals of websleuthing in the media, but this seems different from whether sleuths actually made a valuable contribution to an investigation. From this data it seems possible to give the reader a sense of how often this phenomenon achieves its goals, at least from the perspective of the sleuths themselves.

Questions:

  • What are some of the advantages and disadvantages of linking websleuthing to infotainment?
  • What websleuthing activities do you think are best suited for amateurs? Which do you think they might have the most trouble with?
  • Why do you think professional investigators like law enforcement might minimize or avoid referring to contributions made by websleuths?
  • Why do you think media portrayals of websleuthing were more positive with respect to property crimes than homicide?
  • The article notes a huge variety of online platforms that support websleuthing. What do you think are some essential features of these platforms that enable it?
