Exploring Privacy and Accuracy Trade-Offs in Crowdsourced Behavioral Video Coding

Paper:

Walter S. Lasecki, Mitchell Gordon, Winnie Leung, Ellen Lim, Jeffrey P. Bigham, and Steven P. Dow. 2015. Exploring Privacy and Accuracy Trade-Offs in Crowdsourced Behavioral Video Coding. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI ’15). ACM, New York, NY, USA, 1945-1954. DOI: https://doi.org/10.1145/2702123.2702605

Discussion Leader: Sukrit V

Summary:

Social science and interaction researchers often need to review video data to learn more about human behavior and interaction. However, computer vision is still not advanced enough to automatically detect human behavior, so video data is still mostly coded manually. This is a terribly time-intensive process, often taking up to ten times the length of a video to code it.

Recent work in the crowdsourcing space has been successful in using crowds to code for behavioral cues. This method, albeit much faster, introduces other concerns.

The paper provides a brief background on how video data is typically coded and on current research in crowd-based video annotation. The authors interviewed twelve practitioners who had each coded at least 100 hours of video to better understand current video coding practices and what practitioners believe are the potential benefits and concerns of utilizing crowdsourcing in a behavioral video coding process. From the interviews, they deduced that video coding is a time-consuming process used in a wide variety of contexts to code for a range of behaviors. Even developing a coding schema is difficult due to inter-rater agreement requirements. The researchers were open to the idea of using online crowds as part of the video coding process, but they had concerns about the quality and reliability of the crowds, in addition to maintaining participant privacy and meeting IRB requirements.

The paper details an experimental study exploring the ability of crowds to accurately code for a range of behaviors, and how obfuscation methods affect a worker’s ability to identify participant behavior and identity. From the first experiment, they obtained relatively high precision and recall rates for coding a range of behaviors, except for smiling and head turning; this was attributed to a lack of clarity in the instructions and examples provided for those two behaviors. In the second experiment, they varied the blur level of the videos and observed that the rate at which workers could identify participants dropped more steeply than the F1 scores did. This means it is easier to preserve privacy at higher blur levels while still maintaining relatively good precision and recall.
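
To make the trade-off concrete, here is a minimal sketch of how F1 relates to precision and recall, using invented numbers (not the paper’s data) that merely follow the shape of the reported trend:

    # Hypothetical illustration of the privacy-accuracy trade-off.
    # These numbers are made up; they are NOT the paper's results.

    def f1(precision: float, recall: float) -> float:
        """Harmonic mean of precision and recall."""
        return 2 * precision * recall / (precision + recall)

    # blur level -> (coding precision, coding recall, lineup identification rate)
    blur_levels = {
        0: (0.90, 0.85, 0.70),
        3: (0.85, 0.80, 0.35),
        10: (0.75, 0.70, 0.05),
    }

    for level, (p, r, ident) in blur_levels.items():
        print(f"blur={level:2d}  F1={f1(p, r):.2f}  identification={ident:.2f}")

    # With numbers shaped like these, F1 degrades gently while identification
    # collapses, which is the argument for coding at higher blur levels.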

The authors also created a tool, Incognito, for researchers to test what level of privacy-protection filtering is sufficient for their use case and what impact it would have on the accuracy of the crowdsourced coding. They conclude with a discussion of future work: utilizing different approaches to filtering, and performing real-world usage studies.

 

Reflection:

The paper is rather well organized, and the experiments were quite straightforward and well detailed. The graphs and images presented were sufficient.

I quite liked the ‘lineup tool’ that they utilized at the end of each video coding task, which mimicked what is used in real life. In addition, their side experiment to determine whether workers were better at identifying participants if they were prompted beforehand is something that I believe is useful to know and could be applied in other types of experiments.

I believe the tool they designed, Incognito, would prove extremely useful for researchers since it abstracts the process of obfuscating the video and hiring workers on MTurk. However, it would have been nice if the paper had mentioned what instructions the MTurk workers were given on coding the videos. In addition, perhaps training these workers using a tutorial may have produced better results. They also noted that coding done by experts is a time-consuming process and that the time taken linearly correlates with the size of the dataset. Something that would be interesting to study is how the accuracy of coding done by crowdsourced workers would change with increased practice over time. This may further reduce the overhead of the experts, provided that coding standards are maintained.

Furthermore, the authors mention the crowdsourced workers’ precision and recall rates, but it would be nice if they had looked into the inter-rater agreement rates as well since that plays a vital role in video coding.

For coding smiles they used an unobfuscated window around the mouth, while the rest of the study focuses on blurring the whole image to preserve privacy. I wish they had considered – or even mentioned – using facial recognition algorithms to blur only the faces (which I believe would still preserve privacy to a very high degree), yet greatly increase the accuracy when it comes to coding other behaviors.

Overall, this is a very detailed, exploratory paper on determining the privacy-accuracy trade-offs of utilizing crowdsourced workers to perform video coding.

 

Questions:

  1. The authors noted that there was no change in precision and recall at blur levels 3 and 10 when the workers were notified that they were going to be asked to identify participants after their task. That is, even when they were informed beforehand about having to perform a line-up test, they were no better or worse at recognizing the person’s face (“accessing and recalling this information”). Why do you think there was no change?
  2. Can you think of cases where using crowdsourced workers would not be appropriate for coding video?
  3. The aim of this paper was to protect the privacy of participants in the videos that needed to be coded, with a concern about their identities being discovered or disclosed by crowdsourced workers. How important is it for these people’s identities to be protected when it is the experts themselves (or, for example, some governmental entities) that are performing the coding?
  4. How do you think the crowdsourced workers compare to experts with regard to coding accuracy? Perhaps it would be better to have a hierarchy where these workers code the videos and, below a certain threshold level of agreement between the workers, the experts would intervene and code the videos themselves. Would this be too complicated? How would you evaluate such a system for inter-rater agreement?
  5. Can you think of other approaches – apart from facial recognition with blurring, and blurring the whole image – that can be used to preserve privacy yet utilize the parallelism of the crowd?
  6. Aside: How do precision and recall relate to Cohen’s kappa? (A toy worked example follows this list.)
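
For question 6, a toy example may help: the sketch below scores the same pair of hypothetical label sequences with precision and recall (which treat the expert as ground truth) and with Cohen’s kappa (which treats the two as symmetric raters and corrects for chance agreement). It assumes scikit-learn is installed; the labels are invented.

    from sklearn.metrics import precision_score, recall_score, cohen_kappa_score

    expert = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]  # hypothetical expert codes
    worker = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]  # hypothetical crowd codes

    # Precision/recall take the expert as ground truth and ignore
    # agreement that could occur by chance.
    print("precision:", precision_score(expert, worker))  # 0.75
    print("recall:   ", recall_score(expert, worker))     # 0.75

    # Kappa corrects for chance agreement, which is why it is the
    # standard inter-rater agreement statistic in video coding.
    print("kappa:    ", cohen_kappa_score(expert, worker))  # ~0.58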


Demo: Kali Linux

Technology: www.kali.org

Demo Leader: Lawrence

Disclaimer: Though possible, it is currently illegal to perform hacking, cracking, and penetration testing on any network or system that you do not own. The purpose of this OS is to assist in personal testing so as to protect against adversaries.

Summary:

Kali Linux is a Debian-based Linux distribution aimed at advanced penetration testing and security auditing. Kali contains several hundred tools geared towards various information security tasks, such as penetration testing, security research, computer forensics, and reverse engineering. Released in March 2013, Kali Linux comes complete with several hundred pre-installed penetration tools, including but not limited to injection testing, password cracking, GPS packages, vulnerability analysis, and sniffing and spoofing. There is a detailed list on the website if you would like to browse them.

Reflection:

Kali Linux was created as an offensive toolkit to allow users to effectively test security in their homes and on their own devices. It is specifically designed to meet the requirements of professional penetration testing and security auditing, which is why it is built slightly differently than the average OS. It is not recommended for anyone looking for a general-purpose OS, or even expected to be functional outside of penetration testing, as there are a very limited number of repositories which are trusted to work on Kali. Kali Linux is made with a high level of customization, so you will not be able to add random unrelated packages and repositories and have them work without a fight. In particular, there is absolutely no support whatsoever for the apt-add-repository command, LaunchPad, or PPAs. More than likely, trying to install popular programs such as Steam on your Kali Linux desktop will not work. Even for experienced Linux users, Kali can pose some challenges. Although Kali is an open source project, it’s not a wide-open source project, for reasons of security. It is a rule of this OS that not knowing what you are doing is no excuse for doing irreversible damage to a system, so use it at your own risk.

How To Use:

As Kali is natively compatible with ARMEL and ARMHF boards, I will demonstrate the process of installing it onto a Raspberry Pi 3 Model B.

  1. Obtain a Raspberry Pi kit. There are several options depending on your preferences.
  2. Obtain a fast micro SD card of at least 8 GB capacity.
  3. Download the special Kali Raspberry Pi2 image from the downloads area.
  4. Image this file to the SD card (be extremely careful, as this step can erase your hard drive if you select the wrong drive to flash; a hedged sketch follows this list).
  5. Voilà! You may now purchase a wireless adapter capable of wireless injection for testing, or just run the tools using the Pi’s built-in wireless card.
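
For step 4, here is a hedged sketch of one way to do the imaging on Linux. The image filename and the /dev/sdX device path are placeholders rather than values from the Kali documentation; verify the device with lsblk before writing, because dd will happily overwrite the wrong disk.

    import subprocess

    IMAGE = "kali-linux-rpi.img"  # placeholder: the image you downloaded
    DEVICE = "/dev/sdX"           # placeholder: REPLACE with your SD card device

    # Show attached block devices so you can confirm DEVICE is the SD card.
    subprocess.run(["lsblk"], check=True)

    if input(f"Really write {IMAGE} to {DEVICE}? (yes/no) ") == "yes":
        # dd copies the raw image onto the card; status=progress shows progress.
        subprocess.run(
            ["dd", f"if={IMAGE}", f"of={DEVICE}", "bs=4M", "status=progress"],
            check=True,
        )
        subprocess.run(["sync"], check=True)  # flush writes before removing the card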


Visual Representations of Disaster

Paper:

Bica, M., Palen, L., & Bopp, C. (2017). Visual Representations of Disaster. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (pp. 1262–1276). New York, NY, USA: ACM.

Discussion Leader:

Tianyi Li

Summary:

This paper investigated the representation of the two 2015 Nepal earthquakes, in April and May, via images shared on Twitter in three ways:

  1. examine the correlation between geotagged image tweets in the affected Nepali region and the distribution of structural damage
  2. investigate what images the Nepali population versus the rest of the world distributed
  3. investigate the appropriation of imagery into the telling of the disaster event

The authors combined both statistical analysis and content analysis in their investigation.

The first question aims to understand whether the photos distributed on social media correlate geographically with the actual structural damage, and whether such a distribution can measure the disaster accurately. They found that the distribution is statistically significantly correlated; however, a more in-depth analysis revealed that the geotags mean relatively little in relation to the finer aspects of geography and damage in Nepal: the images are less frequently photos of on-the-ground activity, and more frequently infographics and maps that originate from news media.
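
As a rough illustration of the kind of test behind this finding (the paper’s exact statistical method and data are not reproduced here), one could correlate per-district damage counts with geotagged image-tweet counts; all numbers below are invented:

    from scipy.stats import spearmanr

    # Damaged structures and geotagged image tweets per district (hypothetical).
    damage = [120, 340, 90, 15, 560, 200]
    image_tweets = [80, 150, 60, 20, 240, 110]

    rho, p_value = spearmanr(damage, image_tweets)
    print(f"rho={rho:.2f}, p={p_value:.3f}")

    # A significant rho would echo the statistical finding, while the paper's
    # content analysis shows why such a number alone can mislead.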

The second question aims to understand how local and global audiences perceive and relate to the disaster. They defined the boundary between “local” and “global” by the usage of Nepali, and divided the tweets along two dimensions: time (after the first vs. after the second earthquake) and geolocation (local vs. global), resulting in four categories (globally-sourced-after-first, globally-sourced-after-second, locally-sourced-after-first, and locally-sourced-after-second). The results of hand-coding the top 100 most-retweeted image tweets in each of the four categories show a different diffusion of content, with locals focusing more on the response and the damage the earthquake caused in their cities, and the global population focusing more on images of people suffering. After the second earthquake, the results suggest some disaster fatigue for those not affected by the events, with celebrity attention becoming the new mechanism for maintaining the world’s gaze upon Nepal.
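
A small sketch of that four-way split, assuming each tweet carries a language tag and a timestamp (the field names here are invented and are not the authors’ pipeline):

    from datetime import datetime

    SECOND_QUAKE = datetime(2015, 5, 12)  # date of the second Nepal earthquake

    def categorize(tweet: dict) -> str:
        # "ne" is the ISO 639-1 code for Nepali.
        source = "locally" if tweet["lang"] == "ne" else "globally"
        period = "second" if tweet["created_at"] >= SECOND_QUAKE else "first"
        return f"{source}-sourced-after-{period}"

    tweet = {"lang": "ne", "created_at": datetime(2015, 5, 13)}
    print(categorize(tweet))  # -> locally-sourced-after-second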

The third question studies two competing expectations: journalistic accuracy and drawing a collective gaze through photography. In the full dataset of 400 image tweets, they found four images in globally-sourced-after-first that were confirmed to be appropriated from other times and/or places, and an additional image in locally-sourced-after-first that had ambiguous origins. The fact that those images were appropriated was acknowledged either through replies or the original tweets, “as an honest mistake”, or as “another way of visually representing disaster and garnering support through compelling imagery”.

Reflections:

This paper investigated imagery content in tweets about the two 2015 earthquakes in Nepal, as an example of analyzing disaster representation on social media. I appreciate the thorough analysis taken by the authors. They looked at tweets with image content with three different research questions in mind, and studied different dimensions of such tweets as well as their correlations: between geotags and contents, between local and global audiences, and between actual pictures of the disaster and images appropriated from other times and/or locations.

Their analysis is comprehensive and convincing, in that they combine complete and strong statistical analysis with in-depth content analysis. This well-structured analysis revealed that despite the strong statistical correlation between geotags and images, the pictures posted are mostly infographics from news media, which do not really tell much about the “objective” depiction of the on-the-ground situations in those affected places. It is a good example of the importance of content analysis; future researchers should not settle for superficial statistical phenomena and should try to look at the data closely.

In particular, I found their way of defining the boundary of local vs. global by language smart. In spite of the limitation acknowledged in the paper, language can effectively distinguish the audience of the tweets. Considering that Nepali is the country’s official language, it is an efficient and smart way of taking advantage of this fact.

Questions:

  1. Do you agree with the authors that “…therefore, to watch is to witness. It is a responsibility to watch, even when watching is nevertheless entangled with problems of objectification and gratitude for not being a victim oneself.” is true in a global, geographically distributed online community? In other words, what do you think of the value of a “collective gaze” in the Internet era?
  2. What do you think of image appropriation? Is it acceptable as a way to draw attention from a broader, global community, or is it journalistically inaccurate and irresponsible, and thus not acceptable?
  3. What do you think of the authors’ way of distinguishing “local” and “global” via language? Can you come up with a better boundary?
  4. In their hand-coded categories of tweet contents, they acknowledged that “relief & recovery rescue” also contains “people suffering”. Why do they not merge these two categories? What do you think of their coding schema?


Demo: WiiBrew

Technology: Wiibrew.org

Demo Leader: Lawrence

Disclaimer: Doing actions such as this can void the warranty on your system and allow you to do things which are not legal under the Digital Millennium Copyright Act (DMCA). I do not endorse nor encourage these actions be taken, and I would advise against any of it if you are not sure what you are doing.

Summary:

Wiibrew is a special hack you can apply to a Nintendo Wii game system. First, you run one of several available exploits; the exploit crashes the Wii and runs unsigned code in the form of a .dol/.elf file on the root of your SD card. All you need is an SD card bigger than 256MB and a game title from the list provided on the wiki page, or, if you do not have one of the listed games, you can run an exploit called LetterBomb. The website makes sure to discourage piracy and will not troubleshoot issues when trying to use a pirated game to do an exploit. The act of installing Wiibrew and using the intended apps is not inherently illegal; the issue comes when people use the unlocked systems to attempt to play ROMs. It is currently illegal to play non-authentic copies of games even if you own the original copy and obtained it legally.

Reflections:

Wiibrew is a way to unlock the full potential of your Wii system; it even allows you to play DVD movies (which you cannot do on a standard system). The problem is that once this is done, your Wii is technically a computer system and has, for the most part, the same capabilities as any other home computer. You can even load a special version of Linux which is compatible with all Wii peripherals, as well as a standard mouse and keyboard, while also maintaining the DVD drive functionality.

A quick comparison to super-portable computers:

                Nintendo Wii     Intel Compute Stick   Acer Chromebook
Price           $79.99           $258.99               $249.99
CPU             PowerPC          Intel Core m3         Intel Celeron
# of Cores      1                2                     4
CPU Speed       729 MHz          1.6 GHz               1.6 GHz
GPU             ATI Hollywood    Integrated            Integrated
GPU Speed       243 MHz          N/A                   N/A
RAM             88 MB            4 GB                  4 GB
Storage         512 MB           64 GB                 32 GB
Optical Drive   Yes              No                    No
Card Reader     Yes (SD)         Yes (SDXC)            Yes (SDXC)
LAN             No               No                    No
WLAN            Yes              Yes                   Yes
USB 2.0 Ports   2                1                     1
USB 3.0 Ports   0                1                     1

How to install using LetterBomb:

  1. Go to please.hackmii.com
  2. Select your region.
  3. Enter your Wii’s MAC address into the boxes.
  4. Make sure the HackMii installer is bundled with your download (check the box).
  5. “Cut the red wire” to download the .zip file, then unzip it.
  6. Copy the “private” folder and “boot.elf” to your SD card (a 4GB non-SDHC card is recommended).
  7. Insert the SD card into your Wii and go to the Wii Message Board.
  8. You should see a red envelope; open it to launch the installer.


Journalists as Crowdsourcerers: Responding to Crisis by Reporting with a Crowd

Article: “Journalists as Crowdsourcerers: Responding to Crisis by Reporting with a Crowd”: https://link.springer.com/article/10.1007%2Fs10606-014-9208-z

Summary

This article is about professional journalists in a community deeply affected by Hurricane Irene, and how their role developed from that of typical journalists to leaders of a crowdsourced communication and self-advocacy movement among a population in crisis. This process was shaped by the rural surroundings (the Catskill Mountains of upstate New York) of the journalists and the community they served. In this case, the complicating problems of uneven IT infrastructure were deeply implicated in the way they developed ad-hoc, on-the-fly content, as well as in the development of provisional ethical guidelines about journalism and reporting. A major innovation from this team is their conceptualization of the “human-powered mesh network,” which introduces humans as “nodes” in a peer-to-peer mesh IT infrastructure.

Social media became central to the emphasis on humans in the network. In their explanation of the role of social media in emergency situations, it becomes clear that the concept of “human infrastructure” — of which the human-powered mesh network is a subcategory — could not exist without social media. Platforms that connect individuals across the Internet and gain power vis-à-vis network effects create the conditions for this “human infrastructure.”

The authors give a detailed account of how Hurricane Irene affected the Catskills region before turning to an exploration of how local journalists used the Internet during this time. They describe the use of live-blogging — real-time reporting and data-sharing by journalists who had started an online local news site called the Watershed Post. The platform that developed in the wake of Hurricane Irene out of a real-time news feed on the Watershed Post — powered by software called CoverItLive and simply called “the Liveblog” — became essential to the dissemination of emergency-response information. In their methodology section, which follows the situational description, they explain that they arrived at a blend of qualitative analyses of journalist interviews and the digital record of the Liveblog.

In both the Liveblog and through speaking with the journalists, the importance of social media and of amplifying the message via popular platforms like Twitter is evident. The Watershed Post editors established a presence for their Liveblog on Twitter. At one point, this message was posted on the Liveblog:

 If we lose power during the storm, this live feed will continue to automatically pull in Twitter updates from other local newspapers, but we’ll be unable to post (for obvious reasons). Cross your fingers.

And since they did indeed lose power, this became critical. The redundant, always-on chain of communication supported by social media infrastructure allowed the Liveblog to balloon out. Guest comments on the blog also rose to prominence as a major source of information. It got to the point that moderating these comments became a task in and of itself, and moderators began to assume the role of public-facing authorities and gatekeepers in a situation where accuracy was of the essence.
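
The fallback behavior described in that quoted message might look roughly like the sketch below: poll the Twitter accounts of local newspapers and republish new tweets to the live feed. The handles, bearer token, and post_to_liveblog function are placeholders, and this uses tweepy’s v2 client rather than whatever CoverItLive actually did.

    import time
    import tweepy

    client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")  # placeholder token
    LOCAL_PAPERS = ["watershedpost", "example_paper"]  # hypothetical handles

    def post_to_liveblog(text: str) -> None:
        print("LIVEBLOG:", text)  # stand-in for the real publishing step

    while True:
        for handle in LOCAL_PAPERS:
            resp = client.search_recent_tweets(query=f"from:{handle}", max_results=10)
            for tweet in resp.data or []:
                post_to_liveblog(tweet.text)
        time.sleep(300)  # poll every five minutes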

At the end, the authors note a discrepancy between the presumptions of HCI researchers and the way information flowed organically in this case (and, by extrapolation, how it might flow in similar situations — they note the Virginia Tech tragedy in 2007 as an analogous example). Resisting strong fidelity to one side or the other led them toward the hybrid concept of the human-powered mesh network: insights from both can inform our thinking on this in our roles as journalists, citizens, technologists, and (more often than not these days) some blend of the three.

Analysis

Prefatory note: this hit pretty close to home for me, because I was living in the Catskill Mountains when Hurricane Irene happened, and my roommate was working on building mesh networks. He was constantly frustrated by the geographical barriers to peer-to-peer networking (i.e., mountains!). Also, his work was disrupted when we had to evacuate our apartment…

This isn’t just a gratuitous anecdote. The human/”meatspace” factor (including geographical/topographical/old-infrastructure concerns) in IT innovations often gets left out of the conversation when we look at their revolutionary potential. Yet in this case (as is often the case with non-IT inventions), where crisis was the mother of invention, the need to design for complex and worst-case scenarios gave traction to the development of a concept (human-powered mesh networks). This has a wide span of applications — and will possibly become more important as mobile technology proliferates, and as facility with disaster management and response becomes more important (if we believe the science on climate change).

That’s why I think crowdsourcing is a rich topic: it is, basically, a technological concept that needs human actors en masse in order to work. With that will always come required considerations that emphasize multiplicity. Although the developers behind some major software don’t have to design their tech to be as accessible as possible, we can’t talk about (for example) MTurk without discussing ethics and accessibility, since it needs to work for lots of different kinds of people. Likewise, human-powered mesh networks place unique requirements on the way we think about human/physical-world factors. In emphasizing the crowd, the multiple, it necessarily becomes a more just conversation.

In a case where accuracy could mean life or death, trust is absolutely essential. The authors indicate this toward the end of their discussion. Although we have, in this class, looked toward automated means of fact-checking and verification, I’d like to propose something a bit different. The writers make this observation:

“Human infrastructuring in this case also included developing, adapting and communicating shared practices, and establishing a shared sense of ownership within the collaboration.”

With a shared sense of ownership comes (at least in theory) a shared responsibility to be accountable for the impact of the information you spread. The human-powered mesh network, and those who adopt roles of relative power within it — as comment moderators, contributors to maps, or those who Tweet/post a lot on blogs, etc — runs on the assumption of ethics and good faith among those who participate. Automating fact-checking and information accuracy is one thing, but in focusing on how we can give this role to computers, perhaps we forget that networks of humans — both before and after the digital turn — have a decent track record in spreading reliable information when it really matters.

Questions

  • How does infrastructure topography change the way we think about the spread of information? Does the fact that access to Internet necessities (like electricity, working computers, and of course the Internet itself) varies globally make it impossible to have a global conversation about concepts like the ones proposed in this article?
  • In times of crisis, would you rather automate fact-checking or rely on humans?
  • Since they discuss how useful the crowdsourced map was: are some forms of data representation better suited to crowdsourcing than others? For example, is it better (broadly construing “better”; it can mean easier, more efficient, more accurate, and so on) to crowdsource a map, or other imagistic data representations, than textual information? Does this change based on the context — e.g., might it be more effective to crowdsource textual information outside of a time of emergency?
  • What do we think of “infrastructure as a verb” (“to infrastructure”), the notion that infrastructure is a constant, active process rather than a static object? What implications does this reframing have for HCI?


It’s Not Steak and Lobster, But Maybe It Can Become That!

Paper:

Gang Wang, Christo Wilson, Xiaohan Zhao, Yibo Zhu, Manish Mohanlal, Haitao Zheng, and Ben Y. Zhao. 2012. Serf and Turf: Crowdturfing for Fun and Profit. In Proceedings of the 21st International Conference on World Wide Web (WWW ’12). ACM, New York, NY, USA.

Discussion Leader:

Lawrence Warren

Summary:

Remarkable things can be done with the Internet at your side, but of course, with great power comes great criminal activity. It is true that crowdsourcing systems pose a unique threat to security mechanisms due to the nature of how security is approached, and this paper points out the existence of malicious crowdsourcing systems. Because these systems are astroturfing in nature (astroturfing refers to information dissemination campaigns that are sponsored by an organization but obfuscated so as to appear spontaneous), they are referred to as crowdturfing systems; more specifically, they are defined as systems where a customer can initiate a campaign and users receive money for completing tasks which go against accepted user policies. The paper describes two types of crowdturfing structures, distributed and centralized, and both structures need three key actors in order to operate: customers, agents, and workers.


A distributed structure is organized around small groups hosted by a group leader; it is resistant to external threats and easy to dissolve and redeploy, but it is not popular due to its fragmented nature and lack of accountability.

A centralized structure is more like Mechanical Turk: it is more streamlined and, because of this, more popular. However, the open availability of such systems also allows for infiltration.

Well-known crowdturfing sites are running strong, and crowdturfing is a global issue; several sites have adopted PayPal, which extends their reach to more countries than the one they operate in.

Reflections:

This paper shows the darker side of crowdsourced work, and the results are astonishing. Companies have spent millions of dollars to bypass spam filters and gain more exposure, and crowdturfing seems to be the new way to spread information. Sites like this are not afraid of legal backlash and have increased in popularity despite threats from law enforcement. Large information cascades have been pushed past most filters designed to catch automated spam, while clicks from users remain the end goal.

Questions:

  • Will machine learning need to advance further in order to filter out spam produced by crowdturfing?
  • Is there any fault on the part of users when it comes to the success of such systems?
  • Why do you think Weibo and ZBJ were used as opposed to Twitter and Mechanical Turk (other than the research location)?
  • In this growing industry, are there any negative results which can be seen from a company’s point of view?


Crowd Powered Threat

Paper:

Lasecki, W. S., Teevan, J., & Kamar, E. (2014). Information extraction and manipulation threats in crowd-powered systems. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW ’14). ACM, New York, NY, USA.

Discussion Leader:

Lawrence Warren

Summary:

In automated systems there is sometimes a gap which machine learning, at current technology standards, has not been able to overcome. Crowdsourcing seems to be the popular solution in some of these cases, since tasks can become cumbersome and overwhelming for a single individual to handle. Systems such as Legion:AR or VizWiz use human intelligence to solve problems and can potentially share sensitive information. This could lead to several issues if a single malicious person takes part in a session: they can have access to addresses, phone numbers, and birthdays, and in some cases can possibly extract credit card numbers. There is also the possibility of a user attempting to sway results in a specific way in the event the session is not anonymized. This paper describes one experiment, run as a set of Mechanical Turk surveys, to see how likely it is that a worker would act with malicious intent and pass on information they should not, and another to see how likely a worker would be to manipulate task results.

Reflections:

This paper brought up a few good issues regarding information security in crowdsourced work. My biggest criticism of this paper would be that no innovative mitigations were created, or even possibilities mentioned, to protect against the attacks. Machine learning was mentioned as a method to blank out possibly sensitive information, but other than that this paper makes it seem as if there is no way to stop these attacks other than removing the information from the view of the user. Finding reliable workers was mentioned as a solution, but that entails interviews and vetting, which removes the benefits of crowdsourcing the work. This paper, though informative, in my opinion did not make any headway in providing an answer, nor did it actually dig up any new threats; it just listed ones which we were already aware of and gave generic solutions which are in no way innovative.
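
As a concrete (if simplistic) version of the redaction mitigation the paper mentions, one could mask obviously sensitive strings before a task reaches workers. This regex-based sketch is illustrative only and nowhere near a complete PII filter:

    import re

    PATTERNS = [
        re.compile(r"\b(?:\d[ -]?){13,16}\b"),           # rough credit-card shapes
        re.compile(r"\b\d{3}[ -.]?\d{3}[ -.]?\d{4}\b"),  # US-style phone numbers
    ]

    def redact(text: str) -> str:
        """Replace likely phone/card numbers before showing text to the crowd."""
        for pattern in PATTERNS:
            text = pattern.sub("[REDACTED]", text)
        return text

    print(redact("Call me at 540-555-0123, card 4111 1111 1111 1111."))
    # -> Call me at [REDACTED], card [REDACTED].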

Questions:

  • This paper describes two different types of threats to which a crowd-powered system is vulnerable. Can you think of any other possible threats?
  • Are there any directions crowdsourced work can take to better protect individuals’ information?
  • Crowdsourced work is becoming increasingly popular in many situations; is there a way to completely remove either of the two potential attack scenarios listed in this paper, aside from automation?


Global Database of Events, Language, and Tone

Demo: Global Database of Events, Language, and Tone (GDELT)

Summary:

The Global Database of Events, Language, and Tone (GDELT) is an open platform supported by Google Ideas. It is a massive index of news media from 1979 to today, including real-time data from across the world. Articles are machine-translated into English (if not already in English), and then algorithms are applied to identify events, sentiment, people, locations, themes, and much more. The coded metadata is then streamed and updated every 15 minutes.

To work with the data, the GDELT Analysis Service provides various ways to visualize and explore it, such as the EVENT Exporter, EVENT Geographic Network, EVENT Heatmapper, EVENT Timeline, EVENT TimeMapper, GKG Network, GKG Country Timeline, GKG Exporter, GKG Geographic Network, and many others. Datasets can also be moved into Google BigQuery, a cloud data warehouse, to run SQL queries, or downloaded as raw data files in CSV format.
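
For example, a protest query against GDELT in BigQuery might look like the sketch below. The table and column names follow the public gdelt-bq.gdeltv2.events dataset as I understand it (CAMEO root code 14 covers protests, and GDELT uses FIPS country codes, so ‘CA’ means Canada), but verify against the current schema before relying on this. It assumes the google-cloud-bigquery package and GCP credentials are set up.

    from google.cloud import bigquery

    client = bigquery.Client()
    sql = """
        SELECT ActionGeo_FullName, COUNT(*) AS n_events
        FROM `gdelt-bq.gdeltv2.events`
        WHERE EventRootCode = '14'           -- CAMEO root code 14 = protest
          AND ActionGeo_CountryCode = 'CA'   -- FIPS code for Canada
          AND SQLDATE BETWEEN 20161024 AND 20171024
        GROUP BY ActionGeo_FullName
        ORDER BY n_events DESC
        LIMIT 10
    """
    for row in client.query(sql).result():
        print(row.ActionGeo_FullName, row.n_events)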

One of the main advantages of GDELT is its collection of real-time data from around the world. This data is coded and openly available for all to use. Not only that, but the GDELT Analysis Service provides easy-to-use visualization tools for people not as familiar with programming. However, GDELT, like many other applications, can be used in nefarious ways. For example, regimes could track and record political protests and potentially use GDELT’s data to predict future protests. This would be particularly problematic in countries that would otherwise lack the data collection capacity to do such monitoring on their own.

Demo:

As mentioned above, GDELT is capable of so many different things. The following demonstration covers only one of its services. Let’s explore a geographic heat map of protests that happened in Canada in the past year…

  1. Head to GDELT Analysis Service
  2. Click EVENT HeatMapper
  3. At this stage, you need to fill in your email address (so the results can be forwarded to you) as well as the information of interest. For this example, let’s choose a start date of 10/24/2016 and an end date of 10/24/2017. Then we choose ‘civilian’ for ‘Recipient/Victim (Actor2) Type’. The event location should be specified as Canada, and the event code should be ‘protest’. We also want the number of events as the location weighting, because we are interested in the number of unique events rather than the number of news articles. Lastly, let’s choose ‘Interactive Heatmap Visualization’ and ‘.CSV File’ for the output. Then click submit.
  4. Now you wait until the results show up on your metaphorical doorstep…
  5. And, magic! It appears. Now you can see the results as a CSV or the Heatmap by clicking either link. Let’s look at the HeatMap first. The slide bar is to adjust display thresholds. If we zoom in, we can see protests occurred in Southern Ontario, Toronto, Ottawa, Montreal, outside Quebec City and even some in Newfoundland and Labrador.
  6. The CSV provides us with the longitude, latitude, and place name. In addition, it provides the number of events; for example, Marystown, Newfoundland had four protests. I did a simple Google search to see what was happening, and it appears that fishermen were protesting at the union office. (A sketch of loading this CSV programmatically follows this list.)
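
If you would rather work with the exported CSV programmatically, a small pandas sketch like this will do; the filename and column names are guesses based on the fields described above, not GDELT’s exact headers:

    import pandas as pd

    df = pd.read_csv("gdelt_heatmap_export.csv")  # hypothetical filename
    print(df.columns.tolist())  # check the real header names first

    # Assuming columns named name/lat/lon/count, list the busiest locations.
    top = df.sort_values("count", ascending=False).head(5)
    print(top[["name", "lat", "lon", "count"]])  # e.g., Marystown with 4 events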

Again, this is only one of the many tools available on GDELT.


Demo: Truthfinder

Technology: Truthfinder.com
Demo leader: Md Momen Bhuiyan

Summary:
Truthfinder is a commercial people-search website for the US, especially for finding information about a “long lost friend” or “scammers”. It is a fairly new website, started in March 2015. Anyone can search the website using a person’s name or a phone number. They claim to have millions of public records from local, state, and federal databases as well as independent sources. Due to the Freedom of Information Act, these public records are available to anyone, but collecting this information from multiple sources is hard, so websites like Truthfinder make it easy to search. There are two types of membership on this site: regular and premium. Premium users get more information about a person, such as educational information, current and former roommates, businesses and business associates, voter registration details, possible neighbors, traffic accidents, weapons permits, etc. Most of this information is only a possible match, with no guarantee of correctness.

Reflection:

The site has a list of purposes it can or can’t be used for. But anyone can misuse it for purposes such as screening job candidates, stalking, etc. During a search, it suggests that the records might contain various types of information, which could be a way to scam people into registering on the site. The average review of the website is below 3 [1]. Some reviewers suggested the information was either not accurate, several years old, or obtainable from a Google search. The one redeeming feature is the opt-out option, which is mandated by the FTC [2]. Still, this site can be used by journalists for verification purposes.

How to use:
1. The website has 5 options for searching: People Search, Criminal Records, Court Records, Reverse Phone Search, and Deep Web Search.
2. Other than the reverse phone lookup option, all of them require the name of the person along with optional location information.
3. After the search button is clicked, it will ask for the gender of the individual.
4. It then shows some graphics with a list of things it is searching and asks for the relevant person’s location information.
5. After that, a list of possible matches is shown along with location and age information for each match.
6. If the open report option is clicked for a matched name, it will show some further processing, an alert that the information might potentially be embarrassing, and an agreement on usage policy.
7. After that, it will go to the page for user registration, where you have to give your name, email address, and payment information.

The whole process takes longer than five minutes.

[1] https://www.highya.com/truthfinder-reviews
[2] https://www.ftc.gov/news-events/media-resources/protecting-consumer-privacy/enforcing-privacy-promises


‘I never forget a face!’

Article: Davis, J. P., Jansari, A., & Lander, K. (2013). ‘I never forget a face!’. The Psychologist, 26(10), 726-729.

Summary:

The authors summarize the existing literature about super-recognisers, or those individuals scoring high on face-recognition tests, into three broad areas: general information, police officers, and the general population. In the first section – general information – the authors provide information about super-recognisers and prosopagnosia. Prosopagnosia, or face blindness, is when someone loses the ability to recognize familiar faces (including their own). This disorder can stem from either brain damage or genetic inheritance, and about two percent of the population has it.

At the other end of the face recognition spectrum – or Gaussian distribution – are those who are superior at recognizing faces, with an uncanny ability for facial recognition. Davis and colleagues comment that the belief that the super-recognisers’ ability is limited to face recognition indicates support for the face recognition spectrum (prosopagnosia at one end and super-recognisers at the other). They continue to note that facial recognition by humans can be superior to that of machines because faces are dynamic, which can trip up machines.

In the next section, Davis and colleagues focus more closely on the police and super-recognisers. The bulk of the research in this section stems from the Annual Conference of the British Psychological Society presentation “Facial identification from CCTV: Investigating predictors of exceptional face recognition performance amongst police officers” (2013) by Davis, Lander, & Evans. These authors asked police officers who were deemed super-recognisers various questions and tested their recall ability. Some of the findings indicate that the officers’ families do not share similar super-recogniser skills and that these officers based their identifications on distinctive facial features. In addition, the officers who described broad strategies for facial recognition did worse than those with narrower strategies. Lastly, the super-recogniser officers did well on celebrity recognition tests.

In the section on the general population, the authors pull from an unpublished study by Ashok Jansari that deployed the Cambridge Face Memory Test at London’s Science Museum. The preliminary findings indicate that face recognition ability falls within a Gaussian distribution, and fewer than 10 people scored as super-recognisers. Davis and colleagues then present individual differences that influence facial recognition, such as introvert/extrovert personality, as well as processing difficulties some people face (holistic processing, inverted faces, the whole-part effect).

Reflection:

I recommend taking the Cambridge Face Memory Test to see where you fall on the face recognition distribution. This is the same test that was given to the officers and the general population in some of the studies mentioned.

Interestingly, since the article was published, the Metropolitan Police have actually formed a team of super-recognisers. This team complements the millions of CCTV cameras scattered across London. Although these recognisers are human, the team adds another dimension of surveillance to the streets of London, something some are trying to pull back on.

The authors did mention the uniqueness of humans versus technology in facial recognition. In particular, humans might be better than machines at identifying dynamic changes in human faces. However, with increasing advancements in technology (and specifically biometrics), I wonder: will more advanced AI facial recognition software make this new unit obsolete in the next ten years or so? Or do humans have some unique ability to notice dynamic changes that machines will not be able to mimic?

Lastly, it was interesting that, in the research presented by Davis, Lander, & Evans (2013), mostly white male officers thought they identified more black than white offenders. The authors attempted to attribute this finding to extensive contact with certain minority groups based on the officers’ policing jurisdictions. However, it makes me wonder about the role that racial discrimination plays in their identifications, and whether some in the super-recogniser units simply reinforce policing discrimination. I am curious about their verification methods: how do they ensure relatively accurate recognitions as well as eliminate any potential for bias and discrimination?

Questions:

  • How does facial recognition surveillance by humans differ from facial recognition surveillance by machines?
    • And, is there one that is preferable?
    • Is one better at deterrence (both general and specific)?
    • What are the privacy/ethical implications with humans vs. machines?
  • Is there potential for reinforcing discriminatory policing practices with super-recogniser policing?
    • What verification methods would need to be put into place?
  • How can the general population’s super-recogniser skills play into crowdsourcing?
