Where Should We Protect? Identifying Potential Targets of a Terrorist Attack Plan via Crowdsourced Text Analysis

Need, Approach, Benefit, Competition

The increasing volume of text datasets is challenging the cognitive capabilities of expert analysts to produce meaningful insights. Large-scale distributed agents like machine learning algorithms and crowd workers present new opportunities to make sense of big data. However, we must first overcome the challenge of modeling and guiding the overall process so that many distributed agents can meaningfully contribute to suitable components. Inspired by the sensemaking loop, collaboration models, and investigation techniques used in the intelligence analysis community, we propose a pipeline to better enable collaboration among expert analysts, crowds, and algorithms. We modularize and clarify the components of the sensemaking loop so that they are connected via clearly defined inputs and outputs that pass intermediate analysis results along the pipeline, and can be assigned to different agents with appropriate techniques. We instantiated the pipeline with a location-based investigation strategy and recruited 134 crowd workers on Amazon Mechanical Turk to analyze the dataset. Our results show that the pipeline can successfully guide crowd workers to contribute meaningful insights that help solve complicated sensemaking challenges. This allows us to imagine broader possibilities for how each component could be executed: with individual experts, crowds, or algorithms, as well as new combinations of these, wherever each is best suited.

Crowdsourced Text Analysis Pipeline

Results and Discussion

Overview: investigating the potential target location of a terrorist attack with the pipeline
Step 1: Subtly Relevant Documents Are Successfully Retrieved

7 documents out of 13 directly mentioned one or more key elements. The remaining 6 documents (3 relevant and 3 irrelevant) were rated by crowd workers on a 0-100 scale. A neutral threshold of 50 yields 11 documents as the Step 1 data output (precision = 90.9%, recall = 100%).
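As a minimal illustration of the Step 1 logic, here is a JavaScript sketch (my own, not the actual pipeline code; the ratings and labels below are made up for the example):

// Keep documents whose crowd rating exceeds the neutral threshold of 50,
// then score the output against ground-truth relevance labels.
const THRESHOLD = 50;
const docs = [
  { id: 'doc1', rating: 85, relevant: true },   // illustrative values only
  { id: 'doc2', rating: 40, relevant: false },
  { id: 'doc3', rating: 60, relevant: true },
];
const retrieved = docs.filter(d => d.rating > THRESHOLD);
const truePositives = retrieved.filter(d => d.relevant).length;
const precision = truePositives / retrieved.length;                 // retrieved docs that are relevant
const recall = truePositives / docs.filter(d => d.relevant).length; // relevant docs that were retrieved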

Step 2: Most Key Information Pieces Are Extracted

In Step 2, a total of 26 information pieces were extracted. 18 of them mention the target location and the key elements. However, some important information pieces were not extracted.

Step 3: Accurate Tagging and Potential Target Identification

After Step 3, 18 out of all 26 information pieces were tagged, including 5 location tags. One interesting finding is that two different workers both identified the location “Tel Aviv” in the information piece “The same brief voice message was given in Arabic in each call. A translation of this message reads: ‘I will be in my office on April 30 at 9:00AM. Try to be on time’”. One of the workers even gave very specific information: “the lacation (location) is israel (Israel) at ‘Mike’s Place’, a restaurant in Tel Aviv”. This shows that crowd workers connect their own knowledge to the analysis.

Step 4: Sound Reasoning and Comparison

For each location recognized in Step 3, we organized the source information pieces to form a profile for that location tag. We then ranked the locations by amount of evidence to seed a single-elimination competition. The final winning location was “North Bergen, New Jersey” (the last place the bomb was stored before being transferred to the target location); the runner-up, the New York Stock Exchange (the real answer), lost by only one vote.
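As an illustration of the ranking-and-elimination idea, a hypothetical sketch follows (in the study, crowd workers voted on each pairing; code did not decide the matches):

// Rank location tags by the amount of evidence in their profiles, then seed a
// single-elimination bracket pairing the strongest candidate with the weakest.
function seedBracket(locations) {
  const ranked = [...locations].sort((a, b) => b.evidence.length - a.evidence.length);
  const pairs = [];
  for (let i = 0; i < Math.floor(ranked.length / 2); i++) {
    pairs.push([ranked[i], ranked[ranked.length - 1 - i]]);
  }
  return pairs; // crowd workers compare each pair; winners advance to the next round
}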

Logical and Clear Narrative Presentation

Using the profile of “North Bergen, New Jersey”, workers created a narrative as follows:

“A C-4 plastic explosive bomb is to be detonated at 0900hrs on 30 April, 2003, by a group of terrorists: Muhammed bin Harazi [alias Abdul Ramazi], Hani al Halak, Hamid Alwan [alias Mark Davis], Sahim Albakri [alias Bagwant Dhaliwal]. Hani al Hallak placed a C-4 plastic explosive bomb and set up a fire in his carpet shop in North Bergen in the early morning hours of April 26, 2003.”


Semantic Interaction for Visual Text Analytics

Paper:
Endert, A., Fiaux, P., & North, C. (2012). Semantic Interaction for Visual Text Analytics. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 473–482). New York, NY, USA: ACM.

Discussion Leader:
Tianyi Li

Summary:

An important part of visual analytics is the statistical models used to transform data and generate visualizations (visual metaphors). Analysts manipulate a visualization by tuning the parameters of those statistical models during their information foraging process. When it comes to information synthesis, analysts most often externalize their thought process by spatially laying out the information. However, there is a gap between the two processes in terms of the tools available, with most visual analytics tools focusing on only one side or the other.

This paper takes this opportunity and bridges the two processes via “semantic interaction”, which manipulates the parameters of the statistical models by interpreting the analyst’s interactions (mostly spatial) with the visualization. Such interaction relieves analysts from having to understand the complicated statistical models, so they can focus more on the semantic meaning of the text data under analysis. It also opens a new design space for visual analytics interaction. The authors demonstrate an example visual analytics tool they developed as a proof of concept.

The visual analytics tool, called ForceSPIRE, combines machine learning techniques with interaction analysis to assist analysts in analyzing intelligence text documents. The system focuses on document similarity, represented conceptually by entity overlap and the amount of user interaction, and visually by spatial layout. User interactions influence the values of documents’ encoded feature labels, triggering updates to the visual layout that reflect the analyst’s sensemaking progress.
ForceSPIRE encodes text documents via the following concepts:

  • Soft data: stored result of user interaction as interpreted by the system
  • Hard data: extracted from the raw textual information (e.g., term or entity frequency, titles, document length, etc.).
  • Each entity has an importance value
    • initialized with tf-idf score
    • updated by user interaction hit
    • the sum of importance values is always normalized to 1
  • Each document has a mass
    • The number of entities determines the mass of each document, i.e., a node in the force-directed graph model, where heavier nodes do not move as fast as lighter nodes.

Reflection: 
This paper is very well written and is one of the cornerstone papers for my research area and project. Semantic interaction bridges the gap between information foraging and synthesis, relieving analysts of the burden of learning complicated statistical models. In other words, it realizes the purpose of those models, and a fundamental mission of visual analytics: to hide the unnecessary complexity of data-processing details from analysts so that they can achieve productive analysis via visual metaphors.

The way ForceSPIRE computes similarity is to start with “hard data”, which gives an initial spatial layout of documents. Then, while users interact with the visualization, the result of user interaction (“soft data”) is stored and interpreted by the system to update the spatial layout in real time. The entity is the smallest unit of operation and computation. Initial entity values are assigned by computing tf-idf scores. One hit of user interaction (the entity was included in a highlight, it was searched, it appeared in a note, etc.) then increases its importance value by 10%, thus reducing the other entities’ importance values, since each entity has an importance value and the sum is always normalized to 1. A document, which contains entities, is one level higher in granularity. Documents are assigned a mass to account for the number of entities they contain. Similarly to entities, a document’s mass also increases by 10% with one hit of user interaction (its text was highlighted, it was the result of a search hit, or the user added more entities through annotations).
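To make the update rule concrete, here is a minimal JavaScript sketch (my own illustration based on the paper's description, not the authors' code), assuming a flat 10% boost per hit followed by renormalization:

// Boost an entity's importance by 10% on an interaction hit, then renormalize
// so that all importance values sum to 1 (other entities shrink proportionally).
function boostEntity(importances, hitEntity) {
  importances[hitEntity] *= 1.1;
  const total = Object.values(importances).reduce((a, b) => a + b, 0);
  for (const entity of Object.keys(importances)) {
    importances[entity] /= total;
  }
  return importances;
}

// Documents gain 10% mass per hit; heavier nodes move more slowly in the
// force-directed layout, so well-studied documents become more stable.
function boostDocumentMass(doc) {
  doc.mass *= 1.1;
  return doc;
}

// usage: the boosted entity rises while the others shrink proportionally
let importances = { alpha: 0.5, beta: 0.3, gamma: 0.2 };
importances = boostEntity(importances, 'alpha');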

This is an oversimplified scenario intended as a proof of concept, and there is follow-up work, like StarSPIRE, which relaxed the assumptions of a large screen and a small number of documents.

Questions:

  1. ForceSPIRE supports “undo”: the user presses Ctrl-Z to rewind to the state before the most recent interaction. The entity importance values and document mass values are all rewound, but the spatial layout is not. The authors also recommended that users make small-distance document movements. Why not enable undo for document locations as well? What is the reason for not doing this?
  2. In ForceSPIRE, the focus is on document “similarity”. In the second paper today, they analyze news content by “relevance” and “uniqueness”, which seems to me a finer-grained breakdown of the concept of similarity: relevance is similarity to the topic of interest, and uniqueness is the negation of similarity. Do you agree? How do you think other “computational enablers”, to borrow the second paper’s term, could be used in semantic interaction?
  3. In ForceSPIRE, semantic interaction is applied to intelligence analysis, which has specific facts to discover or questions to answer. How could semantic interaction be applied to journalism or news content analysis? How are these scenarios different? How would such differences influence our analysis strategy?


Visual Representations of Disaster

Paper:

Bica, M., Palen, L., & Bopp, C. (2017). Visual Representations of Disaster. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (pp. 1262–1276). New York, NY, USA: ACM.

Discussion Leader:

Tianyi Li

Summary:

This paper investigated the representation of the two 2015 Nepal earthquakes in April and May, via images shared on Twitter in three ways:

  1. examine the correlation between geotagged image tweets in the affected Nepali region and the distribution of structural damage
  2. investigate what images the Nepali population versus the rest of the world distributed
  3. investigate the appropriation of imagery into the telling of the disaster event

The authors combined both statistical analysis and content analysis in their investigation.

The first question aims to understand whether the photos distributed on social media correlate geographically with the actual structural damage, and whether such a distribution can measure the disaster accurately. They found that the distribution is statistically significantly correlated; however, a more in-depth analysis revealed that the geotags mean relatively little in relation to the finer aspects of geography and damage in Nepal: the images are less frequently photos of on-the-ground activity, and more frequently infographics and maps that originate from news media.

The second question aims to understand how local and global audiences perceive and relate to the disaster. They defined the boundary between “local” and “global” by the usage of Nepali, and divided the tweets along two dimensions: time (after first vs. after second) and geolocation (local vs. global), resulting in four categories (globally-sourced-after-first, globally-sourced-after-second, locally-sourced-after-first, and locally-sourced-after-second). The results from hand-coding the top 100 most retweeted image tweets in each of the four categories show a different diffusion of content, with locals focusing more on the response and the damage the earthquake caused in their cities, and the global population focusing more on images of people suffering. After the second earthquake, the results suggest some disaster fatigue among those not affected by the events, with celebrity attention becoming the new mechanism for maintaining the world’s gaze upon Nepal.
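To make the four categories concrete, here is a hypothetical JavaScript sketch of the split (the field names and the language check are my assumptions; the paper itself hand-coded the tweets):

// Classify a tweet into one of the four source/time categories used in the paper.
function categorize(tweet, secondQuakeDate) {
  const source = tweet.lang === 'ne' ? 'locally-sourced' : 'globally-sourced'; // Nepali usage marks "local"
  const time = tweet.date < secondQuakeDate ? 'after-first' : 'after-second';
  return source + '-' + time;
}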

The third question studies two competing expectations: journalistic accuracy and drawing a collective gaze through photography. In the full dataset of 400 image tweets, they found four images in globally-sourced-after-first that were confirmed to be appropriated from other times and/or places, and an additional image in locally-sourced-after-first with ambiguous origins. The fact that those images are appropriated is acknowledged either through replies or the original tweets “as an honest mistake”, as “another way of visually representing disaster and garnering support through compelling imagery”.

Reflections:

This paper investigated imagery content in tweets about the two earthquakes in Nepal in 2015, as an example of analyzing disaster representation on social media. I appreciate the thorough analysis undertaken by the authors. They looked at tweets with image content with three different research questions in mind, and studied different dimensions of such tweets as well as their correlations: between geotags and contents, between local and global audiences, and between actual pictures of the disaster and images appropriated from other times and/or locations.

Their analysis is comprehensive and convincing in that it combines thorough, rigorous statistical analysis with in-depth content analysis. This well-structured analysis revealed that despite the strong statistical correlation between geotags and images, the pictures posted are mostly infographics from news media, which do not really tell much about the “objective” depiction of on-the-ground situations in the affected places. It is a good example of the importance of content analysis; future researchers should not settle for superficial statistical phenomena and should try to look at the data closely.

In particular, I found their way of defining the boundary between local and global by language clever. In spite of the limitation acknowledged in the paper, language can effectively distinguish the audiences of the tweets. Considering that Nepali is the country’s only official language, it is an efficient way of taking advantage of this fact.

Questions:

  1. Do you agree with the authors that “…therefore, to watch is to witness. It is a responsibility to watch, even when watching is nevertheless entangled with problems of objectification and gratitude for not being a victim oneself.” is true in a global, geographically distributed online community? In other words, what do you think of the value of a “collective gaze” in the Internet era?
  2. What do you think of image appropriation? Is it acceptable as a way to draw attention from a broader, global community, or is it journalistically inaccurate and irresponsible, and thus not acceptable?
  3. What do you think of the authors’ way of distinguishing “local” and “global” via language? Can you come up with a better boundary?
  4. In their hand-coded categories of tweet contents, they acknowledged that “relief & recovery rescue” also contains “people suffering”. Why did they not merge these two categories? What do you think of their coding schema?


Demo: Timeline JS — Easy-to-make, beautiful timelines.

Technology: Timeline JS

Demo leader: Tianyi Li

Summary:

TimelineJS is one of the open-source storytelling projects by Knight Lab from Northwestern University. The Lab develops easy-to-use information visualization techniques for journalism, storytelling and content on the internet.

Timeline JS is a JavaScript-based tool that enables both non-programmers and technical people to build visually rich, interactive timelines. Beginners can create a timeline using a Google spreadsheet, whereas experts can customize and instantiate timelines on their own webpages. TimelineJS can be embedded as an iframe on sites or blogs. You can create your own timeline on their website and embed the URL in an iframe in your own website, or you can integrate Timeline using JavaScript in your own front-end code.

Advantage:

A Timeline JS timeline can be built on their website by modifying a given Google Spreadsheet template. The simple API enables non-programmers to use the tool. Once the Google Spreadsheet is published, you do not have to re-publish it when you update the data; your timeline updates automatically. In addition, it supports media links from many external websites like YouTube and Twitter. Timeline JS can also be plugged into JavaScript and HTML code by expert programmers, using JSON data input; they can even edit the CSS styles to incorporate a personalized design.

Possible misuse and Limitation:

I do not see any significant or unique possibility of misuse of this tool. Any tool that embeds links and multimedia sources might be misused to embed malicious links that steal user information.

Also, according to the tips and tricks on the website, the optimal number of slides is around 20, so the tool does not scale to big-data time-series visualization.

Walkthrough:

[OPTION 1] Using their website:

Tutorial video

Step1: create your spreadsheet

Build a new Google Spreadsheet using the template given on the website. You’ll need to copy the template to your own Google Drive account by clicking the “Make a Copy” button.

Drop dates, text, and links to media into the appropriate columns.

Note: Don’t change the column headers, don’t remove any columns, and don’t leave any blank rows in your spreadsheet.

Step2: publish the spreadsheet to the web

Under the File menu, select “Publish to the Web.” In the next window, click the blue “publish” button. When asked, “Are you sure…?” click OK. Close the ‘Publish to the web’ window. Copy the URL for your Timeline from the browser’s address bar. It should look something like this: https://docs.google.com/spreadsheets/d/1xuY4upIooEeszZ_lCmeNx24eSFWe0rHe9ZdqH2xqVNk/edit#gid=0

Note: Disregard the URL that appears in the “publish to the web” window. It used to be used below, but changes to Google mean that you’ll get an error if you use it now.

Step3: generate your timeline

Copy/paste the spreadsheet URL into the box given in the tutorial walkthrough at Step 3 to generate your timeline; you can also configure optional settings there. (Make sure you’ve published the spreadsheet.)

Once you are done, press “Enter” (or “Return”) on your keyboard.

Step4: share your timeline

Step 4 on their web page gives you both a shareable link and a line of code for embedding the timeline in an iframe. You can click the “Preview” button to test your timeline on the same page, or “Open preview in a new window”.
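For reference, the embed code Step 4 produces is an iframe pointing at Knight Lab's embed page, roughly of the shape below (the query parameters are illustrative; the source value is your spreadsheet key from Step 2):

<iframe src="https://cdn.knightlab.com/libs/timeline3/latest/embed/index.html?source=1xuY4upIooEeszZ_lCmeNx24eSFWe0rHe9ZdqH2xqVNk&font=Default&lang=en&initial_zoom=2&height=650" width="100%" height="650" frameborder="0"></iframe>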

[OPTION 2] Using JavaScript

Documentation

If you want to integrate a timeline on your own website, just load the Timeline CSS styles and JavaScript library. You will then be able to create your own timeline either from a Google spreadsheet or from a JSON file, by constructing a Timeline object using their JavaScript library. You can also customize the CSS styles.

Example Code:

<html>
  <head>
    <title>Timeline JS demo</title>
    <!-- There are three key things you need to include on your page to embed a timeline: -->

    <!-- 1 A link tag loading the Timeline CSS -->
    <link title="timeline-styles" rel="stylesheet" href="https://cdn.knightlab.com/libs/timeline3/latest/css/timeline.css">

    <!-- 2 A script tag loading the Timeline javascript -->
    <script src="https://cdn.knightlab.com/libs/timeline3/latest/js/timeline.js"></script>

    <!-- jQuery is needed only for $.getJSON in option 2 below -->
    <script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
  </head>
  <body>
    <div id='timeline-embed' style="width: 100%; height: 600px"></div>
    <!-- 3 A second script tag which creates a Timeline -->
    <script type="text/javascript">
      // option 1: use a Google spreadsheet
      // The TL.Timeline constructor takes at least two arguments:
      // the id of the Timeline container (no '#'), and
      // the URL to your JSON data file or Google spreadsheet.
      // The id must refer to an element "above" this code,
      // and the element must have CSS styling to give it width and height.
      // Optionally, a third argument with configuration options can be passed.
      // See below for more about options.
      function timeline_googleSpreadsheet() {
        var timeline_GoogleSpreadsheet = new TL.Timeline('timeline-embed',
          'https://docs.google.com/spreadsheets/d/1cWqQBZCkX9GpzFtxCWHoqFXCHg-ylTVUWlnrdYMzKUI/pubhtml');
      }

      // option 2: use a JSON file (requires jQuery, loaded above)
      function timeline_json() {
        $.getJSON("timeline3.json", function(json) {
          window.timeline = new TL.Timeline('timeline-embed', json);
        });
      }

      // option 3: configuring options
      function timeline_option() {
        var additionalOptions = {
          start_at_end: true,
          default_bg_color: {r: 0, g: 0, b: 0},
          timenav_height: 250
        };
        // use a distinct variable name so we don't overwrite the function itself
        var timeline_withOptions = new TL.Timeline('timeline-embed',
          'https://docs.google.com/spreadsheets/d/1cWqQBZCkX9GpzFtxCWHoqFXCHg-ylTVUWlnrdYMzKUI/pubhtml',
          additionalOptions);
      }

      // Uncomment any one of the following three lines to try things out:
      // timeline_googleSpreadsheet();
      // timeline_json();
      // timeline_option();
    </script>
  </body>
</html>

JSON file:

{
 "title": {
 "media": {
 "url": "//www.flickr.com/photos/tm_10001/2310475988/",
 "caption": "Whitney Houston performing on her My Love is Your Love Tour in Hamburg.",
 "credit": "flickr/<a href='http://www.flickr.com/photos/tm_10001/'>tm_10001</a>"
 },
 "text": {
 "headline": "Whitney Houston<br/> 1963 - 2012",
 "text": "<p>Houston's voice caught the imagination of the world propelling her to superstardom at an early age becoming one of the most awarded performers of our time. This is a look into the amazing heights she achieved and her personal struggles with substance abuse and a tumultuous marriage.</p>"
 }
 },
 "events": [
 {
 "media": {
 "url": "{{ static_url }}/img/examples/houston/family.jpg",
 "caption": "Houston's mother and Gospel singer, Cissy Houston (left) and cousin Dionne Warwick.",
 "credit": "Cissy Houston photo:<a href='http://www.flickr.com/photos/11447043@N00/418180903/'>Tom Marcello</a><br/><a href='http://commons.wikimedia.org/wiki/File%3ADionne_Warwick_television_special_1969.JPG'>Dionne Warwick: CBS Television via Wikimedia Commons</a>"
 },
 "start_date": {
 "month": "8",
 "day": "9",
 "year": "1963"
 },
 "text": {
 "headline": "A Musical Heritage",
 "text": "<p>Born in New Jersey on August 9th, 1963, Houston grew up surrounded by the music business. Her mother is gospel singer Cissy Houston and her cousins are Dee Dee and Dionne Warwick.</p>"
 }
 },
 {
 "media": {
 "url": "https://youtu.be/fSrO91XO1Ck",
 "caption": "",
 "credit": "<a href=\"http://unidiscmusic.com\">Unidisc Music</a>"
 },
 "start_date": {
 "year": "1978"
 },
 "text": {
 "headline": "First Recording",
 "text": "At the age of 15 Houston was featured on Michael Zager's song, Life's a Party."
 }
 },
 {
 "media": {
 "url": "https://youtu.be/_gvJCCZzmro",
 "caption": "A young poised Whitney Houston in an interview with EbonyJet.",
 "credit": "EbonyJet"
 },
 "start_date": {
 "year": "1978"
 },
 "text": {
 "headline": "The Early Years",
 "text": "As a teen Houston's credits include background vocals for Jermaine Jackson, Lou Rawls and the Neville Brothers. She also sang on Chaka Khan's, 'I'm Every Woman,' a song which she later remade for the <i>Bodyguard</i> soundtrack which is the biggest selling soundtrack of all time. It sold over 42 million copies worldwide."
 }
 },
 {
 "media": {
 "url": "https://youtu.be/H7_sqdkaAfo",
 "caption": "I'm Every Women as performed by Whitney Houston.",
 "credit": "Arista Records"
 },
 "start_date": {
 "year": "1978"
 },
 "text": {
 "headline": "Early Album Credits",
 "text": "As a teen Houston's credits include background vocals for Jermaine Jackson, Lou Rawls and the Neville Brothers. She also sang on Chaka Khan's, 'I'm Every Woman,' a song which she later remade for the <i>Bodyguard</i> soundtrack which is the biggest selling soundtrack of all time. It sold over 42 million copies worldwide."
 }
 },
 {
 "media": {
 "url": "https://youtu.be/A4jGzNm2yPI",
 "caption": "Whitney Houston and Clive Davis discussing her discovery and her eponymous first album.",
 "credit": "Sony Music Entertainment"
 },
 "start_date": {
 "year": "1983"
 },
 "text": {
 "headline": "Signed",
 "text": "Houston is signed to Arista Records after exec Clive Davis sees her performing on stage with her mother in New York."
 }
 },
 {
 "media": {
 "url": "https://youtu.be/m3-hY-hlhBg",
 "caption": "The 'How Will I Know' video showcases the youthful energy that boosted Houston to stardom.",
 "credit": "Arista Records Inc."
 },
 "start_date": {
 "year": "1985"
 },
 "text": {
 "headline": "Debut",
 "text": "Whitney Houston's self titled first release sold over 12 million copies in the U.S. and included the hit singles 'How Will I Know,' 'You Give Good Love,' 'Saving All My Love For You' and 'Greatest Love of All.'"
 }
 },
 {
 "media": {
 "url": "https://youtu.be/v0XuiMX1XHg",
 "caption": "Dionne Warwick gleefully announces cousin, Whitney Houston, the winner of the Best Female Pop Vocal Performance for the song Saving All My Love.",
 "credit": "<a href='http://grammy.org'>The Recording Academy</a>"
 },
 "start_date": {
 "year": "1986"
 },
 "text": {
 "headline": "'The Grammys'",
 "text": "In 1986 Houston won her first Grammy for the song Saving All My Love. In total she has won six Grammys, the last of which she won in 1999 for It's Not Right But It's Okay."
 }
 },
 {
 "media": {
 "url": "https://youtu.be/eH3giaIzONA",
 "caption": "I Wanna Dance With Somebody",
 "credit": "Arista Records Inc."
 },
 "start_date": {
 "year": "1987"
 },
 "text": {
 "headline": "'Whitney'",
 "text": "Multiplatinum second album sells more than 20 million copies worldwide. With 'Whitney', Houston became the first female artist to produce four number 1 singles on one album including \"I Wanna Dance With Somebody,' 'Didn't We Almost Have It All,' 'So Emotional' and 'Where Do Broken Hearts Go.'"
 }
 },
 {
 "media": {
 "url": "https://youtu.be/96aAx0kxVSA",
 "caption": "\"One Moment In Time\" - Theme song to the 1988 Seoul Olympics",
 "credit": "Arista Records Inc."
 },
 "start_date": {
 "year": "1988"
 },
 "text": {
 "headline": "\"One Moment In Time\"",
 "text": "The artist's fame continues to skyrocket as she records the theme song for the Seoul Olympics, 'One Moment In Time.'"
 }
 },
 {
 "media": {
 "url": "",
 "caption": "",
 "credit": ""
 },
 "start_date": {
 "year": "1989"
 },
 "text": {
 "headline": "Bobby Brown",
 "text": "Houston and Brown first meet at the Soul Train Music Awards. In an interview with Rolling Stone Magazine, Houston admitted that it was not love at first sight. She turned down Brown's first marriage proposal but eventually fell in love with him."
 }
 },
 {
 "media": {
 "url": "https://youtu.be/5Fa09teeaqs",
 "caption": "CNN looks back at Houston's iconic performance of the national anthem at Superbowl XXV.",
 "credit": "CNN"
 },
 "start_date": {
 "year": "1991"
 },
 "text": {
 "headline": "Super Bowl",
 "text": "Houston's national anthem performance captures the hearts and minds of Americans ralllying behind soldiers in the Persian Guf War."
 }
 },
 {
 "media": {
 "url": "https://youtu.be/h9rCobRl-ng",
 "caption": "\"Run To You\" from the 1992 \"Bodyguard\" soundtrack..",
 "credit": "Arista Records"
 },
 "start_date": {
 "year": "1992"
 },
 "text": {
 "headline": "\"The Bodyguard\"",
 "text": "Houston starred opposite Kevin Costner in the box office hit, The Bodyguard. The soundtrack to the movie sold over 44 million copies worldwide garnering 3 Grammy's for the artist."
 }
 },
 {
 "media": {
 "url": "https://youtu.be/5cDLZqe735k",
 "caption": "Bobby Brown performing \"My Prerogrative,\" from his \"Don't be Cruel\" solo album. Bobby Brown first became famous with the R&B group, New Edition.",
 "credit": ""
 },
 "start_date": {
 "year": "1992"
 },
 "text": {
 "headline": "Married Life",
 "text": "<p>After three years of courtship, Houston marries New Edition singer Bobby Brown. Their only child Bobbi Kristina Brown was born in 1993.</p><p>In 2003 Brown was charged with domestic violence after police responded to a domestic violence call. Houston and Brown were featured in the reality show, \"Being bobby Brown,\" and divorced in 2007.</p>"
 }
 },
 {
 "media": {
 "url": "//upload.wikimedia.org/wikipedia/commons/d/dd/ABC_-_Good_Morning_America_-_Diane_Sawyer.jpg",
 "caption": "Diane Sawyer ",
 "credit": "flickr/<a href='http://www.flickr.com/photos/23843757@N00/194521206/'>Amanda Benham</a>"
 },
 "start_date": {
 "year": "2002"
 },
 "text": {
 "headline": "Crack is Whack",
 "text": "<p>Houston first publicly admitted to drug use in an interview with Diane Sawyer. The singer coined the term \"Crack is Whack,\" saying that she only used more expensive drugs.</p>"
 }
 },
 {
 "media": {
 "url": "https://youtu.be/KLk6mt8FMR0",
 "caption": "Addiction expert, Dr. Drew, talks about Whitney's death and her struggle with addiction.",
 "credit": "CNN"
 },
 "start_date": {
 "year": "2004"
 },
 "text": {
 "headline": "Rehab",
 "text": "<p>Houston entered rehab several times beginning in 2004. She declared herself drug free in an interview with Oprah Winfrey in 2009 but returned to rehab in 2011.</p>"
 }
 },
 {
 "media": {
 "url": "",
 "caption": "",
 "credit": ""
 },
 "start_date": {
 "year": "2005"
 },
 "text": {
 "headline": "Being Bobby Brown",
 "text": "<p>Being Bobby Brown was a reality show starring Brown and wife Whitney Houston. Houston refused to sign for a second season. A clip of her telling Brown to \"Kiss my ass,\" became a running gag on The Soup.</p>"
 }
 },
 {
 "media": {
 "url": "",
 "caption": "",
 "credit": ""
 },
 "start_date": {
 "year": "2010"
 },
 "text": {
 "headline": "A Rocky Comeback",
 "text": "<p>Houston's comeback tour is cut short due to a diminished voice damaged by years of smoking. She was reportedly devastated at her inability to perform like her old self.</p>"
 }
 },
 {
 "media": {
 "url": "//twitter.com/Blavity/status/851872780949889024",
 "caption": "Houston, performing on Good Morning America in 2009.",
 "credit": "<a href='http://commons.wikimedia.org/wiki/File%3AFlickr_Whitney_Houston_performing_on_GMA_2009_4.jpg'>Asterio Tecson</a> via Wikimedia"
 },
 "start_date": {
 "month": "2",
 "day": "11",
 "year": "2012"
 },
 "text": {
 "headline": "Whitney Houston<br/> 1963-2012",
 "text": "<p>Houston, 48, was discovered dead at the Beverly Hilton Hotel on on Feb. 11, 2012. She is survived by her daughter, Bobbi Kristina Brown, and mother, Cissy Houston.</p>"
 }
 }
 ]
}


Kim Jong Un Tours Pesticide Facility Capable of Producing Biological Weapons

Paper:

Hanham, M. (2015, July 9). Kim Jong Un Tours Pesticide Facility Capable of Producing Biological Weapons: A 38 North Special Report

Discussion leader: Tianyi Li

“You can’t trade your freedom for security, because if you do you’re going to lose both.”

—— Brandon Mayfield

Summary:

This article is a report by 38 North, a program of the U.S.-Korea Institute at SAIS dedicated to providing the best possible analysis of events in and around North Korea. It investigates North Korea’s capability to build biological weapons (BW) at a large scale for military use.

It is common practice to cover BW development programs with civilian pesticide facilities, despite assertions to the contrary. Previous examples include Iraq’s Al Hakam Factory, which produced both Bacillus anthracis and Bacillus thuringiensis, and the Soviet Union’s Progress Scientific and Production Association, which produced bio-fertilizers in peacetime but BW for war. North Korea’s BW program came to light in 2015, when a defector fled the country carrying human testing data. From North Korean media reports in which Kim Jong Un toured a facility ostensibly for producing pesticides, it is estimated that the same facilities are able to produce military-sized batches of BW, especially anthrax. The author explained how anthrax, one type of BW, is related to and can be covered by commonly used pesticides. He then listed evidence from images showing the modern equipment North Korea possesses.

The author then continued to explain how North Korea developed this dual-use capability. Not only does the equipment maintain the ability to produce BW, but the actions North Korea took to illicitly import relevant materials make it highly suspicious. North Korea is bound by international treaties, regimes, and national laws that prohibit BW; however, much of the equipment seen in the Pyongyang Bio-technical Institute violates export control laws. In addition to imports from China, open-source research reveals that the Swiss branch of an international nongovernmental organization provided training, known as intangible technology transfer (ITT), and basic equipment to the North that may have inadvertently contributed to North Korea’s ability to produce BW.

North Korea’s motivation to develop BW dates back to the Korean War, when Americans were accused of conducting BW tests on Koreans. This accusation was reinforced when news broke that the American military had mistakenly shipped live anthrax to labs in nine US states as well as to the Osan Air Base in South Korea. Kim Jong Un’s tour is believed to be a veiled threat to the US and South Korea.

Reflections:

This article listed facts and analyzed the hidden possibility those facts enable: North Korea’s capability of developing BW. It supports its argument by citing similar previous examples and the motivations that could give North Korea the intention. As explained in the report, biological weapons facilities are notoriously difficult to identify and monitor due to their dual-use nature, and they can operate in either capacity. Given the history and plausible intent behind North Korea’s interest in BW, the facilities they have are viewed as at least a future threat. The report did not elaborate on the human rights perspective; instead, it stated facts and possible connections linking those facts to the hypotheses. This makes it objective, concrete, and convincing.

Biological weapons in their current form are inherently indiscriminate. It is almost inconceivable that they could be directed at a specific military objective; they are perhaps the only weapons that cannot in any way or form be directed only at military objectives. A disease will not distinguish between civilians and combatants. A Japanese biological weapons attack on the Chinese city of Changde in 1941 resulted in the death of around 10,000 people; about 1,700 of Japan’s own troops were also among the casualties.

Given the lack of discussion of the human rights perspective, I searched for literature with that kind of discussion. I would recommend Weapons of Mass Destruction and Human Rights, by Peter Weiss and John Burroughs: https://www.peacepalacelibrary.nl/ebooks/files/UNIDIR_pdf-art2139.pdf. They pointed out that “With few exceptions those who think, write and speak about Weapons of Mass Destruction (WMD) live in a different world from those who think, write and speak about human rights.” The experts in these two fields think about different problems. More importantly, the characteristics of WMD, and of nuclear weapons in particular, provide both the magnitude and the condensed launch time that expand the concept of self-defence from a reaction to actual or imminent aggression to a preventive strike against aggression that may occur at any time in the future, be it weeks, months or years from now.


Questions:
* Do you think the right to peace, along with the right to life, should be considered a human right? How do you think WMD, or BW in particular, influence such human rights?
* Do you think it is still important to insist on respect for the human person and elementary considerations of humanity—on fundamental human rights—even during the chaos and intentional violence of war?
* How do you think weapons of mass destruction (WMD) can be abolished, or at least the risk of their being used reduced? How can we prevent their proliferation and limit the damage they cause to humans and other living things?


4chan and /b/: An Analysis of Anonymity and Ephemerality in a Large Online Community

Paper:

Bernstein, M. S., Monroy-Hernández, A., Harry, D., André, P., & Panovich, K. (2011). 4chan and /b/: An Analysis of Anonymity and Ephemerality in a Large Online Community. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media.

Discussion leader: Tianyi Li

Summary:

This article explores the concepts of ephemerality and anonymity, using the first and most popular board, “/b/”, on the imageboard website “4chan” as a lens. To better understand how the design choices we make impact the kinds of social spaces that develop, the authors perform a content analysis and two data-driven studies of /b/. 4chan is perhaps best known for its role in driving Internet culture and its involvement with the “Anonymous” group, and the authors believe 4chan’s design plays a large role in its success, despite its counter-intuitiveness. In this paper, the authors quantify 4chan’s ephemerality (there are no archives; most posts are deleted in a matter of minutes) and anonymity (there are no traditional user accounts, and most posts are fully anonymous) and discuss how the community adapts to these unusual design strategies.

The authors review the prior literature on anonymity and ephemerality. First, they review communities that choose different points on the spectrum of anonymity, from completely unattributed to real names (e.g., Facebook). There is previous research on online communities that use pseudonymity to build user reputation, and on anonymity in small groups. The authors reconsider those results in larger online communities in their work. They acknowledge the mixed impact of anonymity on online communities. On the one hand, identity-based reputation systems are important in promoting pro-social behavior, and removing traditional social cues can make communication impersonal and cold, as well as undermine credibility. On the other hand, anonymity may foster a stronger communal identity as opposed to bond-based attachment with individuals, impact participation in classrooms and email lists, and produce more ideas and overall cohesion within groups. Second, they recognize the rarity of ephemerality in large-scale online communities and claim to be the first to study it directly in situ. Although data permanence has been the norm of online communities, it has downsides in some situations; in the example given by the authors, archiving history in chat rooms has elicited strong negative reactions. They also relate the previous academic work to practical implications for online social environments.

4chan is composed of themed boards, each having threads of posts. The authors justified the choice of /b/, the “random” board that is 4chan’s first and most active board, where “rowdiness and lawlessness” happen, and which is the “life force of the website”. After explaining the background of this forum and board, the authors described and discussed the methods and results of their two studies.

The first study focuses on ephemerality. Ephemerality on 4chan is enforced by thread expiration and the real-time ranking and removal of threads based on their replies. The authors characterized /b/’s content by the communal language used, and conducted a grounded analysis of a series of informal samples of thread-starting posts through an eight-month participant observation on the site. The authors then collected a dataset of activity on /b/ over two weeks and conducted a content analysis of 5,576,096 posts in 482,559 threads. They believed this sample is representative of most daily and weekly cycles. They did not capture images due to the nature of the materials. They captured the daily activity in the two-week dataset by calculating the number of threads per hour, thread lifetime in seconds, and the amount of time (in seconds) a thread stays on the first page. The amount of posting activity on this one board is roughly the same as in arenas like Usenet and YouTube. They identified the high-traffic times on the website, when both the lifetime and the first-page exposure of threads are lowest due to high competition. Content deletion plays a role in pushing the community to quickly iterate on and generate popular memes. Users can control ephemerality by bumping threads up through replies or burying them through “sage”. Counterintuitively, such efforts raise community participation. They also found that users have developed mechanisms to keep valuable content: they preserve images on their local machines, and they donate images in return for their requests.
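Read as timestamp arithmetic, the lifetime metric is simple (a sketch with assumed field names, not the authors' code):

// Thread lifetime: seconds from the thread-starting post to its final reply,
// i.e., how long the thread survived before expiring off the board.
function threadLifetimeSeconds(thread) {
  const posts = thread.posts; // posts ordered by time, timestamps in Unix seconds
  return posts[posts.length - 1].timestamp - posts[0].timestamp;
}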

The second study focuses on anonymity. Anonymity on 4chan plays out by not requiring an account to post and not enforcing unique pseudonyms. Despite the existence of “tripcodes” for password holders, the authors found that this feature, like pseudonyms, is largely eschewed. They used the two-week data sample to analyze the identity metadata of each post. They found that only 10% of posts use pseudonyms and less than 2% include an email address, 40% of which are not even actual emails and exist mainly for the “sage” feature. Tripcodes are used only by users to privately establish their authorship of previous posts. The authors found that anonymity can be a feature of 4chan’s dynamics, despite common disbelief. It provides cover for more intimate and open conversations in advice and discussion threads, and it encourages experimentation with new ideas or memes by masking the sense of failure and softening the blow of being ignored or explicitly chastised. In addition, the community is able to recognize authenticity via timestamps. Furthermore, instead of building individual reputations, anonymity in /b/ gives rise to community boundaries marked by textual, linguistic, and visual cues.

Reflections:

This article has some notable strengths. It is the first to study ephemerality in a large-scale online community directly and in situ. It provides a nice overview of an extreme opposite of commonly accepted online community norms. As the authors note, each of the opposing positions on user identity and data permanence has its own merits and advantages. The authors provided an in-depth literature review of the dominant beliefs as well as a comprehensive analysis of a representative sample from the opposite extreme.

Not having been aware of the existence of such communities, I was intrigued to read the paper, but my mind was also blown when I tried to see what 4chan looks like. The first impression I got of the /b/ board was that “this is the dark and dirty side of the cyber world”. However, after finishing the paper and some related discussion mentioned in the literature, I appreciated the authors’ professionalism and sharp insights into the online ecosystem. Also, checking other boards reshaped my impression and made me realize 4chan is a powerful platform where people care more about the truth itself than about judging whether things are true by who is telling them. I also learned about the real-world impact of 4chan, both in the US election and in the CNN blackmail scandal.

The results from the two studies are interesting. The most impressive is that the effect of ephemerality on content quality echoes my personal experience. As the authors quote from previous research, “social forgetfulness” has played an important role in human civilization. This reminds me of the saying “there’s nothing new under the sun”. The richness of information is never as valuable as limited attention and human memory. Although I applaud the concept of ephemerality, I remain suspicious of anonymity. To be honest, it is challenging for me to stay unbiased about an online community with such a high degree of autonomy through anonymity. I see the value of a certain level of anonymity given the authors’ study results and discussion, but I still doubt whether the good outweighs the bad. Unlike ephemerality, which leads to a competition for attention by producing high-quality, eye-catching content, anonymity removes from posters the burden of responsibility for the impact their posts have on the community.

I admired the methods the authors used to conduct their analysis. The statistical analysis and the daily activity graph are straightforward and self-explanatory. I have never used content analysis myself. After researching more details, I feel that the part where the authors conducted grounded analysis using a series of informal samples of thread-starting posts on /b/ is closer to the descriptions I have read of content analysis. For the two-week dataset, they mainly did a quantitative analysis of the post metadata, including post timestamps, reply timestamps, usernames, and user emails.

Last but not least, although I was uncomfortable with some posts on the website, I wonder whether the decision not to capture images in the posts changes the analysis fundamentally, since, intuitively, images are highly likely to be the real “life force” of the website and kept recurring in my limited visits to it. I would have appreciated it if the authors had captured at least the metadata of some image posts and analyzed the weight and impact of inappropriate content on the overall website.

Questions:

* What do you think of the advantages and disadvantages of anonymity and ephemerality discussed in the paper? Do you have additional perspectives?

* How do you think such online communities as 4chan impact the overall cyber ecosystem, and real world?

* Do you trust the anonymity in online community?

* Did you know about 4chan before? What did you think of it? Does this paper influence your point of view and how?

* Which point on the user identity spectrum do you think works best? In what situations or contexts?


Check: Collaborative Fact-checking

Technology: Check: verify breaking news online (checkmedia.org)

Demo leader: Tianyi Li

Summary:

Check is a web-based tool on Meedan’s platform for collaborative verification of digital media. It was founded in 2011 as Checkdesk, and adopted this new name in 2016. They have worked to build online tools, support independent journalists, and develop media literacy training resources that aim to improve the investigative quality of citizen journalism and help limit the rapid spread of rumors and misinformation online. It combines smart checklists, workflow integrations, and intuitive design to support an efficient and collaborative process. It was used during Electionland, which is a collaborative project held during the US elections to look at and report on voting access across the country on Election Day and the days leading up to it.
People can post media links to their project on Check and invite others to investigate and verify the contents. Check provides a web interface for people to add annotation notes, set verification status, add tags (not working), and add different types of tasks for each link. To investigate in Check, you should first set up a new account and create a team. You can create multiple teams and join other people’s teams. In each team, you can set up projects for your team’s specific investigations. Each project allows you to add items, like social media posts or websites, that you are investigating. There are four different roles in Check: team owner, team editor, journalist, and contributor. Different levels of access and permissions are granted to each role. Details on user roles are documented on their website.
Check is an open-source project and offers its API on GitHub. The project uses Ruby on Rails (or simply Rails, a server-side web application framework written in Ruby under the MIT License). They support both Docker-based (Docker is a software container platform) and non-Docker-based installation for deploying the project on your local machine. Other applications can communicate with this service (and test this communication) using the client library, which can be generated automatically; they can also use the functions exposed by this application through the client library.
Limitation: they currently support only Chrome.
Demo:
  • Create a new account
    • Visit https://checkmedia.org/ on Google Chrome only
    • Set up a new account. You can:
      • Authorize your account with an existing social media platform (currently that’s Facebook, Twitter or Slack)
      • Set up a new account with your email address
  • Create a team
    • Type in a Team Name.
    • Type in a Team URL.
  • Join a team: https://checkmedia.org/investigative-tech/join
  • Create a new project
    • From your project page, click on “Add Project”
    • Start typing the name of the new project. (Don’t worry, you can change this later)
    • Hit Enter/Return
  • Add a link for Investigation
    • Click on the project name on the left. This opens up your project workspace.
    • Click on the bottom line, where it says “Paste a Twitter, Instagram, Facebook or YouTube link”
    • Here, you can drop in a link from any of these social networks (soon, you’ll be able to add any link!)
    • Click “Post”
    • This will create a page for investigation of the link.
  • Annotating a link
    • Add a note:
      • In the bar at the bottom, type a note. For instance, type “I am looking into the exact location for this Tweet.”
      • Click Submit
      • This will add your note.
      • Others in your team can also add notes as they collaborate on the investigation.
    • Set verification status:
      • In the upper left hand corner of the link, click on the blue “Undetermined” dropdown.
      • Choose a status
      • This sets the status and adds a note to the log
    • Add a tag:
      • At the bottom of your media, click on the “…” and choose edit
      • (I don’t think this function works…)
  • Add a task to the link under investigation:
    • Go to the media page and click “Add task” link.
    • Choose a type from the list
