An Update on NLTK — Web Based, GUI NLTK via WebNLP!

Tool: WebNLP at http://dh.mi.ur.de/

My demonstration tomorrow is essentially a re-do of my earlier NLTK demo. For this round, I’ve been looking at the tool WebNLP. Unfortunately, the website on which it is hosted hasn’t been loading all evening. The white paper that accompanies this tool is very interesting, and reflects some of my experiences trying to work with NLTK as a humanities/social sciences researcher. Hopefully the site will be up again soon, but for everyone’s edification, I thought that a blog post about the WebNLP white paper would be productive.

Here’s the paper: https://www.researchgate.net/publication/266394311_WebNLP_-_An_Integrated_Web-Interface_for_Python_NLTK_and_Voyant

This includes a description of WebNLP’s functionalities, but is also a rationale for the development of a web-based GUI NLTK program. The authors write:

 Most of these [NLP] tools can be characterized as having a fairly high entry barrier, confronting non-linguists or non-computer scientists with a steep learning curve, due to the fact that available tools are far from offering a smooth user experience (UX)…

The goal of this work [the development of WebNLP] is to provide an easy-to-use interface for the import and processing of natural language data that, at the same time, allows the user to visualize the results in different ways. We suggest that NLP and data analysis should be combined in a single interface, as this enables the user to experiment with different NLP parameters while being able to preview the outcome directly in the visualization component of the tool. (235-236)

They go on to describe how WebNLP works, visualized in this graphic:

Visualization of WebNLP functionality

As we can see, WebNLP joins Python NLTK with the program Voyant to create a user-friendly (i.e. no coding or command line interface requirements) tool for NLP that is sophisticated enough for scholarly research. The fact that it’s web-based seems to be a benefit, too; I’d imagine that a local application would require the user to install Python, which could be problematic.

WebNLP is based on JavaScript and the front-end framework Bootstrap. I don’t know if it’s open source — I couldn’t find it on GitHub and the paper doesn’t mention it. As far as I can tell, the only place it is hosted is at the link shared above. It doesn’t seem extremely difficult to implement, and given its potential usefulness, I (of course!) think that it should be hosted at a stable site — or that the code should be opened up to allow others to host WebNLP applications and iterate on it. Right now, it is limited to sentence tokenization, part-of-speech tagging, stop-word filtering, and lemmatization; NLTK can do much more than that. The visualization output, meanwhile, can produce word clouds, bubblelines, type frequency lists, scatter plots, relationships, and type frequency charts. Even just as an experiment or learning tool, it might be useful to think about how else these data might be visualized.
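For anyone curious what those four steps look like outside a GUI, here is a minimal NLTK sketch of the same pipeline. This is my own illustration rather than WebNLP’s code, and it assumes NLTK is installed and the standard models and corpora have been downloaded:

```python
# Minimal sketch of the four NLTK steps WebNLP exposes (not WebNLP's own code).
# Assumes: pip install nltk, plus nltk.download() of 'punkt',
# 'averaged_perceptron_tagger', 'stopwords', and 'wordnet'.
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

text = "The researchers were tokenizing sentences. Then they tagged the words."

sentences = nltk.sent_tokenize(text)                 # 1. sentence tokenization
tokens = [t for s in sentences for t in nltk.word_tokenize(s)]
tagged = nltk.pos_tag(tokens)                        # 2. part-of-speech tagging

stops = set(stopwords.words("english"))
filtered = [w for w in tokens if w.isalpha() and w.lower() not in stops]  # 3. stop-word filtering

lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(w.lower()) for w in filtered]              # 4. lemmatization

print(tagged[:5])
print(lemmas)
```

A tool like WebNLP presumably runs something equivalent server-side and then hands the output to Voyant for visualization.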

All this being said, I really do hope http://dh.mi.ur.de/ returns soon. In any case, it’s encouraging to see that something I thought should already exist… already does. And the paper has been useful for my semester-long project.


Read More

Family Matters: Control and Conflict in Online Family History Production

Paper: Willever-Farr, H. L., & Forte, A. (2014). Family Matters: Control and Conflict in Online Family History Production. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing (pp. 475–486). New York, NY, USA: ACM.

Discussion Leader: Sukrit V

Summary:

This paper explores two family history research (FHR) communities: findagrave.com and ancestry.com. FHR communities are distinct from online encyclopedia communities such as Wikipedia in that they exist to meet an individual’s or family’s needs and wants, taking identity building and memorialization into account.

This work aims to answer three questions, namely:

  1. What tensions and conflicts arise as FHRs engage both individuals/their families and the public in the construction of historical information?
  2. How are these tensions and conflicts negotiated?
  3. How do these FHR communities constrain and support negotiation of these tensions?

In order to answer these questions, the authors performed a qualitative study on message boards, forum data and interviews with FHRs.

They noted that Ancestry’s policies forced most users to make their familial information private, while Find a Grave’s content moderation policies and lack of recommendations allowed for higher quality research-backed content.

Their interviewees spoke about their personal and public-oriented motivations for contributing to these communities as well as factors that detracted from their participation in these communities.

The authors conclude with a discussion of policy and design considerations.

Reflection:

It was interesting to see how Ancestry.com, which was a paid service, had lower quality contributions (thanks to the “clickologists”) than Findagrave.com. The authors attributed this to the way Ancestry is marketed and its automated suggestion tools.

In addition, all the contributors to the FHR communities fiercely defended their family tree’s “turf”, yet viewed the sites as a historical resource for the public. The members even went to other graveyards to collect and share information about non-related individuals because others may need that information.

Ancestry’s problems with inaccurate content, bogus “poison-ivy” suggestions, and content curation policies led all but one of the interviewees to make their family trees private – thus reducing the viability of Ancestry as a public information source. Clearly, this is indicative of how policies on any ‘social networking’-esque site can ruin or boost its reputation among users. Ancestry even has arrangements with archival repositories in the US and Europe to make records available to subscribers, and yet it has lower quality content than Find a Grave.

The authors note that inexperienced users on FHR communities require instruction. Their design consideration of including “learning spaces” within these communities could prove particularly useful in increasing content quality.

One benefit of the familial oversight of content in these FHR communities, however, is that it facilitated a sense of ownership and thus allowed for certain individuals to have ‘domain’ over certain parts of the knowledge base. Yet, this same oversight also hindered others from contributing (possibly) accurate information – due to the volunteers of Find a Grave becoming overwhelmed. Perhaps allowing for Find a Grave community members to be elevated to the post of moderators would alleviate this issue.

It is interesting to note that certain members of these FHR communities competed with one another to boost the number of memorials on the website. The authors do not discuss why contributors would want to do so, however. Perhaps each person’s profile displays a count of how many memorials they have contributed – and if so, this kind of gamification is something that should be avoided.

Interviewing the volunteers and the owner of Find a Grave, and even Ancestry would perhaps provide more fruitful insights into these communities. Perhaps analyzing the requirements and needs of Find a Grave’s volunteers would indicate that they did not need help or did not believe that the community members could serve as moderators.

Further, FHR sites should include a standard set of contribution guidelines covering what information should be provided and how it should be written, and should also ask for specific sources or proof.

Questions:

  1. Why do you think Ancestry.com, a paid service (roughly $189-$299 a year), had more “clickologists” or “fake researchers” than Findagrave.com, a free-to-use, freely accessible service?
  2. Clearly, Find a Grave needs more volunteers. Do you think existing community members should be allowed to become moderators? Why or why not?
  3. Would you ever contribute to either of these two communities? If yes, would you make your family tree public or private?
  4. How would you improve Ancestry’s ‘hint’ feature? (Perhaps suggestions from humans as opposed to automated suggestions would improve the quality of connections.)
  5. Ancestry now includes DNA analysis to “reveal your ethnic mix and ancestors you never knew you had”. Do you think this would improve user trust in the website?

Read More

Can History Be Open Source?

Paper: Rosenzweig, R. (2006). Can History Be Open Source? Wikipedia and the Future of the Past. Journal of American History, 93(1), 117–146.

Discussion leader: Md Momen Bhuiyan

Summary:
This article summarizes the history of Wikipedia along with its importance as a source for historical reference. The author first points out how Wikipedia differs from traditional historical work, which is largely singly authored, whereas Wikipedia articles are written by the general public with very few restrictions. Wikipedia also differs from traditional work in that it is completely open source, with the only restriction being that no further restrictions may be imposed on copied material. The article covers four topics about Wikipedia: the history of its development, how it works, how good its historical writing is, and what the potential implications are for the professional community.

Wikipedia was founded by Jimmy Wales and Larry Sanger in January 2001 as an open encyclopedia. In March 2000 they had built another encyclopedia moderated by experts, named Nupedia, which had little success. They started Wikipedia both as a new approach and in the hope that its contributors would also contribute to Nupedia. The number of articles in Wikipedia grew quickly, but Sanger did not stay to see this success: he left over his concerns about the tolerance of trolls, whom the author characterizes as ‘difficult people’.

Initially, Wikipedia started with no rules. Over time it had to set some rules to minimize difficult outcomes, and it now has a large set of them. These rules can be summarized in four key policies. The first is that Wikipedia’s goal is to be an encyclopedia and nothing beyond that, so it excludes work that is personal, critical, or original research. This goal is coherent with what can be accomplished by a large group, but it also puts the same weight on the work of an expert and a non-expert, which contributed to Sanger’s departure from the organization. The second is that articles should be written from a neutral point of view (NPOV). This describes Wikipedia’s stance as a third party that doesn’t take any side, though it is not always achieved for a topic even after extensive discussion. The third policy is “don’t infringe copyrights.” It comes with the licensing terms for Wikipedia content, known as the GNU Free Documentation License (GFDL). Some scholars have argued that an imperfect resource that is “free” to be used in any way can be more valuable than a gated resource of better quality. The final policy is “respect other contributors.” Initially, Wikipedia got by with a minimal set of rules, but it gradually added rules for banning difficult people and set up a structure for administration. Considering Wikipedia’s growth, all of this has worked quite well.

The history articles in Wikipedia have various nuances. From a historian’s point of view, articles can be incomplete and inaccurate, with a bland prose style. Articles also have structural issues and inconsistent attention to detail. The author thought part of the problem was that people write only about things that are interesting to them. To compare contributions to popular articles in Wikipedia with other encyclopedias, the author analyzed 25 biographical articles from Wikipedia against Encarta and American National Biography Online. Overall, Wikipedia lags behind American National Biography Online but is comparable to Encarta. It was surprising that Wikipedia had people writing large documents with reliable information. Another thing the author notes is that “geek culture” has shaped Wikipedia’s articles, so there are many articles about games or science but not many about art, history, or literature. The author found only four errors in the 25 articles, all minor issues of detail. One problem is that people’s writing styles vary, which is reflected in Wikipedia articles. Due to the NPOV policy it is hard to find any specific stance in Wikipedia; generally the bias in an article favors its subject, while collective contribution avoids controversial stands of all kinds. Vandalism in Wikipedia articles can be erased quite easily and quickly compared to other sites. Still, some vandalism controversies led Wikipedia to impose a rule requiring registration before editing an article.

Due to its open access, students regularly use Wikipedia as an information source, and Wikipedia results come up at the top of most search engines. Given the large volume of content in Wikipedia, it is bound to contain wrong information. To address this problem, teachers can teach their students not to rely heavily on Wikipedia as a source. Another solution is to emulate Wikipedia-like democracy in content sharing and provide free resources from high-quality sources. Wikipedia has many rules that are very conventional, much like academic norms, so it is easy for academics to fit in there. This leads to the suggestion that more historians should contribute, though they still have to deal with the no-original-research policy and with collaborating with difficult people. A general problem with history on Wikipedia is that it is popular history rather than professional history. Finally, the author points to the law of large numbers: people in a group can be as effective as an expert, so the idea is applicable to creating collaborative history books.

Reflection:
This article gave a good brief overview of Wikipedia from a historical point of view. Wikipedia doesn’t seem attractive for professional contribution, but it can always be used as an initial reference point. The author’s suggestion about creating history through collaborative work seemed interesting; I haven’t heard of any such effort yet. While there is merit in such an effort, it also disregards the author’s point of view, which might be useful for some readers. It will be interesting to see how the policies have changed since 2006. The author’s use of biographies for comparison was interesting, but I would have wanted judgments from several people for those comparisons. Finally, there have been no changes to IP rights or business models even though the amount of free resources has increased, so there might be some use in having both free and commercial resources.

Questions:
1. How would you design collaborative work on history for a particular topic?
2. Is it possible to design microtasks for this type of work? How would you apply the law of large numbers to those tasks?
3. How do you make history interesting?
4. Do you think misinformation in Wikipedia has any real repercussions for students?
5. Do you think giving extra privileges to experts could be useful?

Read More

Montage: Collaborative video tagging

URL: https://montage.storyful.com

Demo Leader: Md Momen Bhuiyan

Summary:
Montage is a collaborative site for tagging publicly available videos. The homepage has a login feature using a Google account and a GitHub link to the source code for the website, so anyone can host a similar site. After logging in, users are shown an interface with existing projects, and they can also create new projects. In each project, users can add publicly available YouTube videos to their collection, either by searching or by YouTube URL, and they can add as many videos as they want. The search option supports filtering by date and location, and projects can also be filtered by keyword, date, location, etc. A user can add another user to a project by inviting them. Within each video, users can add a comment at any point in time and tag any segment of the video. Users can set the location of a video, and a video can be starred, marked as a duplicate, archived, or exported as a CSV/KML file or YouTube playlist. There is also a tab for sharing updates on the project among the collaborators.

Reflection:
The site has many features, such as searching, adding, and filtering videos as well as exporting them, but it also lacks a few things. It only allows logging in with a Google account. It doesn’t let users chat directly during collaboration, which is an essential feature for a collaborative task. Project updates allow only text messages rather than references to specific modifications, and there is no modification history to view. Also, users have to go to “My Projects” to log out. It was interesting that one can export video locations as a KML file. Overall, this is a good project that can be extended for other purposes, since the code is open source.

How to:
1. First go to https://montage.storyful.com and log in.
2. Initially, users are shown a page with a list of projects.
3. There is a button for creating a new project.
4. Users can add a title, a description, and an image for a project.
5. After clicking on a project, the user is shown the project interface.
6. The top menu here has options for searching videos, inviting users, seeing project updates, etc. The side drawer has other options such as all videos, favorites, unwatched, and settings.
7. To add a video to the project, click the search button.
8. Here you can search for videos on YouTube and add them to the project.
9. To tag or comment on a video, click on it in the project interface. The browser will go to the video interface.
10. Below the video there are buttons labeled “Comment” and “Add tag”.
11. There is a slider to select the time at which the comment/tag is to be added.
12. To add an update on the project, click the “Project updates” button in the top menu.
13. To add a collaborator to the project, click the “Invite a collaborator” button in the top menu. It will show a popup for the name or email address of the user.
14. Finally, users can export details about a project by marking videos in the project. A drawer will emerge from below with an “Export to” option; when clicked, it shows choices for the file format, e.g. KML, CSV, etc. (A rough sketch of reading such an export appears below.)
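Since step 14 mentions KML export, here is a rough sketch of how one might read placemark names and coordinates back out of such a file in Python. I have not inspected Montage’s actual export, so the tag layout and the file name below are assumptions based on the standard KML schema:

```python
# Rough sketch: read placemark names/coordinates from a KML export.
# Assumes a standard KML layout (Placemark > name, Point > coordinates);
# Montage's actual export schema may differ, and the file name is a placeholder.
import xml.etree.ElementTree as ET

NS = {"kml": "http://www.opengis.net/kml/2.2"}

def read_placemarks(path):
    tree = ET.parse(path)
    placemarks = []
    for pm in tree.getroot().iter("{http://www.opengis.net/kml/2.2}Placemark"):
        name = pm.find("kml:name", NS)
        coords = pm.find(".//kml:coordinates", NS)
        if coords is not None:
            lon, lat, *_ = coords.text.strip().split(",")  # KML stores lon,lat,alt
            placemarks.append((name.text if name is not None else "", float(lat), float(lon)))
    return placemarks

if __name__ == "__main__":
    for name, lat, lon in read_placemarks("montage_export.kml"):
        print(f"{name}: {lat}, {lon}")
```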

Read More

Tweeting is Believing? Understanding Microblog Credibility Perceptions

Paper: Morris, M. R., Counts, S., Roseway, A., Hoff, A., & Schwarz, J. (2012). Tweeting is Believing?: Understanding Microblog Credibility Perceptions. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work (pp. 441–450). New York, NY, USA: ACM.

Discussion Leader: Sukrit V

Summary:

Twitter has increasingly become a source of breaking news, and thus assessing the credibility of these news tweets is important. Generally, users access tweets in one of three ways: through their news feed (from people they follow), from Twitter’s recommended trending topics, and through search. When searching for tweets, users generally have less information on which to base their credibility judgements compared to content from people they follow (whom they presumably know or know of).

This paper investigates features that impact users’ assessments of tweet credibility. The authors ran an initial pilot study where they observed users thinking aloud while conducting a search on Twitter. It was noted that the participants often commented on the user’s avatar, user name and the contents of the tweet. They obtained 26 such features from the pilot study, which they used to design a survey.

In the survey, respondents were asked to assess how each feature impacts credibility on a five-point Likert scale, and to provide demographic data. The authors found that tweets encountered through search elicited more concern for credibility than those from people the respondents followed. In addition, topics such as breaking news, natural disasters, and politics were more concerning than celebrity gossip, restaurant reviews, and movie reviews. The features that most enhanced a tweet’s perceived credibility included the author’s influence, topical expertise, and reputation; content-related features included a URL leading to a high-quality site and the existence of other tweets conveying similar information. In short, the participants were aware of features that convey credibility, but many of those features are obscured by Twitter’s interface and its at-a-glance nature.

They conducted two experiments following the findings from the survey and pilot study, both aimed at measuring the impact of these tweet features on credibility perceptions. In the first experiment, they varied the message topic, user name, and user image and found that participants’ tweet-credibility and author-credibility assessments were highly correlated, and that participants were generally unaware of the actual truth values of the tweets, i.e. which ones were false and which were true. In addition, message topic and user names affected credibility: topical user names received higher ratings than traditional names and internet names. However, due to the design of the experiment, each participant only saw profiles with a single avatar type (default, female, male, generic, or topical). They therefore designed a second experiment in which each person was shown two different types of images, and found that the default Twitter icon avatar received significantly lower ratings, while the other image types did not differ significantly.

The authors concluded with the implications of their findings: individual users who plan to tweet on only one topic should pick a topical user name; otherwise, they should select a traditional user name. Non-standard grammar and default user photos also damaged credibility. They also suggested design changes to Twitter’s interface: author credentials should be visible at a glance, and metrics on the number of retweets, the number of times a link has been shared, and who is sharing those tweets would give users more context for assessing credibility. In conclusion, the authors found that users’ credibility judgements relied on heuristics and were often systematically biased, which highlights the importance of presenting more information about tweeters when they are unknown to the user.

Reflection:

Overall, this was a very well-designed study. I particularly liked how the authors approached the survey design: by first performing a pilot study to determine what to ask, asking these questions, determining what features were important from the answers to these questions, finally narrowing down to three features, and studying the effects of these features.

Their follow-up study was necessary, and it could easily have been overlooked had they not hypothesized about why there was no significant impact of user image type on tweet credibility and author credibility ratings.

The entire study relies on Twitter users using the search feature and encountering random tweets. Clearly, search is not always implemented that way: sometimes results are ordered by popularity, by how recently the tweet was posted, and possibly by how ‘distant’ (by the number of degrees of connection) the tweet’s author is. Obviously, mimicking this in a study is difficult to do. It would be interesting if Twitter’s search algorithm ordered tweets based on how ‘distant’ each tweet’s author is from the user.

I appreciate that the authors accounted for race and age-related bias in their study when determining the different user photos to use. Further, they mention that there is no difference in credibility ratings for male and female profiles. Of special mention is Thebault-Spieker’s recent work [1] on the absence of race and gender bias in Amazon Mechanical Turk workers. The participants for this study came from an email list, however, and the question arises: why is there no gender-related bias? Or if there is, how statistically significant is it?

The tweet content they used does not seem particularly ecologically valid; however, having only recently joined Twitter, I may not be the right person to comment on this. For example, they used only bit.ly links in the tweets, and the text itself is not how most people on Twitter write. Their use of ecologically valid Twitter avatars and gender-neutral user names, however, is commendable.

Their finding that participants were generally unable to distinguish between true and false (both plausible) information was of particular note.

They also noted that topical user names did better than user names of the other two types. Perhaps this was because of a lack of any biographic information (as they hypothesize). An interesting follow-up study would be to test whether topical user names with biographical information or traditional user names with similar biographical information would be perceived as being more credible.

Questions:

  1. Do you think journalists or technology-savvy people would perform searches and assess search results on Twitter differently from typical users? If so, how?
  2. The participants for this study came from an email list, however, and the question arises: why is there no gender-related bias? Or if there is, how statistically significant is it? Perhaps it is due to the design of the experiment and not necessarily that all 200-odd participants were free from gender-related bias. [Re: Thebault-Spieker’s CSCW paper (please ask me for a copy)]
  3. Do you believe the content of their tweets was ecologically valid?
  4. Why do you think topical user names did better than traditional and internet user names?
  5. How could search results be better displayed on social networks? Would you prefer a most-recent, random, or ‘distance’-based ordering of tweets?
  6. Aside: Do you think the authors asked how often the participants used Bing Social Search (and not only Google Real Time Search) because they work at Microsoft Research?

References:

[1] Jacob Thebault-Spieker, Daniel Kluver, Maximilian Klein, Aaron Halfaker, Brent Hecht, Loren Terveen, and Joseph Konstan. 2018. Simulation experiments on (the absence of) ratings bias in reputation systems. In Proceedings of the 20th ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW ’18).


Read More

CREDBANK: A Large-Scale Social Media Corpus with Associated Credibility Annotations

Article: CREDBANK: A Large-Scale Social Media Corpus with Associated Credibility Annotations: https://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/view/10582/10509


Summary

This is the white paper for CREDBANK, a system (which the authors refer to as a “corpus”) for systematically studying the phenomenon of social media as a news source. CREDBANK specifically investigates Twitter: it relies on real-time tracking and “intelligent routing” of tweets to crowdsourced human annotators to determine tweet credibility. The trial of CREDBANK assessed in the paper took place over three months, and comprised “more than 60M tweets grouped into 1049 real-world events, each annotated by 30 Amazon Mechanical Turk workers for credibility (along with their rationales for choosing their annotations)” (258). Tanushree Mitra and Eric Gilbert correctly note that credibility assessment has received a great deal of attention in recent years, and the paper pays due respect to the other work done in this arena. Toward the end of their “related work” section they note that their contribution is unique in its mobilization of real-time analysis.

The bulk of the paper describes their method of collecting and analyzing Twitter data in real time, beginning with a pre-processing schema that screens tweets through tokenization, stop word and spam removal processes. This is key to their use of LDA (latent dirichlet allocation), which finds similarities between various word strings (in this case, tweets) and inductively generates topic models from them. Humans intervene in this process quickly thereafter: MTurk workers are used to confirm whether the tweets gathered actually relate to a newsworthy event; they mention that purely computational approaches often lead to false positives (261). The authors include information on their understanding of what counts as measurably “credible” (262) before disclosing that, in the process of running these trials, they also discovered the number of MTurkers necessary to approximate an expert’s judgment. That number is 30 per event (263). Through their statistical analyses of events annotated by 1,736 Turkers, they arrive at the conclusion that — basically — events discussed on Twitter have an alarmingly low rate of credibility: the highest percentage of agreement on the “certain accuracy” of tweets stood at 50% (for 95% of tweets), and the percentage of tweets / percentage of agreement-on-accuracy ratio followed the same pattern (only 55% of tweets had 80% certain accuracy agreement) (264).
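To make the preprocessing-plus-topic-modeling step more concrete, here is a minimal sketch of tokenization, stop-word removal, and LDA using scikit-learn. This is my own illustration, not the authors’ pipeline; the tweets and the number of topics are placeholders:

```python
# Minimal sketch of tokenization + stop-word removal + LDA topic modeling,
# in the spirit of CREDBANK's preprocessing step (not the authors' code).
# Assumes: pip install scikit-learn. Tweets and n_components are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    "earthquake reported near the coast this morning",
    "huge quake felt across the city, buildings shaking",
    "game tonight was incredible, what a final score",
]

# Tokenize and drop English stop words; spam filtering would happen before this step.
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(tweets)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(f"topic {k}: {top}")
```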

The authors conclude with a macroscopic assessment of factors implicated in current and future research on this topic. These include temporal dynamics (recurring events, such as sports events, seem to have had lower overall credibility), the role of social networks and mass media in shaping credibility ratings, the viability of a distribution-based (normal curve) model of credibility, other strategies used to confirm credibility, and the role that supplementary data may play.


Analysis

The authors ostensibly use the term “corpus” because they believe the major contribution of CREDBANK is the dataset. Although the dataset is perhaps the more obviously practical offering, their methodology (the combination of theory and practice, well explicated in the steps they take to arrive at their data) seems the most instructive for those interested in advancing knowledge of crowdsourcing and of social-media-as-news credibility in a more general sense. To me, CREDBANK is not so much a dataset as an example of theory in practice. Its shortcomings have implications for the way assumptions about human expression on social media may require more careful consideration before being operationalized in systems like the one seen here.

Their use of topic modeling/LDA seems notable to me, and a place where we can use the outcomes (evidently, tweets aren’t very credible) to tweak the theoretical assumptions. I think they may want to revisit their use of tokenization and stop words in order to account for “the nuances associated with finding a single unique credibility label for an item,” a problem that they believe impacts the viability of credibility to be modeled along a normal curve.


Questions

  1. Given our prior discussions about social media credibility as a news source, what is different about Credbank? Is there anything specific to its functionality that makes you think it more or less trustworthy?
  2. What do we think about the functions utilized in the data preprocessing (their methods of spam removal, tokenization and stop words)? Can we identify any way in which this might affect the system to deleterious effect?
  3. To return to a prior issue, since it comes up in this paper: what do we think about the use of financial incentives here? Could this taint the annotations?
  4. They frequently discuss the use of “experts” here, but do not identify who they are. Do we see this as a weakness of the paper — and perhaps more interestingly, are there any real experts in this arena?
  5. Is there a way to crowdsource credibility annotation of tweets that does not rely on inductive preprocessing? I would suggest that tokenization, stop words, and other filters distort the assessment of tweets to the point where this system can never be functionally practical for the purposes of real social science research.

Read More

Social Media Analytics Tool Vox Civitas

Paper:

Diakopoulos, N., Naaman, M., & Kivran-Swaine, F. (2010). Diamonds in the rough: Social media visual analytics for journalistic inquiry. In 2010 IEEE Symposium on Visual Analytics Science and Technology (pp. 115–122).

Discussion Leader: Lee Lisle

Summary:

Journalists are increasingly using social media as a way to gauge response to various events.  In this paper, Diakopoulos et al. create a social media analytics tool for journalists to quickly go through large amounts of social media output that reference a given event.

In their tool, Vox Civitas, a journalist can input social media data for the program to process. First, each tweet is scored along four different metrics: relevance, uniqueness, sentiment, and keyword extraction. Relevance weeds out tweets that are too delayed in their reaction to the event: if a tweet reacts to a part of the event that is fairly old, it is weeded out, as it is not an initial reaction to that part; the tool is trying to assess the messages of the tweets as the event happens. Uniqueness weeds out messages that are not unique, which mostly accounts for responses that aren’t actually adding any new reaction; this metric also weeds out tweets that are too unique, as those are considered not to be about the actual event. The third metric, sentiment, is measured via sentiment analysis: every tweet is classified as positive or negative toward what is happening in the event. Lastly, keyword extraction pulls out popular words in the tweets that have relevance to the event, measured via their tf-idf scores.
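For readers unfamiliar with tf-idf (it weights a term by how often it appears in a document, discounted by how common the term is across documents), here is a rough sketch of tf-idf-based keyword extraction with scikit-learn. This illustrates the general technique, not the Vox Civitas implementation, and the tweets are invented:

```python
# Rough illustration of tf-idf keyword extraction (not the Vox Civitas code).
# Assumes: pip install scikit-learn. Tweets are placeholder examples.
from sklearn.feature_extraction.text import TfidfVectorizer

tweets = [
    "the candidate dodged the healthcare question again",
    "great answer on healthcare reform from the senator",
    "that closing statement was the best moment of the debate",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(tweets)          # rows: tweets, columns: terms
terms = vectorizer.get_feature_names_out()

# Score each term by its maximum tf-idf weight across tweets and keep the top few.
scores = tfidf.max(axis=0).toarray().ravel()
top_terms = [terms[i] for i in scores.argsort()[-5:][::-1]]
print(top_terms)
```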

After explaining how their program processes the social media posts, the authors then performed an exploratory user study on their program.  They found 18 participants with varying levels of experience in journalism: 7 professional journalists, 5 student journalists, 1 citizen journalist, 2 with journalism experience, and 3 untrained participants.  Each of these people used the tool online remotely and answered an open-ended questionnaire.

The questionnaire had the participants run through example uses of Vox Civitas.  Through the questions, the authors identified two primary use cases for the program: a way to find people to interview, and an ideation tool.  In other words, the tool could sort through the social media posts so that users could find insightful or relevant people to interview, and it could help users figure out what to write about.  The questionnaire also identified how the tool would shape the angles users would take on the social media output: Vox Civitas would help drive articles on event content or articles on audience responses to the event.  Another, more minor, angle was creating meta-analyses of audience response, where participants would identify demographics of the social media post writers.

Lastly, the authors discuss ways their tool would assist with the journalistic creativity process.  They state that their tool should allow journalists to skip over the initial phases of sensemaking in order for them to more quickly jump to ideation and hypothesis generation.  Since the tool already processes and highlights different types of responses and shows that aggregated information via graphs and other visualizations, the journalists do not have to waste time sifting through all the data to understand it.

Reflection:

I found this paper to present a unique and in-depth user study of the authors’ program.  Furthermore, I found the program itself to be a way for journalists to quickly understand the crowd’s reaction to an external event.  This contrasts with many of the ways we have looked at interacting with the crowd so far, since it examines what the crowd creates or does when not prompted to do anything.

There were, however, a few issues I had with the paper.  First, the authors acknowledge that their sentiment analysis algorithm has an accuracy of only 62.4%.  They do point out that this isn’t good enough for journalists to reliably count on when looking at that data; however, I would have liked to see them explore ways of computing a confidence value for the analysis, or some other way of weeding out unreliable results.  As a corollary, this could have informed the design of the user interface: the neutral label on the sentiment analysis visualization only meant that there weren’t posts to analyze, and I felt it would be better for the program to indicate that it couldn’t determine the sentiment of the posts.
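One simple version of the confidence idea I have in mind (purely an illustration, not something the paper implements) would be to threshold on the classifier’s predicted probability and label low-confidence posts as undetermined rather than neutral:

```python
# Illustration of the confidence-value idea suggested above (not from the paper):
# only accept a sentiment label when the classifier's probability clears a
# threshold; otherwise mark the post as "undetermined" instead of "neutral".

def label_with_confidence(probs, threshold=0.75):
    """probs: dict like {"positive": 0.6, "negative": 0.4} from any probabilistic classifier."""
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return label if p >= threshold else "undetermined"

print(label_with_confidence({"positive": 0.62, "negative": 0.38}))  # -> "undetermined"
print(label_with_confidence({"positive": 0.91, "negative": 0.09}))  # -> "positive"
```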

Another issue with the paper was that it introduces scores or rating systems without explaining them.  For example, in section 4.4 the authors mention a tf-idf score without explaining what that scoring system measures.  If the authors had explained that a little better, I think I could have understood their methodology for extracting keywords significantly better.

I appreciated the focus the authors placed on their user interface; breaking down what each part measured or conveyed was helpful for understanding the workflow.  Furthermore, the statistics in section 6.4.3, which detailed how much each part of the interface was used, were a good way to illustrate how useful each part of the tool was to the participants.  They also conveyed that the participants took advantage of the features the authors supplied and were able to understand their usage.

Questions:

  1. Do you think this tool could be used in fields other than journalism?  How would you use it?
  2. The authors used a non-lab study to enhance the ecological and external validity of the study, and tracked how the users interacted with the interface.  Do you think any data was lost in this translation?
  3. The professional journalists were noted to have not used the keyword functionality of the interface, and the authors did not follow up with them to find out why.  Do you have any idea why they might have avoided this functionality?
  4. The participants noted that one way of using this tool was to figure out any links between demographics of the audience and their responses.  Have you seen this in more recent media?


Read More

Semantic Interaction for Visual Text Analytics

Paper:
Endert, A., Fiaux, P., & North, C. (2012). Semantic Interaction for Visual Text Analytics. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 473–482). New York, NY, USA: ACM.

Discussion Leader:
Tianyi Li

Summary:
Youtube Video:


An important part of visual analytics is the statistical models used to transform data and generate visualizations (visual metaphors). Analysts manipulate the visualization by tuning the parameters of those statistical models during their information foraging process. When it comes to information synthesis, analysts most often externalize their thought process by spatially laying out the information. However, there is a gap between the two processes in terms of the tools available, with most visual analytics tools focusing on only one side or the other.

This paper takes this opportunity and bridges the two processes via “semantic interaction,” which manipulates the parameters of the statistical models by interpreting the analyst’s interactions (mostly spatial) with the visualization. Such interaction relieves analysts from having to understand the complicated statistical models, so they can focus on the semantic meaning of the text data under analysis. It also opens a new design space for visual analytic interaction. The authors demonstrate an example visual analytic tool they developed as a proof of concept.

The visual analytic tool, called ForceSPIRE, combines machine learning techniques and interaction analysis to assist analysts in analyzing intelligence-analysis text documents. The system focuses on document similarity, represented conceptually by entity overlap and the amount of user interaction, and visually by the spatial layout. User interactions influence the values of the documents’ encoded feature labels, which triggers updates to the visual layout to reflect the analyst’s sensemaking progress.
ForceSPIRE encodes text documents via the following concepts:

  • Soft data: stored result of user interaction as interpreted by the system
  • Hard data: extracted from the raw textual information (e.g., term or entity frequency, titles, document length, etc.).
  • Each entity has an importance value
    • initialized with its tf-idf score
    • updated by user interaction hits
    • always normalized so that the importance values sum to 1
  • Each document has a mass
    • The number of entities determines the mass of each document (a node in the force-directed graph model), where heavier nodes do not move as fast as lighter nodes.

Reflection: 
This paper is very well written and is one of the cornerstone papers for my research area and project. Semantic interaction bridges the gap between information foraging and synthesizing, which relieves analysts of the burden of learning about complicated statistical models. In other words, it realizes the purpose of those models, and a fundamental mission of visual analytics: to hide the unnecessary complexity of data processing from analysts so that they can achieve productive analysis via visual metaphors.

The way ForceSPIRE computes similarity is to start with “hard data,” which gives an initial spatial layout of documents. Then, as users interact with the visualization, the result of the interaction (“soft data”) is stored and interpreted by the system to update the spatial layout in real time. The entity is the smallest unit of operation and computation; initial entity values are assigned by computing tf-idf scores. One hit of user interaction (the entity was included in a highlight, it was searched, it was in a note, etc.) increases its importance value by 10%, thus reducing the relative importance of other entities, since the importance values are always normalized to sum to 1. The document, which contains entities, is one level higher in granularity; each document is assigned a mass to account for the number of entities in it. Similarly to entities, a document increases its mass by 10% with one hit of user interaction, i.e., its text was highlighted, it was the result of a search hit, or the user added more entities through annotations.
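As a toy illustration of that bookkeeping (my reading of the description above, not ForceSPIRE’s actual code), the 10% bump and the renormalization might look roughly like this; the entities and scores are invented:

```python
# Toy sketch of the soft-data update described above (not ForceSPIRE's code):
# each interaction "hit" bumps an entity's importance by 10%, then the
# importance values are renormalized so they sum to 1.

def init_importance(tfidf_scores):
    """Initialize entity importance from tf-idf scores, normalized to sum to 1."""
    total = sum(tfidf_scores.values())
    return {e: s / total for e, s in tfidf_scores.items()}

def register_hit(importance, entity):
    """User highlighted/searched/annotated `entity`: +10%, then renormalize."""
    importance[entity] *= 1.10
    total = sum(importance.values())
    return {e: v / total for e, v in importance.items()}

importance = init_importance({"embassy": 2.0, "courier": 1.5, "bank": 0.5})
importance = register_hit(importance, "courier")   # e.g. the analyst searched "courier"
print(importance)
```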

This is an oversimplified scenario intended as a proof of concept, and there is follow-up work, such as StarSPIRE, which relaxes the assumptions of a large display and a small number of documents.

Questions:

  1. ForceSPIRE supports “undo”: the user presses Ctrl-Z and rewinds to the previous state, removing the most recent interaction. The entity importance values and document mass values are all rewound, but the spatial layout is not. The authors also recommend that users make small-distance document movements. Why not enable undo for document location as well? What is the reason for not doing this?
  2. ForceSPIRE focuses on document “similarity.” In the second paper today, the authors analyze news content by “relevance” and “uniqueness,” which seem to me to be a finer-grained breakdown of the concept of similarity: relevance is similarity to the topic of interest, and uniqueness is the negation of similarity. Do you agree? How do you think other “computational enablers,” to borrow the term from the second paper, could be used in semantic analysis?
  3. In ForceSPIRE, semantic analysis is applied to intelligence analyses which has specific facts to discover or questions to answer. How do you think semantic analysis can be applied to journalism or news content analysis? How are these scenarios different? How would such difference influence our analysis strategy?

Read More

A way to quantify media bias?

Paper:

Ceren Budak, Sharad Goel, Justin M. Rao; Fair and Balanced? Quantifying Media Bias through Crowdsourced Content Analysis (Links to an external site.)Links to an external site.Public Opinion Quarterly, Volume 80, Issue S1, 1 January 2016, Pages 250–271.

Discussion Leader:

Lawrence

Summary:

We all believe that certain news organizations have agendas they try to push, but there has been no real way to quantify it, since it would be easy to add personal bias to the assessment. According to this paper, however, through a combination of machine learning and crowdsourced work, selection and framing characteristics can be brought to light. Judging fifteen media outlets and using 749 human judges, over 110,000 articles (out of more than 800,000) were classified as political, in order to find that, with the exception of certain types of news reports (mostly political scandals), most major news operations give relatively unbiased coverage, and in the event of a political scandal, organizations tend to criticize the opposing ideology rather than directly advocating for their own.

On a scale of -1 to 1, news organizations were graded on how much they slanted to the left or right, respectively. News stories were surprisingly closely slanted compared to opinion pieces. Outside of the blog sites, results were as close as 0.16 points apart (out of 2).
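As a back-of-the-envelope illustration of the kind of aggregation involved (not the paper’s actual statistical model), averaging crowd judgments per outlet on that -1 to 1 scale might look like this; the outlets and ratings are invented:

```python
# Back-of-the-envelope sketch: average crowd judgments of article slant
# per outlet on a -1 (left) to +1 (right) scale. Ratings are invented,
# and this is not the paper's statistical model.
from collections import defaultdict
from statistics import mean

# (outlet, worker_rating) pairs; in the study, workers judged individual articles.
judgments = [
    ("Outlet A", -0.5), ("Outlet A", 0.0), ("Outlet A", -0.25),
    ("Outlet B", 0.25), ("Outlet B", 0.5), ("Outlet B", 0.0),
]

by_outlet = defaultdict(list)
for outlet, rating in judgments:
    by_outlet[outlet].append(rating)

for outlet, ratings in by_outlet.items():
    print(f"{outlet}: mean slant {mean(ratings):+.2f}")
```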

Reflections:

For the first time, it has been shown with numbers that there is a difference between news reports and articles of an opinionated nature. This study helps point out that there are several instances in which viewers are not being given pure information. It is a good starting point for recognizing and categorizing media bias on either political side. The issue of not being well informed about both sides of an issue, however, is now much more apparent, since there is a way to clearly define the line between what news you hear and what news is available. My biggest criticism is that the authors complained about how much more time was needed to properly run the study; as no one else was running a similar study, I think they could have taken the time they needed. They went through over 800,000 articles and then complained about the amount of time they had left to run the study, but I was unable to find any sort of follow-up.

There was also a great deal of room for error, as users were at times given only three choices for the correct answer. This means there was a 66% chance that a random guess would not be the favorable answer, but 33% is still a big chance of guessing correctly.

Questions:

  • Since there is an actual line which divides left and right news feeds, would it benefit a viewer to watch a news feed which opposes their own views?
  • Was it a surprise that non-opinion-based news fell closer to neutral on both sides?
  • Is it the responsibility of a news network to make sure they are being neutral in the information they give to their viewers?

Read More

Interacting with news

Article: Interacting with news: Exploring the effects of modality and perceived responsiveness and control on news source credibility and enjoyment among second screen viewers.

[Link]

Author: Michael A. Horning

Presentation Leader: Ri

Summary:

As technology has created new media of communication, traditional news media have sought ways to make use of them. According to one study, 46 percent of smartphone owners and 43 percent of tablet owners reported using their devices while watching TV every day. This led to the introduction of second-screen, dual-screen, or multiscreen viewing, a type of user experience in which the user gets the primary content from the primary screen while simultaneously being involved in interaction via a mobile or tablet device. By expanding broadcast content onto mobile devices, organizers get more attention from their audience, both in terms of participation and comprehension. This is seen in ads during live shows, sports networks’ second-screen interviews, CNN’s QR codes on live TV that direct viewers to online content, etc.

Although a prior study showed that almost half of the population owning a smartphone or a tablet use their devices while watching TV, a later study showed that almost 80% of users use their second screen to view unrelated content. On top of that, 66% of national journalists expressed concerns about new technologies hurting coverage. In order to help newsrooms adapt to these innovations, this research paper [1] raises two important questions: first, whether the additional second screen adds to the enjoyment of viewers, and second, whether second-screen content adds credibility to the news source.

In recent times, several pieces of research have been conducted on second-screen content viewing. In research conducted in 2014 [2], the authors found that second screening made it more difficult to recall and comprehend news content by increasing cognitive load, whereas later research in 2016 [3] showed that a second screen instead strengthened users’ perceptions of both news and drama. Some researchers have credited the novelty of the new technology, and others the interaction and varying levels of modality, for the success of second screening.

From these and other prior studies, the author of the article [1] established six hypotheses in total. The author tested the hypotheses on 83 college-aged students (32 males, 51 females) using two original news videos. Both videos were similar except for the last part: in one, the anchor invites users to manually go to a website to view related stories, whereas in the other, users are invited to scan the TV screen using an iPad. The former was labeled the Low Modal Interactivity condition and the latter the High Modal Interactivity condition.


The six hypotheses and the corresponding research findings were as follows:

H1: Second screen experiences with higher modality will be rated as more enjoyable than second screen experiences with lower modality. (Result: Contradicted)
H2: Second screen experiences with higher modality will be rated as more credible than second screen experiences with lower modality. (Result: Contradicted)
H3a: Second screen users that perceive the experience to be more highly interactive, measured by perceived control, will rate news content as more enjoyable than those who perceive it to be less interactive. (Result: Supported)
H3b: Second screen users that perceive the experience to be more highly interactive, measured by perceived responsiveness, will rate news content as more enjoyable than those who perceive it to be less interactive. (Result: Supported)
H4a: Second screen users that perceive the experience to be more highly interactive, measured by perceived control, will rate news content as more credible than those who perceive it to be less interactive. (Result: Supported)
H4b: Second screen users that perceive the experience to be more highly interactive, measured by perceived responsiveness, will rate news content as more credible than those who perceive it to be less interactive. (Result: Supported)
H5: Second screen experiences that have higher modality and higher perceived interactivity will be rated more positively and be perceived as more enjoyable. (Result: Partially supported)
H6: Second screen experiences that have higher modality and higher perceived interactivity will be rated more positively and be perceived as more credible. (Result: Partially supported)


Reflection:

Second-screen viewing allows users to interact with the media, giving them the opportunity to get involved with the means of communication. It transforms the passive viewer into a somewhat active participant by providing some means of control and interaction. Second screening may also make the content on the primary screen more enjoyable and more credible, since it gives users the option of elaborating on the information. Even the novelty of the experience might play some role in making the content more enjoyable and more credible.

I liked how the author explores several characteristics of second-screen viewing through related papers, and he does a good job of explaining the many prior works on second-screen viewing and on communication and journalism in general. Some of the researchers held opposing notions, and I liked the author’s effort in bringing both kinds of research into the context of second-screen viewing.

In my opinion, the author also did commendable work in setting the premise of the research. I find the six hypotheses equally interesting and worth addressing. What intrigued me, though, is how the author designed the experiment. For the second-screen conditions, the author chose two scenarios: Low Modal Interactivity, identified as clicking links manually, in contrast with High Modal Interactivity, scanning the screen with an iPad. Finally, to assess the experience, the participants were given a questionnaire. The reason this intrigued me is that among the 83 participants, 55.5% indicated that they had never used QR codes prior to this research. I also find it interesting that the research found no gender effect, according to the author.

The findings of the research were interesting, in my opinion. The first two hypotheses focus on how structural effects impact news enjoyment and news credibility; surprisingly, the results suggest that modality did not emerge as a significant predictor of either. For the middle hypotheses, second-screen users who perceived the experience to be more highly interactive, measured by perceived control and by perceived responsiveness, rated news content as both more enjoyable and more credible. The final two hypotheses were only partially supported, depicting the second-screen experience as more positive, enjoyable, and credible. In both of these final hypotheses, the interaction between modality and perceived responsiveness was not significant, but the interaction between modality and perceived control was.

Questions:

  • Prior research suggests that higher modality in second-screen experiences should be rated as more enjoyable and more credible. However, the findings of this research suggest otherwise. Why do you think that is?
  • Among the participants, 55.5% mentioned they had never used QR codes in their life. Do you reckon previous experience, or the lack thereof, might have impacted the results?
  • How do you think multiple interactions over a longer period of time might change our perception as an audience?

References:

[1] Horning, M.A., 2017. Interacting with news: Exploring the effects of modality and perceived responsiveness and control on news source credibility and enjoyment among second screen viewers. Computers in Human Behavior, 73, pp.273-283.

[2] Van Cauwenberge, A., Schaap, G. and Van Roy, R., 2014. “TV no longer commands our full attention”: Effects of second-screen viewing and task relevance on cognitive load and learning from news. Computers in Human Behavior, 38, pp.100-109.

[3] Choi, B. and Jung, Y., 2016. The effects of second-screen viewing and the goal congruency of supplementary content on user perceptions. Computers in Human Behavior, 64, pp.347-354.


Read More