4/29/2020 – Sukrit Venkatagiri – DiscoverySpace: Suggesting Actions in Complex Software

Paper: C. Ailie Fraser, Mira Dontcheva, Holger Winnemöller, Sheryl Ehrlich, and Scott Klemmer. 2016. DiscoverySpace: Suggesting Actions in Complex Software. In Proceedings of the 2016 ACM Conference on Designing Interactive Systems (DIS ’16), 1221–1232. https://doi.org/10.1145/2901790.2901849

Summary: In this paper, the authors introduce an extension to Adobe Photoshop, called DiscoverySpace, that provides high-level suggestions based on visual features of an image to help onboard new users. These suggestions/actions are drawn from an online user community. New users of a complex system face several problems, such as unfamiliarity with jargon, tutorials that are hard to follow, and the fact that many tasks can be accomplished through different routes. User studies suggested that DiscoverySpace reduces the overhead of introducing new users to Photoshop, and the authors outline steps that could be replaced with more advanced algorithms to speed up this process in the future.
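
DiscoverySpace draws its suggestions from community-shared Photoshop actions matched against features of the user's photo. A minimal sketch of that matching idea (with hypothetical feature names and a made-up action catalog, not the paper's actual data) might look like this:

```python
# Hypothetical sketch of feature-to-action matching in the spirit of DiscoverySpace.
# The image features and the action catalog below are illustrative, not the paper's.
from typing import Dict, List, Set

# Community-contributed "actions", each tagged with the image features it applies to.
ACTION_CATALOG: List[Dict] = [
    {"name": "Brighten portrait", "requires": {"has_face", "underexposed"}},
    {"name": "Vivid landscape",   "requires": {"is_outdoor"}},
    {"name": "Remove red-eye",    "requires": {"has_face", "flash_used"}},
]

def suggest_actions(image_features: Set[str]) -> List[str]:
    """Return the actions whose required features are all present in the image."""
    return [a["name"] for a in ACTION_CATALOG if a["requires"] <= image_features]

# A dim indoor photo of a person surfaces the portrait-related suggestion.
print(suggest_actions({"has_face", "underexposed"}))  # ['Brighten portrait']
```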

Reflection: The paper tackles a crucial problem for novice users: providing an easy-to-use on-boarding process that teaches them how to use the system without scaring them away with too much information. DiscoverySpace is an initial attempt to address this problem by learning from previous users, and it shows how we can combine AI and user interface design to build effective on-boarding tools for complex systems.

While this system was built specifically for Photoshop, I wonder how a similar approach could be used for other systems that many administrators and creatives use: for example, tax software, payroll processing software, or video editing tools. I also wonder if such data-mining approaches can be universalized, or if they depend on the user, context, and tool. Sometimes all a user wants to do is crop an image, not something more complex. Other times, users may already be following an online tutorial they found through search. Perhaps tools like DiscoverySpace should allow other users to create their own on-boarding workflows instead of having a fixed one for all users. This is because, as mentioned, creative tools allow multiple flows that lead to the same output, and some may make more sense to users than others.

Finally, I really appreciate the discussion section in this paper, since it presents ideas for designing a more universal toolkit. While we can’t build a dataset of all possible actions for complex tools and creative processes, even a partial one would still help.

Questions:
1. Do you think everyone should undergo the same on-boarding process when using creative tools, or be given the choice to go through different pathways?
2. Why are these tools so complex? How can we provide more features without introducing more complexity into the information architecture?
3. What are some drawbacks to this approach?

Read More

Tech Demo: LegionTools

Paper:

W.S. Lasecki, M. Gordon, D. Koutra, M.F. Jung, S.P. Dow and J.P. Bigham. Glance: Rapidly Coding Behavioral Video with the Crowd. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST 2014).

Brief Overview:

When running studies on Amazon Mechanical Turk, we often need a large number of workers in a short amount of time: either to get survey responses quickly or for real-time and synchronous tasks. LegionTools addresses this problem. It is a toolkit and interface for setting up HITs on MTurk; its recruiting algorithm posts and expires a steady stream of HITs (all pointing to the same task) to quickly gather a large number of Turkers.

It can also pool these workers in a “waiting room” of sorts, and a selected subset (or all) of these workers can be routed to a real-time synchronous task, with the push of a button.
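
The GitHub page describes the recruiting loop only at a high level. A rough sketch of the post-and-expire pattern, assuming boto3’s MTurk client (the task details, waiting-room URL, and `get_pool_size` callback below are placeholders, not LegionTools’ actual implementation):

```python
# Rough sketch of the "post and expire a stream of HITs" recruiting pattern,
# assuming boto3's MTurk API. All task details and URLs below are placeholders,
# not LegionTools' actual values.
import time
from datetime import datetime
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

WAITING_ROOM_QUESTION = """<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/waiting-room</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>"""

def recruit(target_workers, get_pool_size, reward="0.25", post_interval_s=30):
    """Keep a fresh, short-lived HIT near the top of the MTurk listings until
    enough workers have joined the waiting room. `get_pool_size` is a callable
    that reports how many workers your own server currently has pooled."""
    previous_hit_id = None
    while get_pool_size() < target_workers:
        hit = mturk.create_hit(
            Title="Join a short real-time study",
            Description="Wait briefly, then be routed to a live task.",
            Keywords="study, real-time",
            Reward=reward,
            MaxAssignments=1,
            LifetimeInSeconds=120,              # each HIT is deliberately short-lived
            AssignmentDurationInSeconds=3600,
            Question=WAITING_ROOM_QUESTION,
        )
        # Expire the previous HIT so only the newest (most visible) one stays open.
        if previous_hit_id:
            mturk.update_expiration_for_hit(HITId=previous_hit_id,
                                            ExpireAt=datetime(2015, 1, 1))
        previous_hit_id = hit["HIT"]["HITId"]
        time.sleep(post_interval_s)
```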

Download Link: https://github.com/cromaLab/LegionTools or http://rochci.github.io/LegionTools/

Steps (taken from LegionTools GitHub page):

  1. Add a new task by typing a unique session name, title, description, keywords, and clicking Add new task. Remember your session name; you can use it to pull up your session later on.
  2. Set the target number of workers. Set the price range and click “Update price”.
  3. Click Change waiting page instructions to edit the text that workers will be shown while waiting for your task.
  4. Click Start recruiting to begin recruiting. You must click Stop recruiting to end the recruiting process. Note that stopping recruitment will take some time, depending on your target number of workers.
  5. Pull up a previous task using just your task session name. If you closed the UI page and left the recruiting tool running, you may stop that recruiting process by loading the associated session and clicking Stop recruiting.
  6. Modify task title, description, and keywords with Update. Changes automatically affect all new HITs posted by the recruiting tool.
  7. Send workers to a URL with the Fire button. Your chosen URL must be HTTPS.
  8. When you are ready to review completed HITs, click Reload in the Overview section to load all reviewable HITs associated with a given task session.

 

Read More

Family Matters: Control and Conflict in Online Family History Production

Paper: Willever-Farr, H. L., & Forte, A. (2014). Family Matters: Control and Conflict in Online Family History Production. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing (pp. 475–486). New York, NY, USA: ACM.

Discussion Leader: Sukrit V

Summary:

This paper explores two family history research (FHR) communities: findagrave.com and ancestry.com. FHR communities are distinct from online encyclopedia communities such as Wikipedia in that they exist to meet an individual’s or family’s needs and wants, taking into account identity building and memorialization.

This work aims to answer three questions, namely:

  1. What tensions and conflicts arise as FHRs engage both individuals/their families and the public in the construction of historical information?
  2. How are these tensions and conflicts negotiated?
  3. How do these FHR communities constrain and support negotiation of these tensions?

In order to answer these questions, the authors performed a qualitative study of message boards, forum data, and interviews with FHRs.

They noted that Ancestry’s policies forced most users to make their familial information private, while Find a Grave’s content moderation policies and lack of recommendations allowed for higher quality research-backed content.

Their interviewees spoke about their personal and public-oriented motivations for contributing to these communities as well as factors that detracted from their participation in these communities.

The authors conclude with a discussion of policy and design considerations.

Reflection:

It was interesting to see how Ancestry.com, which was a paid service, had lower quality contributions (thanks to the “clickologists”) than Findagrave.com. The authors attributed this to the way Ancestry is marketed and its automated suggestion tools.

In addition, all the contributors to the FHR communities fiercely defended their family tree’s “turf”, yet viewed the sites as a historical resource for the public. The members even went to other graveyards to collect and share information about non-related individuals because others may need that information.

Ancestry’s problems with inaccurate content, bogus “poison-ivy” suggestions, and content curation policies led all but one of the interviewees to make their family trees private – thus reducing the viability of Ancestry as a public information source. Clearly, this is indicative of how policies on any ‘social networking’-esque site can ruin or boost its reputation among users. Ancestry even has arrangements with archival repositories in the US and Europe to make records available to subscribers, and yet has lower quality content than Find a Grave.

The authors note that inexperienced users on FHR communities require instruction. Their design consideration of including “learning spaces” within these communities could prove particularly useful in increasing content quality.

One benefit of the familial oversight of content in these FHR communities, however, is that it facilitated a sense of ownership and thus allowed certain individuals to have ‘domain’ over certain parts of the knowledge base. Yet this same oversight also hindered others from contributing (possibly) accurate information, because the volunteers of Find a Grave became overwhelmed. Perhaps allowing Find a Grave community members to be elevated to moderator roles would alleviate this issue.

It is interesting to note that certain members of these FHR communities competed with one another to boost the number of memorials on the website. The authors do not discuss why contributors would want to do so, however. Perhaps each person’s profile displays a count of how many memorials they have contributed – in which case this kind of gamification is something that should be avoided.

Interviewing the volunteers and the owner of Find a Grave, and perhaps even Ancestry, would provide more fruitful insights into these communities. Analyzing the requirements and needs of Find a Grave’s volunteers might also indicate that they did not need help, or that they did not believe community members could serve as moderators.

Further, FHR sites should include a standard set of contribution guidelines specifying what information should be provided, how it should be written, and what sources or proof are required.

Questions:

  1. Why do you think Ancestry.com, a paid service (almost $189-$299 a year), had more “clickologists” or “fake researchers” than Findagrave.com, a free-to-use, freely accessible service?
  2. Clearly, Find a Grave needs more volunteers. Do you think existing community members should be allowed to become moderators? Why or why not?
  3. Would you ever contribute to either of these two communities? If yes, would you make your family tree public or private?
  4. How would you improve Ancestry’s ‘hint’ feature? (Perhaps suggestions from humans as opposed to automated suggestions would improve the quality of connections.)
  5. Ancestry now includes DNA analysis to “reveal your ethnic mix and ancestors you never knew you had”. Do you think this would improve user trust in the website?

Read More

Tweeting is Believing? Understanding Microblog Credibility Perceptions

Paper: Morris, M. R., Counts, S., Roseway, A., Hoff, A., & Schwarz, J. (2012). Tweeting is Believing?: Understanding Microblog Credibility Perceptions. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work (pp. 441–450). New York, NY, USA: ACM.

Discussion Leader: Sukrit V

Summary:

Twitter has increasingly become a source of breaking news, and thus assessing the credibility of these news tweets is important. Generally, users access tweets in one of three ways: through their news feed (from people they follow), from Twitter’s recommended trending topics, and through search. When searching for tweets, users generally have less information to base their credibility judgements on than they do for content from people they follow (whom they presumably know or know of).

This paper investigates features that impact users’ assessments of tweet credibility. The authors ran an initial pilot study where they observed users thinking aloud while conducting a search on Twitter. They noted that participants often commented on the author’s avatar, user name, and the contents of the tweet. They distilled 26 such features from the pilot study, which they used to design a survey.

In the survey, respondents were asked to rate how much each feature impacts credibility on a five-point Likert scale and to provide demographic data. The authors found that tweets encountered through search elicited more concern for credibility than those from people the respondents followed. In addition, topics such as breaking news, natural disasters, and politics elicited more concern than celebrity gossip, restaurant reviews, and movie reviews. The features that most enhanced a tweet’s perceived credibility included the author’s influence, topical expertise, and reputation. Content-related features included a URL leading to a high-quality site and the existence of other tweets conveying similar information. In short, participants were aware of features that convey credibility, but many of these are obscured by Twitter’s interface and at-a-glance nature.

They conducted two experiments following their findings from the survey and pilot study. The aim of both was to measure the impact of these tweet features on credibility perceptions. In the first experiment, they varied the message topic, user name, and user image, and found that participants’ tweet credibility and author credibility assessments were highly correlated, and that participants were generally unable to tell which tweets were true and which were false. In addition, message topic and user name affected credibility: topical user names received higher ratings than traditional names and internet names. However, due to the design of the experiment, each participant only saw profiles with a single avatar type (default, female, male, generic, or topical). Thus, the authors designed a second experiment in which each person was shown two different image types, and found that the default Twitter icon avatar received significantly lower ratings, while the other image types did not differ significantly.
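
Purely to illustrate the kind of correlation analysis involved (the statistics below are not the paper’s), two sets of five-point Likert ratings could be compared with a rank correlation on made-up data:

```python
# Illustrative only: rank-correlating tweet-credibility and author-credibility ratings.
# The Likert ratings below are invented, not data from the paper.
from scipy.stats import spearmanr

tweet_credibility  = [4, 3, 5, 2, 4, 3, 5, 1, 2, 4]   # 1 = not credible, 5 = very credible
author_credibility = [4, 3, 4, 2, 5, 3, 5, 2, 1, 4]

rho, p_value = spearmanr(tweet_credibility, author_credibility)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.4f}")
```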

The authors concluded with the implications of their findings. For individual users: those who plan to tweet on only one topic should pick a topical user name; otherwise, they should select a traditional user name. Non-standard grammar and default user photos also damaged credibility. They also suggested design changes to Twitter’s interface: author credentials should be made visible at a glance, and metrics on the number of retweets or the number of times a link has been shared, along with who is sharing those tweets, would provide users with more context for assessing credibility. Ultimately, the authors found that users’ credibility judgements relied on heuristics and were often systematically biased. This highlights the importance of presenting more information about tweet authors when they are unknown to the user.

Reflection:

Overall, this was a very well-designed study. I particularly liked how the authors approached the survey design: first performing a pilot study to determine what to ask, then running the survey, determining which features were important from the responses, narrowing down to three features, and finally studying the effects of those features.

Their follow-up study was necessary, and it might well have been overlooked had they not hypothesized about why user image type had no significant impact on tweet credibility and author credibility ratings.

The entire study relied on Twitter users using its search feature and encountering random tweets. Clearly, search is not always implemented that way: results can be ordered by popularity, by how recent the tweet is, and possibly by how ‘distant’ the tweet’s author is (in degrees of connection) from the user. Obviously, mimicking this in a study is difficult. It would be interesting if Twitter’s search algorithm ordered tweets based on how ‘distant’ each tweet’s author is from the user.

I appreciate that the authors accounted for race- and age-related bias in their study when determining the different user photos to use. Further, they mention that there is no difference in credibility ratings for male and female profiles. Of special mention is Thebault-Spieker’s recent work [1] on the absence of race and gender bias among Amazon Mechanical Turk workers. The participants for this study were drawn from an email list, however, so the question arises: why is there no gender-related bias? Or if there is, how statistically significant is it?

The tweet content used in the study does not seem particularly ecologically valid; however, having only recently joined Twitter, I do not believe I am the right person to comment on this. For example, they used only bit.ly links in the tweets, and the text itself is not how most people on Twitter write. Their use of ecologically valid Twitter avatars and gender-neutral user names, however, is commendable.

Their finding that participants were generally unable to distinguish between true and false (both plausible) information was of particular note.

They also noted that topical user names did better than user names of the other two types. Perhaps this was because of a lack of any biographic information (as they hypothesize). An interesting follow-up study would be to test whether topical user names with biographical information or traditional user names with similar biographical information would be perceived as being more credible.

Questions:

  1. Do you think journalists or technology-savvy people would perform search and assess search results on Twitter differently from typical users? If so, how?
  2. The participants for this study were drawn from an email list, so the question arises: why is there no gender-related bias? Or if there is, how statistically significant is it? Perhaps it is due to the design of the experiment and not necessarily that all 200-odd participants were free from gender-related bias. [Re: Thebault-Spieker’s CSCW paper (please ask me for a copy)]
  3. Do you believe the content of their tweets were ecologically valid?
  4. Why do you think topical user names did better than traditional and internet user names?
  5. How could search results be better displayed on social networks? Would you prefer a most-recent, random, or ‘distance’-based ordering of tweets?
  6. Aside: Do you think the authors asked how often the participants used Bing Social Search (and not only Google Real Time Search) because they work at Microsoft Research?

References:

[1] Jacob Thebault-Spieker, Daniel Kluver, Maximilian Klein, Aaron Halfaker, Brent Hecht, Loren Terveen, and Joseph Konstan. 2018. Simulation experiments on (the absence of) ratings bias in reputation systems. In Proceedings of the 20th ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW ’18).

 

 

Read More

Exploring Privacy and Accuracy Trade-Offs in Crowdsourced Behavioral Video Coding

Paper:

Walter S. Lasecki, Mitchell Gordon, Winnie Leung, Ellen Lim, Jeffrey P. Bigham, and Steven P. Dow. 2015. Exploring Privacy and Accuracy Trade-Offs in Crowdsourced Behavioral Video Coding. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI ’15). ACM, New York, NY, USA, 1945-1954. DOI: https://doi.org/10.1145/2702123.2702605

Discussion Leader: Sukrit V

Summary:

Social science and interaction researchers often need to review video data to learn more about human behavior and interaction. However, computer vision is still not advanced enough to automatically detect human behavior, so video data is still mostly coded manually. This is a terribly time-intensive process: coding often takes up to ten times the length of the video.

Recent work in the crowdsourcing space has been successful in using crowds to code for behavioral cues. This method, albeit much faster, introduces other concerns.

The paper provides a brief background on how video data is typically coded and on current research in the space of crowd-based video annotation. The authors interviewed twelve practitioners who have each coded at least 100 hours of video to obtain a better understanding of current video coding practices and of the potential benefits and concerns of utilizing crowdsourcing in a behavioral video coding process. From the interviews, they deduced that video coding is a time-consuming process used in a wide variety of contexts to code for a wide range of behaviors. Even developing a coding schema is difficult due to inter-rater agreement requirements. The researchers were open to the idea of using online crowds as part of the video coding process, but they had concerns about the quality and reliability of the crowds, in addition to maintaining participant privacy and meeting IRB requirements.

The paper details an experimental study exploring how accurately crowds can code for a range of behaviors, and how obfuscation methods affect a worker’s ability to identify participant behavior and identity. In the first experiment, the crowd achieved relatively high precision and recall for most behaviors, with the exception of smiling and head turning; this was attributed to a lack of clarity in the instructions and examples provided for those two behaviors. In the second experiment, the authors varied the blur level of the videos and observed that workers’ ability to identify participants dropped more steeply than the F1 scores did. This means it is possible to preserve privacy at higher blur levels while still maintaining relatively good precision and recall.
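
For reference, the per-behavior precision, recall, and F1 figures come from comparing crowd labels against expert “ground truth” labels. A generic sketch (not the paper’s pipeline, with invented labels):

```python
# Generic sketch: scoring crowd-coded behavior labels against expert ground truth.
# 1 = behavior present in a video segment, 0 = absent. The labels are invented.
from sklearn.metrics import precision_score, recall_score, f1_score

expert = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # expert coder's labels, one per segment
crowd  = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # aggregated crowd labels, one per segment

print("precision:", precision_score(expert, crowd))  # of segments the crowd flagged, how many are real
print("recall:   ", recall_score(expert, crowd))      # of real occurrences, how many the crowd caught
print("F1:       ", f1_score(expert, crowd))          # harmonic mean of precision and recall
```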

The authors also created a tool, Incognito, for researchers to test what level of privacy-protection filtering is sufficient for their use case and what impact it would have on the accuracy of the crowdsourced coding. They conclude with a discussion of future work: utilizing different approaches to filtering and performing real-world usage studies.

 

Reflection:

The paper is rather well organized, and the experiments were quite straightforward and well-detailed. The graphs and images present were sufficient.

I quite liked the ‘lineup tool’ that they utilized at the end of each video coding task, which mimicked what is used in real life. In addition, their side experiment to determine whether workers were better at identifying participants if they were prompted beforehand is something I believe is useful to know and could be applied in other types of experiments.

I believe the tool they designed, Incognito, would prove extremely useful for researchers since it abstracts away the process of obfuscating the video and hiring workers on MTurk. However, it would have been nice if the paper had mentioned what instructions the MTurk workers were given for coding the videos. In addition, perhaps training these workers with a tutorial may have produced better results. They also noted that coding done by experts is time-consuming and that the time taken scales linearly with the size of the dataset. Something that would be interesting to study is how the accuracy of crowdsourced coding changes with increased practice over time. This may further reduce the overhead on the experts, provided that coding standards are maintained.

Furthermore, the authors report the crowdsourced workers’ precision and recall rates, but it would have been nice if they had looked into inter-rater agreement as well, since that plays a vital role in video coding.
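
Inter-rater agreement for this kind of binary coding is usually reported as Cohen’s kappa, which discounts the agreement two coders would reach by chance. A minimal example with invented labels:

```python
# Cohen's kappa between two coders: observed agreement corrected for chance agreement.
# The labels are invented for illustration.
from sklearn.metrics import cohen_kappa_score

coder_a = [1, 1, 0, 1, 0, 0, 1, 0]
coder_b = [1, 0, 0, 1, 0, 1, 1, 0]

print("kappa:", cohen_kappa_score(coder_a, coder_b))  # 1 = perfect, 0 = chance-level agreement
```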

For coding smiles they used an unobfuscated window around the mouth, and the study otherwise focused on blurring the whole image to preserve privacy. I wish they had considered – or even mentioned – using facial recognition algorithms to blur only the faces (which I believe would still preserve privacy to a very high degree), yet greatly increase the accuracy when coding other behaviors.
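
As a rough illustration of this alternative (not something the authors evaluated), face-only blurring could be prototyped with OpenCV’s bundled Haar cascade; the video path is a placeholder:

```python
# Sketch of face-only obfuscation: detect faces with OpenCV's bundled Haar cascade
# and Gaussian-blur just those regions, leaving the rest of the frame untouched.
# This illustrates the alternative discussed above, not the paper's method.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def blur_faces(frame, kernel=(51, 51)):
    """Blur every detected face region in a single video frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        frame[y:y+h, x:x+w] = cv2.GaussianBlur(frame[y:y+h, x:x+w], kernel, 0)
    return frame

# Process a video frame by frame ("session.mp4" is a placeholder path).
cap = cv2.VideoCapture("session.mp4")
ok, frame = cap.read()
while ok:
    obfuscated = blur_faces(frame)
    # ...write `obfuscated` to an output video or display it here...
    ok, frame = cap.read()
cap.release()
```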

Overall, this is a very detailed, exploratory paper on the privacy-accuracy tradeoffs of utilizing crowdsourced workers to perform video coding.

 

Questions:

  1. The authors noted that there was no change in precision and recall at blur levels 3 and 10 when the workers were notified that they were going to be asked to identify participants after their task. That is, even when they were informed beforehand about having to perform a line-up test, they were no better or worse at recognizing the person’s face (“accessing and recalling this information”). Why do you think there was no change?
  2. Can you think of cases where using crowdsourced workers would not be appropriate for coding video?
  3. The aim of this paper was to protect the privacy of participants in the videos that needed to be coded, and was concerned with their identity being discovered/disclosed by crowdsourced workers. How important is it for these people’s identities to be protected when it is the experts themselves (or for example, some governmental entities) that are performing the coding?
  4. How do you think the crowdsourced workers compare to experts with regards to coding accuracy? Perhaps it would be better to have a hierarchy where these workers code the videos and, below a certain threshold level of agreement between the workers, the experts would intervene and code the videos themselves. Would this be too complicated? How would you evaluate such a system for inter-rater agreement?
  5. Can you think of other approaches – apart from facial recognition with blurring, and blurring the whole image – that can be used to preserve privacy yet utilize the parallelism of the crowd?
  6. Aside: How do precision and recall relate to Cohen’s kappa?

Read More

Tech Demo: Snap Map

www.npr.org/sections/goatsandsoda/2017/07/06/535076690/can-snapchats-new-snap-map-bring-the-world-closer-together

Brief Overview:

Snap Map is a feature of Snap Inc.’s Snapchat application that gives users a searchable world map and aggregates geotagged Snaps taken in the last 24 hours. Locations that are particularly popular are highlighted on the map with a heatmap gradient that ranges from sky blue to yellow to red.

Snap Map was introduced in June 2017 and received criticism for exacerbating existing privacy and security issues. However, an additional – perhaps unforeseen – use is keeping tabs on loved ones in disaster-prone areas, monitoring one’s surroundings in those areas, and investigative journalism. With Snapchat’s user base of 166 million (which is now beginning to look small in comparison to Instagram’s 250 million users) posting at least 700 million photos per day – and especially with the introduction of Snap Map – Snapchat is increasingly becoming a source of information for journalists.

Snap Map is particularly useful because stories are geotagged and cannot be uploaded retroactively (unless the user goes to great lengths to upload old content, there is a reasonable degree of certainty about when and where a Snap was captured). The timestamped and geotagged content visible on Snap Map can be used to generate a timeline of events. The heatmap is helpful – if not crucial – in discovering events that may not yet have been covered by other news sites, for fact-checking, or for gaining additional insights into an emerging story. [As I will demonstrate in my demo.]
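
To make the timeline idea concrete: Snap Map has no public API, but given a hand-collected set of timestamped, geotagged posts (the records below are invented), assembling a chronological event timeline is straightforward:

```python
# Hypothetical sketch: ordering timestamped, geotagged posts into an event timeline.
# Snap Map has no public API; these records are invented for illustration.
from datetime import datetime

posts = [
    {"time": datetime(2017, 7, 6, 14, 35), "lat": 29.76, "lon": -95.37, "caption": "flooded street"},
    {"time": datetime(2017, 7, 6, 13, 10), "lat": 29.75, "lon": -95.36, "caption": "sirens downtown"},
    {"time": datetime(2017, 7, 6, 15, 2),  "lat": 29.77, "lon": -95.38, "caption": "shelter opening"},
]

# Sort by timestamp to reconstruct the order in which events were captured.
for p in sorted(posts, key=lambda p: p["time"]):
    print(f"{p['time']:%H:%M}  ({p['lat']:.2f}, {p['lon']:.2f})  {p['caption']}")
```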

Steps to use Snap Map

  1. Make sure you have the Snapchat application installed on your Android or iOS device.
  2. Log in to Snapchat or create a new user account.
  3. From the main screen, pinch out with your fingers; this will bring up the Snap Map feature.
  4. The user is presented with a brief overview of Snap Map.
  5. The map is now displayed, zoomed-in to the user’s current location with a heatmap of the area.
  6. One can zoom out and pan to different areas, and certain hotspots are annotated with textual information.
  7. Zoom in, and long-press on a particular location.
  8. The user is presented with Snaps that were taken around that location and uploaded within the last twenty-four hours.
  9. Additionally, you can search for popular locations around the world from the search bar.
  10. To upload your own content to Snap Map, take a picture or video and make sure you geotag it with your current location. Simple as that! (Eerily simple, rather.)

 

 

Read More

Integrating On-demand Fact-checking with Public Dialogue

Paper:

Kriplean, T., Bonnar, C., Borning, A., Kinney, B., & Gill, B. (2014). Integrating On-demand Fact-checking with Public Dialogue. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing (pp. 1188–1199). New York, NY, USA: ACM. https://doi.org/10.1145/2531602.2531677

Discussion Leader: Sukrit V

Summary:

This article aims to understand the design space for inserting accurate information into public discourse in a non-confrontational manner. The authors integrate a request-based fact-checking service with an existing communication interface, ConsiderIt – a crowd-sourced voters guide. This integration reintroduces professionals and institutions – namely, librarians and public library systems – that have so far been largely absent from crowdsourcing systems.

The authors note that existing communication interfaces for public discourse often fail to aid participants in identifying which claims are factual. The article first delves into different sources of factually correct information and the format in which it should be conveyed to participants. They then discuss who performs the work of identifying, collating, and presenting this information: either professionals or crowds. Lastly, where this information is presented is crucial: through single-function entities such as Snopes or PolitiFact, embedded responses, or overlays in chat interfaces.

Their system was deployed in the field during the course of a real election with voluntary users – the Living Voters Guide (LVG) – and utilized librarians from the Seattle Public Library (SPL) as the fact-checkers. Initial results indicated that participants were not opposed to the role played by these librarians. One key point to note is the labeling of points post verification: accurate, unverifiable, and questionable. The term “questionable” was specifically chosen because it is more considerate of users’ feelings – as opposed to the negative connotation associated with “wrong” or a red X.

The rest of the article discusses how to inform LVG users of which pro/con points were factual in a non-confrontational manner. The decision to prompt a fact-check was in the hands of the LVG participants, and the fact-check was performed only on the factual component of claims and presented in an easy-to-assess manner. From the perspective of the SPL librarians, they played a crucial role in determining the underlying features of the fact-checking mechanism.

In the results, the authors found that there was demand for a request-based fact-checking service, and that the SPL librarians were viewed and welcomed as trustworthy participants, which simultaneously helped improve the credibility of the LVG interface. Based on Monte Carlo simulations, the authors demonstrate that there was an observable decrease in commenting rates after fact-checking, even after accounting for temporal effects.

In conclusion, the authors note that the journalistic fact-checking framework did not interface well with librarian referencing methods. In their implementation, there was also no facility for direct communication between the librarians, the user whose point was being checked, and the requester. The way fact-checks were displayed tended to dominate the discussion section and possibly caused a drop in comment rates. Some librarians felt that they were exceeding their professional boundaries when determining the authenticity of certain claims – especially those pertaining to legal matters.

Reflections:

The article made good headway in creating an interface to nudge people towards finding a common ground. This was done through the use of unbiased professionals/institution vis-à-vis librarians and the Seattle Public Library, in a communication interface.

The involvement of librarians – who are still highly trusted and respected by the public – is notable. These librarians help the LVG participants find verified information on claims, amidst a deluge of conflicting information presented to them by other users and on the internet. One caveat – which can only be rectified through changes in existing laws – is that librarians cannot perform legal research; they are only allowed to provide links to related information.

On one hand, I commend the efforts of the authors to introduce a professional, unbiased fact-checker into a communication system filled with (possibly) misinformed and uninformed participants. On the other, I question the scalability of such efforts. The librarians set a 48-hour deadline on responding to requests, and in some cases it took up to two hours of research to verify a claim. Perhaps this system would benefit from a slightly tweaked learnersourcing approach utilizing response aggregation and subsequent expert evaluation.

Their Monte Carlo analysis was particularly useful in determining whether the fact-checking had any effect on comment frequency, versus temporal effects alone. I also appreciate the Value Sensitive Design approach the authors use to evaluate the fact-checking service from the viewpoint of the main and indirect stakeholders. The five-point Likert scale utilized by the authors also allows for some degree of flexibility in gauging stakeholder opinion, as opposed to binary responses.
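
As a rough sketch of that style of analysis (not the authors’ actual simulation or data), a permutation-style Monte Carlo test for a drop in daily comment rates could look like this:

```python
# Rough sketch of a Monte Carlo (permutation) test for a drop in comment rates,
# in the spirit of the paper's analysis; the daily counts below are invented.
import random

before = [12, 15, 9, 14, 11, 13, 10]   # daily comment counts before fact-checking
after  = [8, 7, 10, 6, 9, 7, 8]        # daily comment counts after fact-checking

def mean(xs):
    return sum(xs) / len(xs)

observed_drop = mean(before) - mean(after)

pooled, n_before, extreme, trials = before + after, len(before), 0, 10_000
for _ in range(trials):
    random.shuffle(pooled)                      # re-assign days to "before"/"after" at random
    sim_drop = mean(pooled[:n_before]) - mean(pooled[n_before:])
    if sim_drop >= observed_drop:
        extreme += 1

print(f"observed drop = {observed_drop:.2f} comments/day, p = {extreme / trials:.3f}")
```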

Of particular mention was how ConsiderIt, their communication interface, utilized a PointRank algorithm which highlights points that were more highly scrutinized. Additionally, the system’s structure inherently disincentivizes gaming of the fact-checking service. The authors mention triaging requests to handle malicious users/pranksters. Perhaps this initial triage could be automated, instead of having to rely on human input.

I believe that this on-demand fact-checking system shows promise, but it will only truly be functional at a large scale if certain processes are automated and handled by software mechanisms. Further, a messaging interface wherein the librarian, the requester of the fact-check, and the original poster can converse directly with each other would be useful. Then again, that might defeat the purpose of a transparent fact-checking system and undermine the whole point of a public dialogue system. Additionally, the authors note that there is little evidence that participants’ short-term opinions changed. I am unsure of how to evaluate whether or not these opinions change in the long term.

Overall, ConsiderIt’s new fact-checking feature has considerably augmented the LVG user experience in a positive manner and integrated the work of professionals and institutions into a “commons-based peer production.”

Questions:

  • How, if possible, would one evaluate long-term change in opinion?
  • Would it be possible to introduce experts in the field of legal studies to aid librarians in the area of legal research? How would they interact with the librarians? What responsibilities do they have to the public to provide “accurate, unbiased, and courteous responses to all requests”?
  • How could this system be scaled to accommodate a much larger user base, while still allowing for accurate and timely fact-checking?
  • Are there certain types of public dialogues in which professionals/institutions should not/are unable to lend a hand?

Read More