02/19/2020 – Sukrit Venkatagiri – In Search of the Dream Team

Paper:  Sharon Zhou, Melissa Valentine, and Michael S. Bernstein. 2018. In Search of the Dream Team: Temporally Constrained Multi-Armed Bandits for Identifying Effective Team Structures. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI ’18), 1–13. https://doi.org/10.1145/3173574.3173682

Summary: This paper introduces DreamTeam, a system that searches the space of possible team structures to identify effective ones for online teams. It does so using multi-armed bandits with temporal constraints, a type of algorithm that manages the timing of exploration–exploitation trade-offs across multiple bandits simultaneously. This addresses a classic question in HCI and CSCW: when should teams favor one way of working over another? The paper contributes a method for computationally identifying effective team structures, a system that instantiates it, and an evaluation showing improvements of up to 46%. The paper concludes with a discussion of computational partners for improving group work, for example by pointing out our biases and inherent limitations, and by helping us replan when the environment shifts.
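
Since the core mechanism here is an exploration–exploitation trade-off under a deadline, a minimal sketch may help make it concrete. The code below is my own illustration of a UCB1-style bandit that stops exploring after a fixed number of rounds; it is not the paper's actual algorithm, which coordinates several bandits (one per team-structure dimension) and reasons about how much exploration the remaining time allows. All names and parameters are hypothetical.

```python
import math
import random

class DeadlineBandit:
    """UCB1-style bandit that explores only until a time budget runs out."""

    def __init__(self, arms, explore_rounds):
        self.arms = list(arms)                     # candidate team structures
        self.explore_rounds = explore_rounds       # the temporal constraint
        self.counts = {a: 0 for a in self.arms}
        self.rewards = {a: 0.0 for a in self.arms}
        self.t = 0

    def select(self):
        """Pick a team structure (arm) to try this round."""
        self.t += 1
        if self.t > self.explore_rounds:
            # Temporal constraint reached: commit to the best arm seen so far.
            return max(self.arms,
                       key=lambda a: self.rewards[a] / max(self.counts[a], 1))
        untried = [a for a in self.arms if self.counts[a] == 0]
        if untried:
            return random.choice(untried)
        # UCB1 score: average reward plus an exploration bonus.
        return max(self.arms,
                   key=lambda a: self.rewards[a] / self.counts[a]
                   + math.sqrt(2 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        """Record the observed reward (e.g., measured team performance)."""
        self.counts[arm] += 1
        self.rewards[arm] += reward

# Example: a single bandit over hierarchy structures; DreamTeam would run one
# such bandit per team-structure dimension (hierarchy, norms, feedback, ...).
bandit = DeadlineBandit(["hierarchical", "flat", "rotating-lead"], explore_rounds=6)
for _ in range(10):
    arm = bandit.select()
    reward = random.random()      # stand-in for a real team outcome measure
    bandit.update(arm, reward)
```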

Reflection:

I appreciate that they evaluated the system through randomized controlled trials across their experimental conditions. The evaluation used a collaborative intellective task, and I wonder how different the findings would be if it had used a creative task instead. Perhaps there is a different optimal “dream team” depending not only on the people but also on the task itself.

I also appreciate the thorough system description and the fact that the system was integrated within Slack rather than built as a standalone tool. This increases its real-world generalizability and makes it easier for others to build on. That said, hiring workers in real time must have been difficult, and it is unclear how synchronous or asynchronous the study actually was.

One interesting aspect of the approach is that it balances exploration and exploitation simultaneously across multiple bandits. I wonder how the system would have fared if teams had been left to explore team structures on their own; probably worse.

Another interesting aspect is that the evaluation was conducted with strangers on MTurk. I wonder if the results would have differed a) in a co-located setting and/or b) among coworkers who already knew each other.

Finally, the paper is nearly two years old, and I don’t see any follow-up work evaluating this system in the wild. I wonder why. Perhaps there is not much to gain from an in-the-wild evaluation, or perhaps one was attempted and did not fare well. Either way, it would be interesting to read about the results, good or bad.

Questions:

  1. Have you thought about building a Slack integration for your project instead of a standalone system?
  2. How might this system function differently if it were for a creative task such as movie animation?
  3. How would you evaluate such a system in the wild?

02/19/2020 – Sukrit Venkatagiri – The Case of Reddit Automoderator

Paper: Shagun Jhaver, Iris Birman, Eric Gilbert, and Amy Bruckman. 2019. Human-Machine Collaboration for Content Regulation: The Case of Reddit Automoderator. ACM Transactions on Computer-Human Interaction (TOCHI) 26, 5: 31:1–31:35. https://doi.org/10.1145/3338243

Summary: This paper studies Reddit’s Automod, a rule-based moderation tool that automatically filters content on subreddits and can be customized by moderators to suit each community. The authors sought to understand how moderators use Automod and what advantages and challenges it presents. They found a need for better audit tools to tune Automod’s performance, for a repository to share moderation tools and configurations, and for a clearer division of labor between human and machine decision making. They conclude with a discussion of the sociotechnical practices that shape the use of these tools, how the tools help moderators maintain their communities, and the challenges and limitations involved, along with solutions that may help address them.
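
For readers unfamiliar with rule-based moderation, the sketch below shows the general shape of such a tool. The rule format and field names are simplified illustrations of my own; real Automod rules are written in YAML and support many more conditions (author karma, account age, link domains, and so on).

```python
import re

# Hypothetical, simplified rules in the spirit of Automod's keyword matching;
# this is not Automod's actual configuration syntax.
RULES = [
    {"name": "self-promotion", "pattern": r"\b(buy now|discount code|promo link)\b", "action": "remove"},
    {"name": "link-shortener", "pattern": r"https?://(bit\.ly|tinyurl\.com)/\S+",    "action": "flag"},
]

def moderate(comment: str) -> list:
    """Return (rule name, action) pairs for every rule the comment trips."""
    hits = []
    for rule in RULES:
        if re.search(rule["pattern"], comment, flags=re.IGNORECASE):
            hits.append((rule["name"], rule["action"]))
    return hits

print(moderate("Use my DISCOUNT CODE for 50% off!"))  # [('self-promotion', 'remove')]
```

Even this toy version hints at the tuning problems the moderators describe: a pattern that is too broad removes legitimate posts, one that is too narrow lets violations through, and there is no built-in way to audit either failure mode.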

Reflection:

I appreciate that the authors were embedded within the Reddit community for over a year and that they provide concrete recommendations for creators of new and existing platforms, for designers and researchers interested in automated content moderation, for scholars of platform governance, and for content moderators themselves.

I also appreciate the deep, thorough qualitative nature of the study, along with the screenshots; however, the paper may be too long and too detailed in places. I wish there were a “mini” version of this paper. The quotes themselves were compelling and exemplary of the problems users faced.

The finding that different subreddits configured and used Automod differently was interesting, and I wonder how much a moderator’s skills and background affect whether, and in what ways, they configure and use Automod. Lastly, the conclusion is especially valuable because it is targeted towards different groups both within and outside of academia.

Two themes that emerged, “Becoming/continuing to be a moderator” and “recruiting new moderators,” sound interesting, but I wonder why they were left out of the results. The paper does not offer any explanation for this.

Questions:

  1. How do you think subreddits might differ in their use of Automod based on their moderators’ technical abilities?
  2. How can we teach people to use Automod better?
  3. What are the limitations of Automod? How can they be overcome through ML methods?

02/05/2020 – Sukrit Venkatagiri – Power to the People: The Role of Humans in Interactive Machine Learning

Paper: Power to the People: The Role of Humans in Interactive Machine Learning

Authors: Saleema Amershi, Maya Cakmak, W. Bradley Knox, Todd Kulesza

Summary:
This paper discusses the rise of interactive machine learning systems and how to improve both these systems and their users’ experiences, through a set of case studies. A typical machine learning workflow involves a complex, back-and-forth process of collecting data, identifying features, experimenting with different algorithms, and tuning parameters, after which the results are examined by practitioners and domain experts. The model is then updated to incorporate their feedback, which can affect performance and starts the cycle anew. In contrast, feedback in interactive machine learning systems is rapid and iterative, and is explicitly focused on end users and the ways they interact with the system. The authors present case studies spanning several interactive and intelligent user interfaces, such as gesture-based music systems and visualization-driven systems like CueFlik and ManiMatrix. Finally, the paper concludes with a discussion of how to develop a common language across diverse fields, how to distill principles and guidelines for human-machine interaction, and a call to action for increased collaboration between HCI and ML.
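
Since the contrast between slow batch retraining and rapid end-user feedback is the crux of the paper, here is a minimal sketch of an interactive loop. The toy word-count classifier and the example inputs are my own invention; real IML systems such as the gesture-based and CueFlik-style examples use far richer models and interfaces.

```python
from collections import defaultdict

class InteractiveClassifier:
    """Toy word-count model that is updated after every user correction."""

    def __init__(self, labels=("relevant", "irrelevant")):
        self.word_counts = {label: defaultdict(int) for label in labels}

    def predict(self, text):
        scores = {label: sum(counts[w] for w in text.lower().split())
                  for label, counts in self.word_counts.items()}
        return max(scores, key=scores.get)

    def update(self, text, label):
        for word in text.lower().split():
            self.word_counts[label][word] += 1

# The key IML property: the user sees a prediction, corrects it, and the model
# incorporates that feedback immediately, rather than waiting for an offline
# retraining cycle run by ML practitioners.
model = InteractiveClassifier()
stream = [("new bandit paper on team structures", "relevant"),
          ("discount code for shoes", "irrelevant"),
          ("CHI paper on crowdsourcing", "relevant")]
for text, user_label in stream:          # user_label stands in for live feedback
    guess = model.predict(text)
    print(f"{text!r}: model says {guess}, user says {user_label}")
    model.update(text, user_label)
```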

Reflection:
The case studies are interesting in that they highlight the differences between typical machine learning workflows and novel IUIs, as well as the differences between humans and machines. I find it interesting that most workflows leave the end user out of the loop for “convenience” reasons, even though the end user is often the most important stakeholder.

Similar to [1] and [2], I find it interesting that there is a call to action for developing techniques and standards for appropriately evaluating interactive machine learning systems. However, the paper does not go into much depth on this. I wonder if it is the highly contextual nature of IUIs that makes it difficult to develop common techniques, standards, and languages. This in turn highlights some epistemological issues that need to be addressed within both the HCI and ML communities.

Another fascinating finding is that people valued transparency in machine learning workflows, but that this transparency did not always translate into higher (human) performance. It may be something of a placebo effect: people feel that “knowledge is power” even when the knowledge would not have made a difference. Transparency has benefits beyond accuracy, however. For example, transparency in how a self-driving car works can help determine whom to blame in the case of an accident: the algorithm, a pedestrian, the driver, the developer, or unavoidable circumstances such as a force of nature. With interactive systems, it is crucial to understand human needs and expectations.

Questions:

  1. This paper also talks about developing a common language across diverse fields. We notice the same idea in [1] and [2]. Why do you think this hasn’t happened yet?
  2. What types of ML systems might not work for IUIs? What types of systems would work well?
  3. How might these recommendations and findings change for systems with more than one end user, for example, an IUI that helps an entire town decide zoning laws, or an IUI that enables a group of people to book a vacation together?
  4. What was the most surprising finding from these case studies?

References:
[1] R. Jordon Crouser and Remco Chang. 2012. An Affordance-Based Framework for Human Computation and Human-Computer Collaboration. IEEE Transactions on Visualization and Computer Graphics 18, 12: 2859–2868. https://doi.org/10.1109/TVCG.2012.195
[2] Jennifer Wortman Vaughan. 2018. Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research. Journal of Machine Learning Research 18, 193: 1–46. http://jmlr.org/papers/v18/17-234.html

02/05/2020 – Sukrit Venkatagiri – Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research

Paper: Making Better Use of the Crowd: How Crowdsourcing Can Advance Machine Learning Research
Author: Jennifer Wortman Vaughan

Summary:
This survey paper provides an overview of crowdsourcing research as it applies to the machine learning community. It first provides an overview of crowdsourcing platforms, followed by an analysis of how crowdsourcing has been used in ML research: to generate data, to evaluate and debug models, to build hybrid intelligent systems, and to run behavioral experiments. The paper then reviews crowdsourcing literature that studies the behavior of the crowd, workers’ motivations, and ways to improve work quality, focusing in particular on dishonest worker behavior, ethical payment for crowd work, and the communication and collaboration patterns of crowd workers. Finally, the paper concludes with a set of best practices for the effective use of crowdsourcing in machine learning research.

Reflection:
Overall, the paper provides a thorough, analytical overview of the applications of crowdsourcing in machine learning research, as well as useful best practices for machine learning researchers who want to make better use of crowdsourcing.

The paper largely focuses on how crowdsourcing has been used to advance machine learning research, but it also subtly discusses how machine learning can advance crowdsourcing research. This is interesting because it shows how interrelated and co-dependent these two fields are. For example, in the Galaxy Zoo project, researchers optimized crowd effort so that fewer judgments were needed per image, allowing more images to be annotated overall. Other interesting uses of crowdsourcing were in evaluating unsupervised models and model interpretability.
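
To make “optimizing crowd effort” concrete, here is a minimal sketch of one possible stopping rule: stop requesting labels for an image once enough annotators agree. The function and thresholds are my own illustration under simple assumptions, not the actual Galaxy Zoo allocation algorithm, which models worker skill and image difficulty more carefully.

```python
def needs_more_votes(votes, min_votes=3, agreement=0.8, max_votes=10):
    """Keep asking for labels until enough voters agree (or a hard cap is hit)."""
    if len(votes) < min_votes:
        return True
    if len(votes) >= max_votes:
        return False
    top = max(set(votes), key=votes.count)           # current majority label
    return votes.count(top) / len(votes) < agreement

# An image with early consensus stops at 3 votes instead of 10:
print(needs_more_votes(["spiral", "spiral", "spiral"]))      # False -> done
print(needs_more_votes(["spiral", "elliptical", "spiral"]))  # True  -> ask again
```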

On the other hand, I wonder what a paper that was more focused on HCI research would look like. In this paper, humans are placed “in the loop,” while in HCI (and the real world) it’s often the machine that is in the loop of a human’s workflow. For example, the paper states that hybrid intelligent systems “leverage the complementary strengths of humans and machines to expand the capabilities of AI.” A more human-centered version would be “[…] to expand the capabilities of humans.”

Another interesting point is that all the hybrid intelligent systems mentioned in Section 4 had their own metrics for assessing human, machine, and human+machine performance. This speaks to the need for a common language for understanding, and then assessing, human-computer collaboration, which is described in more detail in [1]. Perhaps it is the unique, highly contextual nature of the work that prevents more standard comparisons across hybrid intelligent systems. Indeed, the paper mentions this with regard to evaluating and debugging models: “there is no objective notion of ground truth.”

Lastly, the paper touches on two topics relevant to this semester. The first is algorithmic aversion: participants who were given more control in algorithmic decision-making systems were more accurate, not because their own judgments were more accurate, but because they were more willing to listen to the algorithm’s recommendations. I wonder if this holds in all contexts, and how best to incorporate this work into mixed-initiative user interfaces. The second is that the quality of crowd work naturally varied with payment; however, very high wages increased the quantity of work but not always its quality. Combined with workers’ varied motivations, it is not always clear how much to pay for a given task, which necessitates pilot studies, something this paper also strongly recommends. Even where it is not explicitly stated, one thing is certain: we must pay fair wages for fair work [2].

Questions:

  1. What are some new best-practices that you learned about crowdsourcing? How do you plan to apply it in your project?
  2. How might you use crowdsourcing to advance your own research, even if it isn’t in machine learning?
  3. Plenty of jobs are seemingly menial, e.g., assembly work in factories, working in a call center, delivering mail, yet no one has tried to make these jobs more “meaningful” and motivating in order to increase people’s willingness to do them.
    1. Why do you think there is such a large body of work around making crowd work more intrinsically motivating?
    2. Imagine you are doing crowd work for a living, would you prefer to be paid more for a boring task, or paid less for a task masquerading as a fun game?
  4. How much do you plan to pay crowd workers for your project? Additional reference: [2].
  5. ML systems abstract away the human labor that goes into making it work, especially as seen in the popular press. How might we highlight the invaluable role played by humans in ML systems? By “humans,” I mean the developers, the crowd workers, the end-users, etc.

References:
[1] R. Jordon Crouser and Remco Chang. 2012. An Affordance-Based Framework for Human Computation and Human-Computer Collaboration. IEEE Transactions on Visualization and Computer Graphics 18, 12: 2859–2868. https://doi.org/10.1109/TVCG.2012.195
[2] Mark E. Whiting, Grant Hugh, and Michael S. Bernstein. 2019. Fair Work: Crowd Work Minimum Wage with One Line of Code. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing 7, 1: 197–206.

1/29/20 – Sukrit Venkatagiri – Affordance-Based Framework for Human Computation

Paper: An Affordance-Based Framework for Human Computation and Human-Computer Collaboration by R. Jordon Crouser and Remco Chang

Summary: 

This paper surveys 49 papers on human-computer collaboration systems and interfaces. The authors identify the affordances that arise in these collaborative systems and propose an affordance-based framework as a common language for understanding seemingly disparate branches of research and for indicating unexplored avenues for future work. They discuss various systems and suggest extensions to them, such as leveraging human adaptability and machine sensing. Finally, they conclude with a discussion of the utility of their framework in an increasingly collaborative world, along with some complexity measures for visual analytics.

Reflection:

This paper addresses some fundamental questions in mixed-initiative collaboration, such as how one can tell whether a problem even benefits from a collaborative technique and, if so, to whom the work should be delegated. The paper also provides ways to evaluate complexity in different visual analytic setups, but raises further questions, such as what the best way to evaluate work is, and how we can account for individual differences. These suggestions and questions, however, only beget more questions. The nature of work is increasingly complex, requiring application-specific ways to measure success. The paper tries to offer a one-size-fits-all solution, but the result ends up being too generic to be actionable.

The paper also highlights the need for a more holistic evaluation approach. Typically, ML and AI research is focused solely on the performance of the model. However, this paper highlights the need to evaluate the performance of both the human and the system that they are collaborating with. 

The paper discusses human-computer collaboration with a focus on visual analytics. More work is needed to study how applicable this framework is to physical human-computer interfaces, for example, an exoskeleton or a robot that assembles cars. Humans and robots have different abilities in these settings, which are not covered in the paper. Perhaps humans’ visual skills could be combined with a robot’s precision.

Questions:

  1. How might one apply this framework in the course of their class project?
  2. What about this framework is still/no longer applicable in the age of deep learning?
  3. Will AI ever surpass human creativity, audio linguistic abilities, and visuospatial thinking abilities? What does it mean to surpass human abilities?
  4. Is this framework applicable for cyber-physical systems? How does it differ?

01/29/20 – Sukrit Venkatagiri – The Future of Crowd Work

Summary:

This paper surveys existing literature in crowdsourcing and human computation and outlines a framework of 12 major areas for future work, focusing on paid rather than volunteer crowd work. Envisioning a future where crowd work is attractive to both requesters and workers requires considering work processes, crowd computation, and what crowd workers want. Work processes involve workflows, quality control, and task assignment techniques, as well as the synchronicity of the work itself. Crowd computation can involve crowds guiding AIs, or vice versa. Crowd workers themselves may have different motivations, may require additional job support through tools, and may want ways to maintain a reputation as a “good worker” and to build a career out of crowd work. Improving crowd work requires re-establishing career ladders for workers, improving task quality and design, and facilitating learning opportunities. The paper ends with a call for more research on several fronts to shape the future of crowd work: observational, experimental, design, and systems-related.

Reflection:

The distributed nature of crowd work theoretically allows anyone to work from anywhere, at any time, and there are clear benefits to this freedom. On the other hand, this distributed nature also reinforces existing power structures and facilitates the abstraction of human labor. This paper addresses some of these concerns and highlights the need to enable on-the-job training and re-establish career ladders. However, recent work has highlighted the long-term physical and psychological effects of doing crowd work [1,2]; content moderators, for example, are often traumatized by the work they do. Gray and Suri [3] also point out the need for a “commons” that provides a pool of shared resources for workers, along with a retainer model that values workers’ 24/7 availability. Yet very few platforms do so, mostly due to weak labor laws. More work needs to be done to investigate the broader, long-term, and secondary effects of doing crowd work.

Second, the paper highlights the need for human creativity and thought in guiding AI, but frames crowd work as analogous to a processor. This is not entirely accurate: a processor always produces the same output for a given input, whereas the same (or a different) human may not. This opens the door for human biases to enter the work. For example, Thebault-Spieker et al. found that crowd workers are biased in some regards [5], but not in others [4]. More work needs to be done to understand the impact of introducing creative, insightful, and, most importantly, unique human thought “in the loop.”
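
To make the processor analogy concrete, the sketch below contrasts a deterministic function with a noisy “human” judgment, plus the redundancy-and-aggregation workaround that crowd pipelines typically use. All of it is a hypothetical illustration of my own rather than anything from the paper.

```python
import random
from collections import Counter

def processor(x):
    """A processor: the same input always yields the same output."""
    return x * 2

def human_judgment(x):
    """A (hypothetical) human: noisy, fatigued, possibly biased."""
    return x * 2 + random.choice([-1, 0, 0, 0, 1])

def aggregate(x, n_workers=5):
    """Redundancy plus majority vote: the usual workaround for this
    non-determinism. It reduces noise, but cannot remove biases that
    most workers share."""
    answers = [human_judgment(x) for _ in range(n_workers)]
    return Counter(answers).most_common(1)[0][0]

print(processor(21), aggregate(21))  # 42 and, usually, 42
```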

Finally, there is a tension between how society values those who do complex work (such as engineers, plumbers, artists, etc.), and the constant push towards the taskification, or “Uberization” of complex work (Uber drivers, contractors on Thumbtack and UpWork, crowd workers, etc.), where work is broken down into the smallest possible unit to increase efficiency and decrease costs. What does it mean for work to be taskified? Who benefits, and who loses? How do we value microwork? Can we value microwork the same as “skilled” work?

Questions:

  1. Seven years later, is this the type of work you would want your children to do?
  2. How do we incorporate human creativity into ML systems, without also incorporating human biases?
  3. How has crowd work changed since this paper first came out?

References:

[1] Roberts, Sarah T. Behind the screen: Content moderation in the shadows of social media. Yale University Press, 2019.

[2] Newton, Casey. Bodies in Seats: At Facebook’s Worst-Performing Content Moderation Site in North America, one contractor has died, and others say they fear for their lives. The Verge. June 19, 2019. https://www.theverge.com/2019/6/19/18681845/facebook-moderator-interviews-video-trauma-ptsd-cognizant-tampa

[3] Mary L. Gray and Siddharth Suri. 2019. Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass. Houghton Mifflin Harcourt.

[4] Jacob Thebault-Spieker, Daniel Kluver, Maximilian A. Klein, Aaron Halfaker, Brent Hecht, Loren Terveen, and Joseph A. Konstan. 2017. Simulation Experiments on (the Absence of) Ratings Bias in Reputation Systems. Proceedings of the ACM on Human-Computer Interaction 1, CSCW: 101:1–101:25. https://doi.org/10.1145/3134736

[5] Jacob Thebault-Spieker, Loren G. Terveen, and Brent Hecht. 2015. Avoiding the South Side and the Suburbs: The Geography of Mobile Crowdsourcing Markets. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW ’15), 265–275. https://doi.org/10.1145/2675133.2675278

01/22/20 – Sukrit Venkatagiri – Ghost Work

1/22/20, Week 1

Book: Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass
Authors: Mary L. Gray, Siddharth Suri
Chapters: Introduction, Chapter 1

Summary:
In this book, Mary Gray and Sid Suri introduce the concept of “ghost work”: work done by people as part of a computational workflow, such as Uber’s face recognition checks for drivers or Facebook’s content moderation, among others. They highlight how many technological, AI-backed advances involve large amounts of human labor that is often unseen and therefore undervalued. This continues a centuries-long tradition of employing people on an as-needed basis; in this case, APIs hire humans for their creative and innovative input when AI systems encounter edge cases. This human input is then used as training data to improve the AI, which in turn obviates the need for human input, at least temporarily. When developers attempt to push the boundaries further, they again run into the limitations of AI and hire human annotators. This constant cycle of obviation and contingent recruitment is what the authors call the paradox of automation’s last mile.

Using a combination of ethnographic research and large-scale surveys, the authors highlight the context in which workers do this precarious work on four different platforms: MTurk, UHRS, LeadGenius, and Amara.org, and in two different countries: India and the US. They find that most workers value the flexibility—both temporal and spatial—afforded by this work, and yet, must deal with the precarity associated with this flexibility and a lack of legal protection: fluctuating income, no benefits, and the erasure of any semblance of a career ladder. They also find that platforms differ in their treatment of workers. By design, MTurk and UHRS isolate workers and provide no way for grievances to be redressed. On the other end of the spectrum are Amara and LeadGenius, where workers are valued and given opportunities to connect with each other. 

Reflection:
This book shifts the focus on gig work from the requester to the worker; from efficient and cheap to flexible and cruel. The authors do so through a mixed-methods approach, which I appreciate. The ethnographic work highlights the context in which workers do this work and their motives for doing so. It also points to the inability of labor laws to keep up with technological advances, and to how technology companies attempt, often successfully, to devalue and abstract away human labor in exchange for higher profits and stock prices. It would be interesting to see what the long-term psychological, physical, and economic effects of gig work are on individuals, communities, and societies. I believe this will only increase the wealth inequality we see throughout the world and lead to wage stagnation and a lower quality of life for low-income workers. For example, recent work has pointed out the trauma that Facebook content moderators face on a daily basis [1, 2]. In addition, work has shown that workers’ decisions shape public discourse on social media platforms, in stark contrast to the “value-neutral” stance that these platforms portray [3].

Of particular note is the authors’ problematization of automation: the target is ever-moving, and there will always be a need for human labor. The question, then, is how we value this human labor in the technology supply chain. In addition, the erasure of the career ladder means that workers find it difficult to increase their wages, which points to the need for alternative ways to raise workers’ wages and to train workers on the job. What happens when automation improves and there is no longer a need for humans with a certain set of skills, but only for those who are highly skilled? This, too, points to the need for continuous on-the-job training.

Questions:

  1. How many of you have engaged in ghost work? [If you have ever completed a CAPTCHA/RECAPTCHA, you have.] How many of you would like to rely on ghost work for full-time employment? The average wage is about $2/hr on MTurk [4]. Do you think you would be able to survive on that?
  2. What are the long-term psychological, physical, and financial effects of doing ghost work?
  3. How do we value workers when algorithms intentionally abstract away their labor?
  4. How can workers be trained on-the-job? What are alternatives to a career ladder?
  5. What can you do to improve the condition of people who do ghost work?

References:
[1] Roberts, Sarah T. Behind the screen: Content moderation in the shadows of social media. Yale University Press, 2019.
[2] Newton, Casey. Bodies in Seats: At Facebook’s Worst-Performing Content Moderation Site in North America, one contractor has died, and others say they fear for their lives. The Verge. June 19, 2019. https://www.theverge.com/2019/6/19/18681845/facebook-moderator-interviews-video-trauma-ptsd-cognizant-tampa
[3] Gillespie, Tarleton. “Platforms are not intermediaries.” Georgetown Law Technology Review 2, no. 2 (2018): 198-216.
[4] Hara, Kotaro, Abigail Adams, Kristy Milland, Saiph Savage, Chris Callison-Burch, and Jeffrey P. Bigham. “A data-driven analysis of workers’ earnings on Amazon Mechanical Turk.” In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pp. 1-14. 2018.
