02/19/20 – Lulwah AlKulaib- OrderWikipedia

February 18, 2020 Lulwah AlKulaib 2 Comments

Summary

The paper examines the roles of software tools in English language Wikipedia. The authors shed light on the process of counter-vandalism in Wikipedia. They explain in detail how participants and their assisted editing tools review Wikipedia contributions and enforce standards. They show that the editing process in Wikipedia is not a disconnected activity where editors force their views on others. Specifically, vandal fighting is shown as a distributed cognition process where users come to know their projects and users who edit it in a way that is impossible for a single individual. The authors claim the blocking of a vandal a cognitive process made possible by a complex network of interactions between humans, encyclopedia articles, software systems, and databases. Humans and non-humans work to produce and maintain a social order in the collaborative production of an encyclopedia with hundreds of thousands of diverse and often unorganized contributors. The authors introduce trace ethnography as a method of studying the seemingly ad-hoc assemblage of editors, administrators, bots, assisted editing tools, and others who constitute Wikipedia’s vandal fighting network.

Reflection

The paper comes off as a survey paper. I found that the authors explained some methods that already existed and used one of the authors experience to elaborate on others’ work. I couldn’t see their contribution but maybe that was needed 10 years ago? The tools that they mentioned (Huggle, AIV, Twinkle, ..etc.) were standard tools to be used when editing Wikipedia’s articles and monitoring edits made by others. They reflected on how those tools were helpful in a manner that made fighting vandalism an easier task. They mention that these tools facilitate viewing each edited article by linking it with a detailed edit summary with an explanation why it was done, by whom, and related IP addresses. They explain how they use such software to detect vandalism and how to revert back to the correct version of the article. They presented a case study of a Wikipedia vandal and showed logs of the changes that he was able to make in an hour. The authors also referenced Ed hutchins who explains how cognitive work must be performed in order to keep US Navy ships on course at any given time. And how that is a similar reference to what it takes to manage Wikipedia. Technological actors in Wikipedia, such as Huggle, make what would be a difficult task into a mundane affair. Reverting edits becomes a matter of pressing a button. The paper was informative for someone who hasn’t worked on editing Wiki articles but I thought that this paper could have been presented as a tutorial, it would’ve been more beneficial.

Discussion

Have you worked on Wikipedia article editing before?
Did you encounter using the tools mentioned in the paper?
Is there any application that comes to mind where this can be used other than Wikipedia?
Do you think such tools could be beneficial when it comes to open source software version control?
How would this method generalize to open source software version control?

02/19/2020 – Nan LI – In Search of the Dream Team: Temporally Constrained Multi-Armed Bandits for Identifying Effective Team Structures

February 18, 2020February 18, 2020 NAN LI 1 Comment

Summary:

The paper pointed out the theory that there is no universally ideas structures for effective teamwork. The best structure for teamwork is determined by different team members, tasks, surroundings, etc. Thus, this paper presents a system that investigates the optimal team structure by adapting different team structures to teams and evaluate the efficiency based on team performance and teamwork feedback. However, the combination of diverse dimensions of team structures and the arms (the different values of dimension) of each dimension is a large set. To avoid overwhelming group testers with these values, the paper also leverages a model called multi-armed bandits with temporal constrains which set the constraints of the number of arms selections based on several factors. The paper tested the platform with AMT workers, and evaluate the performance of the system with designed task and performance evaluation. The system confirmed that there are no two teams had the same optimal team structures, and this structure even different for the same group when completing the different tasks. The results also indicate the platform DreamTeam can promote high-efficiency teamwork.

Reflection:

First, I highly agree with the opinion that there are no universal idea structures for effective teamwork. Besides, searching the optimal structure through adapt different dimensions and evaluating each fit seems also reasonable. However, I think the experiment could gather more valuable information if they test the platform with a real group instead of a randomly formed group. Because I think the premise of becoming a group and completing a task together is that the group members are familiar with each other. Thus, this platform should be more effective in the early stages of team formation. Before the team members are familiar with each other, they can use this system to find the optimal team structure temporarily, so that they can quickly cooperate and work as a team even though they do not know each other. Nevertheless, as the familiarity among the group members raises, using this method to determine the optimal structure may be inefficient. Because they may have already find the best structure for some of the dimensions as they get along and getting more experience of working together.

I also considered another situation that is more suitable for this system, when a long-established team is assigned to working on a new type of task. Then maybe the working mode of the teamwork needs to be switched so that they can complete the new task most efficiently. At this time, the system support is demanded to find this new optimal structure.

Finally, I think the constraints method mentioned in the article also very inspirational. Maybe we can improve the effectiveness of the DreamTeam platform by allowing users to pre-delete some dimensions that they would not like to change. For example, the hierarchy or interaction pattern. In this case, the reduction of the combination is more conducive to exhaustive testing, and the adapting structure should be more fit for the teamwork.

Question:

What do you think using the computation power to decide the optimal structure for teamwork?
In this paper, the author finds random tester to form a group and complete the task, do you think this will influence the results?
Under what condition do you think this platform would most benefit.

Word Count: 544

02/19/2020 – Human-Machine Collaboration for Content Regulation – Myles Frantz

February 18, 2020February 18, 2020 Miles Frantz Leave a comment

Since the dawn of the internet, it has surpassed many expectations and is prolific throughout everyday life. Though initially there was a lack of standards in website design and forum moderation, it has relatively stabilized with various and scientific approaches. A popular forum side, Reddit, use a human lead human-ai collaboration to help automatically and manually moderate the ever-growing comments and thread. Searching through the top 100 subreddits (at the time of writing), the team followed surveyed moderators from 5 varied and highly active subreddits. These moderators are majority Due to the easy to use API provided by Reddit, one of the most used moderation tools was a third party later incorporated into Reddit Automod. This is one of the more popular and common tools used by moderators in Reddit. Since it is very extensible, there is no common standard between all the subreddits. Moderators within the 5 subreddits use this bot in relatively similar but different ways. Not only the sole bot used by moderations, other bots can be used to further interact and streamline other bots in a similar fashion. However due to the complication of bots (technologically or lack of interest in learning the tool), some subreddits let a few people manage the bots, sometimes to damning results. When issues happen, instead of being reactive to various users’ reactions, the paper argues for more transparency of the bot.

I agree with the author of the original of automod, when he started off making the bot purely to automate several steps. Continuing this forward with the scaling of Reddit, I do believe it would be impossible for only human moderators to keep up with the “trolls”.

Though I do disagree with how the rules of the automod are spread out. I would believe the decentralization of knowledge would make the system more robust, especially since the moderators are voluntary. It is natural for people to avoid what they don’t understand, for fear of it in general or fear for what repercussions may happen. Though I don’t think putting all of the work on one moderator is necessarily the right answer.

One of my questions is regarding one of the outcomes for Reddit; granting more visibility of the automods actions. Notably due to the scale of Reddit, extending this kind of functionality automatically could incur much more of a memory and storage overhead. Already Reddit stores vast amount of data however potentially doubling the memory capacity (if every comment was reviewed by automod) may be a downfall to this approach.
Instead of surveying the top 60%, I wonder if surveying the lower ranked (via RedditMetrics) subreddit with a lower number of moderators would fit the same pattern of the automod use. I would imagine they would be forced to use the automod tool more in depth and in breadth due to the lack of available resources however this is pure speculation.
A final question would be, to what percentage is there an over duplication of bots across the subreddits? If there is a big percentage it may lead to a vastly different experience across subreddits, as it seemingly is now potentially causing confusion amongst new or recurring users.

02/19/2020 – Updates in Human-AI Teams: Understanding and Addressing the Performance / Compatibility Trade off – Yuhang Liu

February 18, 2020February 18, 2020 yuhang Liu 1 Comment

This paper first proposes the complementarity between humans and artificial intelligence. In many cases, humans and artificial intelligence will form a team. When people make decisions after checking the inferences of AI, this cooperation model has applications in many fields, and achieved significant results. Usually, this kind of achievement requires certain prerequisites. First, people must have their own judgments on the conclusions of artificial intelligence. At the same time, the results of artificial intelligence must be accurate. The tacit cooperation between the two can improve efficiency. However, with the updating of artificial intelligence systems and the expansion of data, this cooperation will be broken. On the one hand, the accuracy of artificial intelligence will decline, and because of the expansion of boundaries, people’s understanding of artificial intelligence will be broken. So after the system update, the efficiency will be reduced instead. This paper mainly studies this situation. The article hopes to be compatible with the previous method after the update, so several methods are proposed to achieve this purpose, so as to achieve more compatible and accurate updates.

It is also suggested that this idea is obtained by analogy. In software engineering, if the updated system can support legacy software, it will be compatible after the update. I agree with this kind of analogy greatly, which is similar to bionics. We can continuously apply new ideas to the computer field through this kind of thought. The method mentioned in this paper is also very necessary. In the ordinary process of artificial intelligence or machine learning, we usually build data sets for each time, and lack the concept of inheritance, which is very inconvenient. After adopting compatible ideas, it will greatly save energy and be able to serve people more smoothly.

This article introduces CAJA, a platform for measuring the impact of AI performance and updates on team performance. At the same time, a practical retraining goal is introduced in the article to improve update compatibility. The main idea is to improve update compatibility by punishing new errors. But it can also be seen from the text that trust is the core of team work. Admittedly, trust is the essence of a team, but only as the basis of work, I think that more simulations and improvements are needed to improve humanity. The combination of problem-solving factors and the key of machine learning, we know that after learning new things, people will not have a negative impact on previous skills, but we will have more perspectives and methods to think about a problem, so I think that humans and machines should be mixed, that is, a team as a whole, so that the results can be more compatible, and the human machine interaction can be more successful.

question：

What are the implications of compatible AI updates?
How to better treat people and machines as a whole?
Whether compatible AI will affect the final training results?

2/19/20 – Lee Lisle – Updates in Human-AI Teams: Understanding and Addressing the Performance/Compatibility Tradeoff

February 18, 2020February 18, 2020 Lorance R Lisle 2 Comments

Summary

Bansal et. al discuss how human-AI teams work in solving high-stakes issues such as hospital patient discharging scenarios or credit risk assessment. They point out that the humans in these teams often create a mental model of the AI suggestions, where the mental model is an understanding of when the AI is likely wrong about the outcome. The authors then show that updates to the AI can produce worse performance if they are not compatible with the already formed mental model of the human user. They go on to define types of compatibility for AI updates, as well as a few other key terms relating to human/AI teams. They develop a platform to measure how compatibility can affect team performance, and then measure AI update compatibility effectiveness through a user study using 25 mTurk workers. In all, they show that incompatible updates reduce performance as compared to no update at all.

Personal Reflection

The paper was an interesting study in the effect of pushing updates without considering the user involved in the process. I hadn’t thought of the human as an exactly equal player in the team, where the AI likely has more information and could provide a better suggestion. However, it makes sense that the human leverages other sources of information and forms a better understanding of what choice to ultimately make.

CAJA, the human/AI simulation platform, seems like a good way to test AI updates, however I struggle to see how it can be used to test other theories as the authors seem to suggest. It is, essentially, a simple user-learning game, where users figure out when to trust the machine and when to deviate. While this isn’t exactly my field of expertise, I only see the chance to change information flows and the underlying AI as ways of learning new things about human/AI collaboration. This would mean terming this as a platform is a little excessive.

Questions

The authors mention that, in order to defeat mTurk scammers who click through projects like these quickly, they drop the lowest quartile (in terms of performance) out of their results. Do you think this was an effective countermeasure, or could the authors be cutting good data?
From other sources, such as Weapons of Math Destruction, we can read how some AI suggestions are inherently biased (even racist) due to input data. How might this change the authors results? Do you think this is taken into consideration at all?
One suggestion near the end of the paper stated that, if pushing an incompatible update, the authors of the AI should make the change explicit so that the user could adjust accordingly. Do you think this is an acceptable tradeoff to not creating a compatible update? Why or why not?
The authors note that, as the complexity of error boundary f became more complex, errors increased, so they kept to relatively simple boundaries. Is this an effective choice for this system, considering real systems are extremely complex? Why or why not?
The authors state that they wanted the “compute” cost to be net 0. Does this effectively simulate real-world experiences? Is the opportunity-cost the only net negative here?

2/19/20 – Lee Lisle – The Work of Sustaining Order in Wikipedia: The Banning of a Vandal

February 18, 2020February 18, 2020 Lorance R Lisle 1 Comment

Summary

Geiger and Ribes cover the case of using automated tools or “bots” in order to prevent vandalism on the popular online and user-generated encyclopedia “Wikipedia.” The authors detail how editors use popular distributed cognition coordination services such as “Huggle,” and argue that these coordination applications affect the creation and maintenance of wikipedia as much as the traditional social roles of editors. The team of human and AI work together to fight vandalism in the form of rogue edits. They cover how bots assisted essentially 0% of edits in 2006 to 12% in 2009, while editors use even more bot assistance. They then deep dive into how the editors came to ban a single vandal that committed 20 false edits to Wikipedia in an hour, which they term a “trace ethnography.”

Personal Reflection

This work was eye-opening in seeing exactly how Wikipedia editors leverage bots and other distributed cognition to maintain order in Wikipedia. Furthermore, after reading this, I am much more confident in the accuracy of articles contained on the website (possibly to the chagrin of teachers everywhere). I was surprised how easily attack edits were repelled by the Wikipedia editors, considering that hostile bot networks could be deployed against Wikipedia as well.

I also generally enjoyed the analogy of how managing Wikipedia is like navigating a naval vessel in that both leverage significant amounts of distributed cognition in order to succeed. Showing how many roles are needed in order to understand various jobs and collaborate between people was quite effective.

Lastly, their focus (trace ethnography) on a single vandal was an effective way of portraying what is essentially daily life for these maintainers. I was somewhat surprised that only four people were involved before banning a user; I had figured that each vandal took much longer to identify and remedy. How the process proceeded, where the vandal got repeated warnings before a (temporary) ban occurred, and how the bots and humans worked together in order to come to this conclusion, was a fascinating process that I hadn’t seen written in a paper before.

Questions

One bot that this article didn’t look into is a twitter bot that tracked all changes on Wikipedia made by IP addresses used by congressional members (@CongressEdits). Its audience is not specifically intended to be the editors of Wikipedia, but how might this help them? How does this bot help the general public? (It has since been banned in 2018) How might a tool like this be abused?
How might a trace ethnography be used in other applications for HCI? Does this approach make sense for domains other than global editors?
How can Huggle (or the other tools) be changed in order to tackle a different application, such as version control? Would it be better than current tools?
Is there a way to exploit this system for vandals? That is, are there any weaknesses to human/bot collaboration in this case?

02/19/2020 – Nan LI – Updates in Human-AI Teams: Understanding and Addressing the Performance/Compatibility Tradeoff

February 18, 2020February 18, 2020 NAN LI 2 Comments

Summary:

In this paper, the author presented that it is prevailing nowadays human and AI form a team to make decisions in a way that AI provides recommendations and humans decide to trust or not. In these cases, successful team collaboration mainly based on human knowledge on previous AI system performance. Therefore, the author proposed that though the update of the AI system would enhance the AI predictive precision, it will hurt the team performance since the updated version of the AI system usually not compatible with the mental model that humans developed with previous AI systems. To solve this problem, the author introduced the concept of compatibility of AI update with prior user experience. To examine the role of this compatibility in Human-AI teams, the author proposed methods and designed a platform called CAJA to measure the impact of updates on team performance. The outcomes show that team performance could be harmed even the updated system’s predictive accuracy improved. Finally, the paper proposed a re-training objective that can promote the compatibility of updates. In conclusion, to avoid diminish team performance, the developer should build a more compatible update without surrendering performance.

Reflection:

In this paper, the author talked about a specific interaction, which is AI-advised human decision making. As the example presented in the paper–Patient readmission system. In these cases, an incompatible update of the AI system would indeed harm the team performance. However, I think the extent of the impact largely depends on the correlation between the human and AI systems.

If the system and the user have a high grade of interdependence, both are not specialists on a task, the system prediction accuracy and user knowledge have the same impact on the decision result, the incompatible update of the AI system will weaken the team performance. Even though this effect can be eliminated by the running-in of the user and the system later, the cost for the decision in the high-stakes domain will be very large.

However, if the system interacts with users frequently, but the system’s predictions are only one of the concerns for humans to make decisions and cannot directly affect the decision, then the impact of incompatible updates on team performance will be limited.

Besides, if humans are more expertise on the task, and can validate the correctness of the recommendation promptly, then both the improvement of the system performance and the new errors caused by the system update will not have much impact on the results. On the other hand, if the error caused by the update does not affect team performance, then when updating the system, we do not need to consider compatibility but only need to consider the improvement of system performance. As a conclusion, if there is not enough interaction between the user and the system, and the degree of interdependence is not high, or the system only serves as an auxiliary or double-check, then the system update will not have a great impact on team performance.

A compatible update is indeed helpful for users to quickly adapt to the new system, but I think the impact of update largely depends on the correlation between the user and the system, or the proportion of the system’s role in teamwork.

Besides, design a compatible update version also requires extra cost. Therefore, I think we should consider minimizing the impact of system errors on the decision-making process when designing the system and establishing human-AI interaction.

Question:

What do you think about the concept of compatibility of AI update?
Do you have any human-AI system examples that apply this author theory?
Under what circumstances do you think the author’s theory is the most used and when it is not applicable?
When we need to update the system frequently, do you think it is better to build a compatible update or to use an alternative method to solve the human adaptation costs?
In my opinion, Huaman’s degree of adaptation is very high, and the cost required for humans to adapt is much smaller than the cost of developing a compatible update version. what do you think?

Word Count: 682

02/19/20 – Fanglan Chen – In Search of the Dream Team: Temporally Constrained Multi-Armed Bandits for Identifying Effective Team Structures

February 18, 2020 Fanglan Chen 1 Comment

Summary

Zhou et al.’s paper “In Search of the Dream Team” introduces DreamTeam — a system that identifies effective team structures for each group of individuals by suggesting different structures and evaluating the fit of each team. How team works relates with team structures, including roles, norms, and interaction patterns. Prior organizational behavior research doubts the existence of universally perfect structures. The rationale is simple: teams broast of great diversity so one single structure cannot satisfy the full functioning of each team. The proposed DreamTeam explores values along five dimensions of team structures including hierarchy, interaction patterns, norms of engagement, decision-making norms, and feedback norms. The system leverages feedback, randomly choosing metrics such as team performance or satisfaction, to iteratively identify the team structures that facilitate the best organization of each team. Also, the authors design multi-armed bandits with temporal constraints, an algorithm that determines the timing of exploration and exploitation trade-offs across multiple dimensions to avoid overwhelming teams with too many changes. In the experiments, DreamTeam is integrated with the chat platform Slack and achieves better performance and more diverse team structures compared with baseline methods.

Reflection

The authors design a system to facilitate the organization of virtual teams. Along with the several limitations mentioned in the paper, I feel the proposed DreamTeam system is based on a comparatively narrow scope of what makes a dream team and it seems difficult to generalize the framework to a variety of domains or platforms.

In the first place, I do not agree that there is a universal approach to design or evaluate a so-called dream team. The components that make a dream team vary in different domains. For example, in sports, I would say personality and talent play important roles in forming a dream team. Actually, it goes beyond the term “forming” that a bunch of talented individuals not only bring technical expertise to the team, but they also contribute passion, strong work ethic, and strive for peak performance in the pursuit of excellence. To extend that point, working with people having similar personalities, similar values, similar pursuits will bring some chemistry to the team work which potentially enables challenging problem solving and strategic planning. All of these are not mentioned in the demensions and nearly impossible to be evaluated quantitatively.

Also, I think it is important to make every team member understand their role, such as why they need to tackle the tasks and how that ties to a larger purpose beyond self’s needs. This provides a clear purpose and direction of where a group of people need to move forwards as a team. I do not think the authors emphasize the importance of how such understanding influences team member level of commitment. In addition, this kind of unified purpose can avoid duplication of member efforts and prevents pulling the efforts in multiple directions.

Last but not least, in my opinion, basing on the maximizing of rewards is not the ideal way to determine the best team structures. Human society treasure process as well as results. It can be seen as a successful teamwork as long as the whole team is motivated and working on it. If too much emphasis is put on results, then the joy will be drained out of the job for the team. As long as progressive steps are made towards achieving the goal within a reasonable time frame, the team will become better. Building an ambitious, driven and passionate team is just the start. We need to ensure that the team members survive and are nurtured so that they can deliver on the targets.

Discussion

I think the following questions are worthy of further discussion.

If you are the CEO of a company or a university president, would you consider using the proposed DreamTeam system to help organize your own team? Why or why not?
Do you think the five bandits capture all dimensions to make a dream team?
Do you think the proposed DreamTeam system can be generalized to various domains? Are there any domains you think the system would not contribute towards an efficient team structure?
Is there anything you can think about to improve the proposed DreamTeam system?

02/19/2020 – Vikram Mohanty – The Work of Sustaining Order in Wikipedia: The Banning of a Vandal

February 18, 2020 Vikram Mohanty 2 Comments

Summary

This paper, through a case study, highlights the invisible distributed cognition process that goes on underneath a collaborative environment like Wikipedia, and how different actors, both humans and non-humans, come together for achieving a common goal – banning a vandal on Wikipedia. The authors show the usefulness of trace ethnography as a method for reconstructing user actions and understanding better the role each actor plays in the larger scheme of things. The paper advocates for not dismissing the role of bots as mere force multipliers, but to see them in a different lens considering the wide impact they have.

Reflection

Similar to the “Human-Machine Collaboration for Content Regulation: The Case of Reddit Automoderator” paper, this paper is a great example that intelligent agents (AI-infused systems, bots, scripts, etc.) should not be studied in isolation only, but through a socio-technical lens. In my opinion, that provides a more comprehensive picture of the goals these agents can and cannot achieve, the collaboration processes they may inevitably transform, the human roles they may affect and other unintended consequences than performance/accuracy metrics alone.

Trace ethnography is a powerful method for reconstructing user actions in a distributed environment, and using that to understand how multiple actors (human and non-humans) achieve a complex objective, by sub-consciously collaborating with each other. The paper advocates that bots/automation/intelligent agents should not be seen as just force multipliers or irrelevant users. This is important as a lot of current evaluation metrics focus only on quantitative measures such as performance or accuracy. This paints an incomplete, and sometimes, an irresponsible picture of intelligent agents, as they have now evolved to assume an irreplaceable role in the larger scheme of things (or goals).

The final decision-making privilege resides with the human administrator and the whole socio-technical pipeline assists each step of decision-making with all possible information available so that checks and bounds (or order, as the paper mentions) is maintained at every stage. Automated decisions, whenever taken, are grounded in some confidence of certainty. In my opinion, while building AI models, researchers should think about the AI-infused system or the real-world setting of which these algorithms would be a part of. This might motivate researchers to make these algorithms more transparent or interpretable. The lens of the user who is going to wield these models/algorithms might help further.

It’s interesting to see some of the principles of mixed-initiative systems being used here i.e. history of the vandal’s actions, templated messages, showing statuses, etc.

Questions

Do you plan to use trace ethnography in your proposed project? If so, how? Why do you think it’s going to make a difference?
What are some of the risks and benefits of employing a fully automated pipeline in this particular case study i.e. banning a Wikipedia vandal?
A democratic online platform like Wikipedia supports the notion of anyone coming in and making changes, and thus necessitates deploying moderation workflows to curb bad actors. However, if a platform were restrictive to some degrees, a post-hoc setup may not be necessary and the platform might be less toxic. This does not necessarily be the case for Wikipedia and can also extend to SNS like Twitter/Facebook, etc. What would you prefer, a democratic one or a restrictive one?

02/19/2020-Bipasha Banerjee -In Search of the Dream Team: Temporally Constrained Multi-Armed Bandits for Identifying Effective Team Structures

February 18, 2020 bipashab 1 Comment

Summary:

The paper aims to find a Dream Team by adopting teams to different structures and subsequent evaluation. The authors try to identify the ideal team structure using “the multi-armed bandit” approach over time. The dream team structure selects the next exploration task based on the reward from the previous job. They explored a lot of background research on HCI groups, the structural contingency theory from organizational behavior, multi-armed bandit. A network of five bandits was created with different dimensions, namely, hierarchy, interaction patterns, norms of engagement, decision-making norms, and feedback norms. Each of the dimensions has different possible values. For example, for hierarchy, there can be three possible values – none, centralized (where a leader was elected), decentralized (majority vote). Global temporal constraint and dimensional temporal constraint are taken into consideration to determine at what stage the teams are prepared to embrace changes and also take into account if too many dimensions change at one. The authors used the popular game Codenames for the Slack interface. They used Amazon Mechanical Turk to employ 135 workers and assigned them based on five conditions, namely, control, collectively chosen, manager chosen, bandit chosen, and Dream Team is chosen. There were 35 teams with seven teams per condition. It was found that Dream Team based teams outperformed other teams

Reflection

The paper was a nice read on selecting the ideal team structure to maximize productivity. The paper did extensive background research on team structures and included theories from HCI and organizational behavior. Being from a CS background, I have no idea about what team structure is and the theory involved behind selecting the ideal structure. It was a very new concept for me, and the difference between the approaches taken by the HCI domain and Organizational behavior was intriguing. The authors described their approach in detail and mathematically, which makes it easy to visualize the problem as well as the method.

The most interesting section was the integration with Slack, where the Slack bot was utilized to guide the Team with broadcast messages. It was interesting to see how different teams reacted to the messages the Slack bot posted. Dream Teams mostly adhered to the suggestions of the Slack bot whereas, some of the other team structures chose to ignore them. It would be good if the evaluation is also done on a different task. The game is relatively simple, and we don’t know how the Dream Team structure would perform for complicated tasks. It would be intriguing to see how this work could be potentially extended.

The paper highlights a probabilistic approach to proposing the ideal team structure. One thing that was not very clear to me is how the slack bots suggest other than taking into consideration the current score and the best approach. Is it using NLP techniques to deduce the sentiment of the comment and then posting a cross-comment?

Question

The authors used slack to test their hypothesis. How would dream-team perform for real-life software development teams?
The test subjects were Amazon Mechanical Turks, and the project was reasonably simple (codenames game). Would Dream Team performs better than the other structures when it is domain-specific, and experts are involved? Would it lead to more conflicts?
Could we use better NLP techniques and sentiment analysis to guide the DreamTeams better?