04/08/20 – Fanglan Chen – CrowdScape: Interactively Visualizing User Behavior and Output

Summary

Rzeszotarski and Kittur’s paper “CrowdScape: Interactively Visualizing User Behavior and Output” explores the research question of how to unify different quality control approaches to better evaluate the work conducted by crowd workers. With the emergence of crowdsourcing platforms, many tasks can be accomplished quickly by recruiting crowd workers to collaborate in a parallel manner. However, quality control is among the challenges faced by the crowdsourcing paradigm. Previous works focus on designing algorithmic quality control approaches based on either worker outputs or worker behavior, but neither approach is effective for complex or creative tasks. To fill that research gap, the authors develop CrowdScape, a system that leverages interactive visualization and mixed-initiative machine learning to support the human evaluation of complex crowd work. Through experimentation on a variety of tasks, the authors show that combining information about worker behavior with worker outputs has the potential to help users better understand the crowd and further identify reliable workers and outputs.

Reflection

This paper conducts an interesting study by exploring the relationship between the outputs and behavior patterns of crowd workers to achieve better quality control in complex and creative tasks. The proposed method provides a unique angle on quality control and has wide potential use in crowdsourced tasks. Although there is a strong relationship between final outputs and behavior patterns, I feel the design of CrowdScape relies too heavily on the behavior analysis of crowd workers. In many situations, good behavior can lead to good results, but that cannot be guaranteed. From my understanding, behavior analysis is better suited as a screening mechanism in certain tasks. For example, in the video tagging task, workers who have gone through the whole video are more likely to provide accurate tags. In this case, a behavior such as watching the full video is a necessary condition, not a sufficient one. The group of workers who finish watching the videos may still disagree on the tagging output, so a different quality control mechanism is still needed; a rough sketch of this two-stage idea follows below. In creative and open-ended tasks, behavior patterns are even more difficult to capture. Analyzing behavior through metrics such as time spent on the task does not directly connect to a measurement of creativity.
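To make the screening-then-agreement point concrete, here is a minimal, hypothetical sketch (my own illustration, not anything from the paper) that first filters workers by a behavioral signal and then still applies a separate output-based check:

```python
# Hypothetical two-stage quality control: a behavioral screen followed by
# an output-based mechanism (a simple majority vote over tags).
from collections import Counter

# Illustrative worker records: (worker_id, fraction_of_video_watched, tag)
responses = [
    ("w1", 1.00, "photosynthesis"),
    ("w2", 0.98, "photosynthesis"),
    ("w3", 0.15, "cooking"),        # barely watched: screened out
    ("w4", 1.00, "cell biology"),   # watched everything, still disagrees
]

# Stage 1: behavioral screen (necessary, not sufficient).
screened = [(worker, tag) for worker, watched, tag in responses if watched >= 0.9]

# Stage 2: output-based quality control on the remaining workers.
tag_counts = Counter(tag for _, tag in screened)
best_tag, votes = tag_counts.most_common(1)[0]
print(best_tag, votes)  # photosynthesis 2
```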

In addition, I think the notion of quality discussed in the paper is comparatively narrow. We need to be aware that quality control on crowdsourcing platforms is multifaceted: it depends on the workers' knowledge of the specific task, the quality of the processes that govern the creation of tasks, the process for recruiting workers, the coordination of subtasks such as reviewing intermediate outputs and aggregating individual contributions, and so forth. A more comprehensive quality control cycle needs to take the following aspects into consideration: (1) a quality model that clearly defines the dimensions and attributes used to control quality in crowdsourcing tasks; (2) assessment metrics that can be utilized to measure the values of the attributes identified by the quality model; and (3) quality assurance, a set of actions that aim to achieve the expected levels of quality. To prevent low quality, it is important to understand how to design for quality and how to intervene if quality drops below expectations on crowdsourcing platforms.

Discussion

I think the following questions are worthy of further discussion.

  • Can you think of other tasks that may benefit from the proposed quality control method?
  • Do you think the proposed method can perform good quality control on complex and creative tasks as the paper suggests? Why or why not?
  • Do you think an analysis based on worker behavior can help determine the quality of the work conducted by crowd workers? Why or why not?
  • In which scenarios do you think it would be more useful to trace worker behavior with advance notice, and in which without it? Can you think of some potential ethical issues?

04/08/20 – Fanglan Chen – The State of the Art in Integrating Machine Learning into Visual Analytics

Summary

Endert et al.’s paper “The State of the Art in Integrating Machine Learning into Visual Analytics” surveys recent state-of-the-art models that integrate machine learning into visual analytics and highlights the advances achieved at the intersection of the two fields. In the data-driven era, how to make sense of data and how to facilitate a wider understanding of data attract the interest of researchers in various domains. It is challenging to discover knowledge from data while delivering reliable and interpretable results. Previous studies suggest that machine learning and visual analytics have complementary strengths and weaknesses, and many works explore the possibility of combining the two to develop interactive data visualizations that promote sensemaking and analytical reasoning. This paper presents a survey of the achievements made by recent state-of-the-art models. It also provides a summary of opportunities and challenges for boosting the synergy between machine learning and visual analytics as future research directions.

Reflection

Overall, this paper presents a thorough survey of the progress in the field by highlighting and synthesizing select research advances. Recent advances in deep learning bring new challenges and opportunities at the intersection of machine learning and visual analytics. We need to be aware that the design of a highly accurate and efficient deep learning model is an iterative and progressive process of training, evaluation, and refinement, which typically relies on a time-consuming trial-and-error procedure where the parameters and model structures are adjusted based on user expertise. Visualization researchers are making initial attempts to visually illustrate intuitive model behaviors and to debug the training processes of widely used deep learning models such as CNNs and RNNs. However, little effort has been devoted to tightly integrating state-of-the-art deep learning models with interactive visualizations to maximize the value of both. There is great potential in integrating deep learning into visual analytics for a better understanding of current practices.

As we know, training deep learning models requires a lot of data, but well-labeled data is often very expensive to obtain. Injecting a small number of user inputs into the models through a visual analytics system can potentially alleviate this problem. In real-world applications, a method is impractical if each specific task requires its own separate large-scale collection of training examples. To close the gap between academic research outputs and real-world requirements, it is necessary to reduce the size of the required training sets by leveraging prior knowledge obtained from previously trained models in similar categories, as well as from domain experts. Models trained on given data and labels can usually solve only the pre-defined problems for which they were originally trained. Few-shot learning and zero-shot learning, two of the unsolved problems in current deep learning practice, offer a possibility to incorporate prior knowledge about objects into a “prior” probability density function.

Discussion

I think the following questions are worthy of further discussion.

  • What other challenges or opportunities can you think of for a framework that incorporates machine learning and visual analytics?
  • How can we best leverage the advantages of machine learning and visual analytics in a complementary way?
  • Do you plan to utilize a framework that incorporates machine learning and visual analytics in your course project? If yes, how do you plan to approach it?
  • Can you think of any applications we use in daily life that are good examples of integrating machine learning into visual analytics?

04/08/20 – Lulwah AlKulaib – CrowdScape

Summary

The paper presents a system that supports the evaluation of complex crowd work through mixed-initiative machine learning and interactive visualization. The system proposes a solution for quality control challenges that occur on crowdsourcing platforms. Previous work based quality control on either worker output or worker behavior, which was not effective for evaluating complex tasks. The suggested system combines a worker's behavior and output to support the evaluation of complex crowd work. Its features allow users to develop hypotheses about their crowd, test them, and refine selections based on machine learning and visual feedback. The authors use MTurk and Rzeszotarski and Kittur’s Task Fingerprinting system to create an interactive data visualization of the crowd workers. They posted four varieties of complex tasks: translating text from Japanese to English, picking a favorite color using an HSV color picker and writing its name, writing about a favorite place, and tagging science tutorial videos from YouTube. They conclude that the information gathered from crowd workers' behavior is beneficial in reinforcing or contradicting the conception of the cognitive process that crowd workers use to complete tasks, and in developing and testing mental models of the behavior of crowd workers who have good or bad outputs. This helps users identify more good workers and outputs in a sort of positive feedback loop.

Reflection

This paper presents an interesting approach to discovering low-quality responses from crowd workers. Combining these two methods is an interesting idea, and it makes me think of our project and what limitations might arise from following their approach to logging crowd worker behavior. I had not thought of disclosing to crowd workers that their behavior is being recorded while they respond, and now it makes me want to look at previous work to see whether that disclosure affects workers' responses. I found it interesting that crowd workers used machine translation in the Japanese-to-English translation task even when they knew their behavior was being recorded. I assume that since there was no requirement of speaking Japanese, or the requirements were relaxed, crowd workers were able to perform the task using tools like Google Translate. If such requirements had been in place, the workers would not have been paid for the task. This has also alerted me to the importance of task requirements and explanations for crowd workers, since some Turkers could abuse the system and give us low-quality results simply because the rules were not clear.

Having the authors list their limitations was useful for me. It gave me another perspective to think about how to evaluate the responses that we get in our project and what we can do to make our feedback approach better.

Discussion

  • Would you use behavioral traces as an element in your project? If yes, would you tell the crowd workers that you are collecting that data? Why or why not?
  • Do you think that implicit feedback and behavioral traces can help determine the quality of a crowd worker’s answer? Why or why not?
  • Do you think that collecting such feedback is a privacy issue? Why or why not?

4/8/2020 – Lee Lisle – The State of the Art in Integrating Machine Learning into Visual Analytics

Summary

               Endert et al.’s focus in this paper is on how machine learning and visual analytics have blended together to create tools for sensemaking with large, complex datasets. They first explain various models of sensemaking and how they can impact learning and understanding, as well as many models of interactivity in visual analytics that complement sensemaking. Then the authors lightly describe some machine learning models and frameworks to establish a baseline knowledge for the paper. They then create four categories for machine learning techniques currently used in visual analytics: dimension reduction, clustering, classification, and regression/correlation models. They then discuss papers that fit into each of these categories, organized by whether the user modifies parameters and the computational domain or defines analytical expectations, with the machine learning model assisting the user in each. The authors then point out several new ways of blending machine learning and visual analytics, such as steerable machine learning, creating training models from user interaction data, and automated report generation.
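To make the first two categories concrete, here is a minimal sketch (my own illustration, not code from the survey; the dataset and parameter choices are arbitrary) of how dimension reduction and clustering typically feed a visual analytics view:

```python
# Dimension reduction plus clustering driving a simple scatter-plot view.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)            # 64-dimensional image features

# Dimension reduction: project to 2-D so the data can be drawn on screen.
coords = PCA(n_components=2).fit_transform(X)

# Clustering: group similar items; a visual analytics tool might let the
# user steer k or merge/split clusters interactively.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)

plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=8, cmap="tab10")
plt.title("PCA projection colored by k-means cluster")
plt.show()
```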

Personal Reflection

               This paper was an excellent summary of the field of visual analytics and the various ways machine learning has been blended into it. Furthermore, several of the included papers have informed my own research into visual analytics and sensemaking. I was somewhat surprised that, though the authors mention virtual reality, they don’t cover some of the tools that have been developed for immersive analytics. As a side note, the authors used many acronyms and did not explain all of them; for example, virtual reality was referenced once and only by its acronym. When they used it for dimension reduction, I was initially confused because they hadn’t defined that acronym, while they defined the acronyms for visual analytics and machine learning twice in the same paragraph in the introduction.

               Their related works section was impressive and really covered a lot of angles for sensemaking and visual analytics. While I do not have the background in machine learning, I assume it covered that area equally well.

               I also thought the directions they suggested for future development were a good selection of ideas. I could identify ways that many of them could be applied to my work on the Immersive Space to Think (IST): automated report generation would be a great way to start out in IST, and a way to synthesize and perform topic analysis on any notations made while in IST could lead to further analytical goals.

Questions

  1. What observations do you have on their suggestions for future development in visual analytics? What would you want to tackle first and why?
  2. In what ways do the human and the machine work together in each category of machine learning (dimension reduction, clustering, classification, and regression/correlation)? What affordances does each use?
  3. Which training method do you think leads to higher-quality outputs: unmodified training sets or user-interaction-steered machine learning?

4/8/20 – Lee Lisle – Agency Plus Automation: Designing Artificial Intelligence into Interactive Systems

Summary

Heer’s focus in this paper is on refocusing AI and machine learning into everyday interactions that assist users in their work rather than trying to replace users. He reiterates many times in the introduction that humans should remain in control while the AI assists them in completing the task, and even brings up the recent Boeing automation mishaps as an example of why human-in-the-loop is so essential to future developments. The author then describes several tools in data formatting, data visualization, and natural language translation that use AI to assist the user by suggesting actions based on their interactions with data, as well as domain-specific languages (DSLs) that can quickly perform actions through code. The review of his work shows that users want more control, not less, and that these tools increase productivity while allowing the user to ultimately make all of the decisions.

Personal Reflection

               I enjoyed this paper as an exploration of various ways people can employ semantic interaction in interfaces to boost productivity. Furthermore, the explorations of how users can do this without giving up control were remarkable. I hadn’t realized that the basic idea behind autocorrect/autocomplete could apply in so many different ways in these domains. However, I did notice that the author mentioned that in certain cases there were too many options for what to do next. I wonder how much ethnographic research needs to go into determining each action that’s likely (or even possible) in each case, and what overhead the AI puts on the system.

               I also wonder how these interfaces will shape work in the future. Will humans adapt to these interfaces and essentially create new routines and processes in their work? As autocomplete/correct often creates errors, will we have to adapt to new kinds of errors in these interfaces? At what point does this kind of interaction become a hindrance? I know that, despite the number of times I have to correct it, I wouldn’t give up autocomplete in today’s world.

Questions

  1. What are some everyday interactions, beyond just autocorrect, that you encounter in specialized programs and applications? Do you always utilize these features?
  2. The author took three fairly everyday activities and created new user interfaces with accompanying AI with which to create better tools for human-AI collaboration. What other everyday activities can you think of that you could create a similar tool for?
  3. How would you gather data to create these programs with AI suggestions? What would you do to infer possible routes?
  4. The author mentions expanding these interfaces to (human/human) collaboration. What would have to change in order to support this? Would anything?
  5. DSLs seem to be a somewhat complicated addition to these tools. Why would you want to use them, and is it worth learning the DSL?
  6. Is ceding control to AI always a bad idea? In what areas do you think users should cede more control, and in what areas should they gain back more control?

04/08/20 – Lulwah AlKulaib – Agency

Summary

The paper considers the design of systems that enable rich and adaptive interaction between people and algorithms. The authors attempt to balance the complementary strengths and weaknesses of humans and algorithms while promoting human control and skillful action. They aim to employ AI methods while ensuring that people remain in control, supporting the view that people should be unconstrained in pursuing complex goals and exercising domain expertise. They share case studies of interactive systems that they developed in three fields: data wrangling, exploratory analysis, and natural language translation, integrating proactive computational support into interactive systems. For each case study, they examine the strategy of designing shared representations that augment interactive systems with predictive models of users’ capabilities and potential actions, surfaced via interaction mechanisms that enable user review and revision. These models enable automated reasoning about tasks in a human-centered fashion and can adapt over time by observing and learning from user behavior. To improve outcomes and support learning by both people and machines, they describe the use of shared representations of tasks augmented with predictive models of human capabilities and actions. They conclude with how systems that integrate agency and automation via shared representations could be better constructed and deployed. They also mention finding that neither automated suggestions nor direct manipulation plays a strictly dominant role, but that a fluent interleaving of both modalities can enable more productive, yet flexible, work.

Reflection

The paper was very interesting to read, and the case studies presented were thought provoking. They are all based on research that I have read and gone through while learning about natural language processing, and the thought of those systems being suggestive makes me wonder about such work and how user-interface toolkits might affect the design and development of models.

I also wonder, as presented in the future work, how to evaluate systems across varied levels of agency and automation. What would the goal be in that evaluation process? Would it differ across machine learning disciplines? The case studies presented in the paper used specific evaluation metrics, and I wonder how those generalize to other models. What other methods could be used for evaluation in the future, and how does one compare two systems when comparing their results is no longer enough?

I believe that this paper sheds some light on how evaluation criteria can be topic-specific, and those criteria will be shared across applications that are relevant to human experience in learning. It is important to pay attention to how these systems promote interpretability, learning, and skill acquisition instead of deskilling workers. Also, it is essential that we think of appropriate designs that would optimize trade-offs between automated support and human engagement.

Discussion

  • What is your takeaway from this paper?
  • Do you agree that we need better design tools that aid the creation of effective AI-infused interactive systems? Why or why not?
  • What determines a balanced AI – Human interaction?
  • When is AI agency/control harmful? When is it useful?
  • Is ensuring that humans remain in control of AI models important? If models were trained by domain experts with domain expertise, then why do we mistrust them?

04/07/20 – Sukrit Venkatagiri – CrowdScape: Interactively Visualizing User Behavior and Output

Paper: Jeffrey Rzeszotarski and Aniket Kittur. 2012. CrowdScape: interactively visualizing user behavior and output. In Proceedings of the 25th annual ACM symposium on User interface software and technology (UIST ’12), 55–62. https://doi.org/10.1145/2380116.2380125

Summary:

Crowdsourcing has been used to do intelligent tasks/knowledge work at scale and for a lower price, all online. However, there are many challenges with controlling quality in crowdsourcing. This paper discusses how, in prior approaches, quality control was done through algorithms evaluated against gold standards or by looking at worker agreement and behavior. Yet these approaches have many limitations, especially for creative tasks or other tasks that are highly complex in nature. This paper presents a system, called CrowdScape, to support manual or human evaluation of complex crowdsourcing task results through an interactive visualization with a mixed-initiative machine learning back-end. The paper describes features of the system as well as its uses through four very different case studies: first, a translation task from Japanese to English; next, a somewhat unique task asking workers to pick their favorite color; third, writing about their favorite place; and finally, tagging a video. The paper concludes with a discussion of the findings.

Reflection:

Overall, I really liked the paper and the CrowdScape system, and I found the multiple case studies really interesting. I especially liked the fact that the case studies varied in terms of complexity, creativity, and open-endedness. However, I found the color-picker task a little off-beat and wonder why the authors chose that task. 

I also appreciate that the system is built on top of existing work, e.g., Amazon Mechanical Turk (a necessity), as well as Rzeszotarski and Kittur’s Task Fingerprinting system to capture workers' behavioral traces. The scenario describing the more general use case was also very clear and concise. The fact that the system, CrowdScape, utilizes two diverse data sources, as opposed to just one, is interesting. This makes it easier to triangulate the findings as well as to observe any discrepancies in the data. More specifically, the CrowdScape system looks at workers' behavioral traces as well as their output. This allows one to differentiate between workers in terms of their “laziness/eagerness” as well as the actual quality of the output. The system also provides an aggregation of the two features, and all of these are displayed as visualizations, which makes it easy for a requester to view tasks and easily discard or include work.

However, I wonder how useful these visualizations might be for tasks such as surveys, or tasks that are less open-ended. Further, although the visualizations are useful, I wonder whether they should be used in conjunction with gold standard datasets, and how useful that combination would be. Although the paper demonstrates the potential uses of the system via case studies, it does not demonstrate whether real users find it useful. Thus, an evaluation by real-world users might help.

Questions:

  1. What do you think about the case study evaluation? Are there ways to improve it? How?
  2. What features of the system would you use as a requester?
  3. What are some drawbacks to the system?

04/08/2020 – Vikram Mohanty – CrowdScape: Interactively Visualizing User Behavior and Output

Authors: Jeffrey M Rzeszotarski, Aniket Kittur

Summary

This paper proposes CrowdScape, a system that supports human evaluation of crowd work through interactive visualization of behavioral traces and worker output, combined with mixed-initiative machine learning. Different case studies are discussed to showcase the utility of CrowdScape.

Reflection

The paper addresses the issue of quality control, a long-standing problem in crowdsourcing, by combining two existing standalone approaches that researchers currently adopt: a) inferring quality from worker behavior and b) analyzing worker output. Combining these factors is advantageous as it provides a more complete picture, either by offering corroborating evidence about ideal workers or, in some cases, complementary evidence that can help identify "good" workers; a rough sketch of this combined-signal idea follows below. Analyzing worker output alone might not be enough, as there is an underlying chance that it is no better than a random coin toss.
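Here is a rough, hypothetical illustration of combining the two signals (made-up features and numbers, not CrowdScape's actual method): cluster workers on behavioral features plus an output-agreement score, then inspect which cluster the trusted workers fall into.

```python
# Hypothetical combined-signal grouping of crowd workers.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Illustrative per-worker records: (seconds on task, scroll events,
# fraction of video watched, agreement with other workers' output).
workers = np.array([
    [310, 42, 1.00, 0.9],   # thorough worker, high agreement
    [290, 38, 0.95, 0.8],
    [ 45,  3, 0.10, 0.2],   # barely engaged, low agreement
    [ 60,  5, 0.15, 0.1],
    [300, 40, 1.00, 0.3],   # engaged behavior but disagreeing output
])

features = StandardScaler().fit_transform(workers)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)  # e.g., [0 0 1 1 0]: behavior and output jointly separate groups
```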

Even though it was only a short note in parentheses, I really liked the fact that the authors explicitly sought permission to record the worker interaction logs.

Extrapolating to other similar or dissimilar behavior using machine learning seems intuitive here, as the data and the features used (i.e., the building blocks of the model) are very meaningful and perfectly relevant to the task, and the model is not a black box. As a result, it’s not surprising to see it work almost everywhere. In the one case where it didn’t work, the system made up for it by showing that the complementary case works. This sets a great example for designing predictive models on top of behavioral traces that actually work.

Moreover, the whole system was built agnostic of the task, and the evaluations justified it. However, I am not sure whether the system's best use case is recruiting multiple workers for a single task or identifying a set of good workers to subsequently retain for other tasks in the pipeline. I am guessing it is the latter, as the former might seem like an expensive approach for getting high-quality responses.

On the other hand, I feel the implications of this paper go beyond just crowdsourcing quality control. CrowdScape, or a similar system, can provide assistance for studying user behavior/experience in any interface (web for now), which is important for evaluating interfaces.

Questions

  1. Does your evaluation include collecting behavioral trace logs? If so, what are some of your hypotheses regarding user behavior?
  2. How do you plan on assessing quality control?
  3. What kind of tasks do you see CrowdScape being best applicable for? (e.g. single task, multiple workers)

04/08/2020 – Vikram Mohanty – Agency plus automation: Designing artificial intelligence into interactive systems

Authors: Jeffrey Heer

Summary

The paper discusses interactive systems in three different areas — data wrangling, exploratory analysis, and natural language translation — to showcase the use of a “shared representation” of tasks, where machines can augment human capabilities instead of replacing them. All the systems highlight how the complementary strengths and weaknesses of humans and machines can be balanced while promoting human control.

Reflection

This paper makes the case for intelligence augmentation, i.e., augmenting human capabilities with the strengths of AI rather than striving to replace them. Developers of intelligent user interfaces can come up with effective collaborative systems by carefully designing the interface to ensure that the AI component “reshapes” the shared representations that users contribute to, rather than “replaces” them. This is always a complex task and therefore requires scoping down from the notion that AI can automate everything, focusing instead on these editable shared representations. This has other benefits: it helps exploit the strengths of AI in a sum-of-parts manner rather than as an end-to-end mechanism, where an AI is more likely to be erroneous. The paper discusses three different case studies where a mixed-initiative deployment was successful in catering to user expectations in terms of experience and output.

It was particularly interesting to see participants complaining that the Voyager system, despite being good, spoiled them because it made them think less. This can hamper the adoption of such systems. A reasonable design implication here would be to allow users to choose the features they want, or to give them the agency to adjust the degree of automation/suggestions. This also suggests the importance of conducting longitudinal studies to understand how users use the different features of an interface, i.e., whether they use one but not another.

According to some prior work, machine-suggested recommendations have been known to perpetuate filter bubbles; in other words, users are exposed to a similar set of items and miss out on other content. Here, the Voyager recommendations work in contrast to that prior work by allowing users to explore the space, analyze different charts and data points they wouldn’t otherwise notice, and combat confirmation bias. In short, the system does what it claims to do, i.e., augment human capabilities in a positive sense using the strengths of the machine.

Questions

  1. In the projects you are proposing for the class, does the AI component augment human capabilities or strive to replace them (eventually)? If so, how?
  2. How do you think developers should cater to cases where users are less likely to adopt a system because it impedes their creativity?
  3. Do you think AI components (should) allow users to explore the space more than they normally would? Are there any possible pitfalls (information overload, unnatural tasks/interactions, etc.)?

4/8/20 – Akshita Jha – Agency plus automation: Designing artificial intelligence into interactive systems

Summary:
“Agency plus automation: Designing artificial intelligence into interactive systems” by Heer talks about the drawbacks of using artificial intelligence techniques for automating tasks, especially ones that are considered repetitive and monotonous. However, this presents a monumentally optimistic point of view by completely ignoring the ghost work, or invisible labor, that goes into ‘automating’ these tasks. This gap between crowd work and machine automation highlights the need for design and engineering interventions. The authors of this paper try to make use of the complementary strengths and weaknesses of the two: the creativity, intelligence, and world knowledge of crowd workers, and the low cost and lack of cognitive overload provided by automated systems. The authors describe in detail the case studies of interactive systems in three different areas: data wrangling, exploratory analysis, and natural language translation. These systems combine computational support with interactive systems. The authors also talk about shared representations of tasks that include both human intelligence and automated support in the design itself. The authors conclude that “neither automated suggestions nor direct manipulation plays a strictly dominant role” and “a fluent interleaving of both modalities can enable more productive, yet flexible, work.”

Reflections:
There is a lot of invisible work that goes into automating a task. Most automated tasks require hundreds, if not thousands, of annotations. Machine learning researchers turn a blind eye to all the effort that goes into annotations by calling their systems ‘fully automated’. This view is exclusionary and does not do justice to the vital but seemingly trivial work done by crowd workers. One area to focus on is the open question of shared representation: is it possible to integrate data representation with human intelligence? If yes, is that useful? Data representation often involves the construction of a latent space to reduce the dimensionality of input data and obtain concise, meaningful information. Such representations may or may not exist for human intelligence; borrowing from social psychology might help in such a scenario. There can be other ways to go about this. For example, the authors focus on building interactive systems with ‘collaborative’ interfaces. The three interactive systems (Wrangler, Voyager, and PTM) do not distribute the tasks equally between humans and automated systems. The automated methods prompt users with different suggestions, which the end user reviews, and the final decision-making power lies with the end user. It would be interesting to see what the results would look like if the roles were reversed and the system were turned on its head. An interesting case study could be one where the suggestions were given by the end user and the ultimate decision-making capability rested with the system. Would the system still be as collaborative? What would the drawbacks of such systems be?

Questions:

1. What are your general thoughts on the paper?
2. What did you think about the case studies? Which other case studies would you include?
3. What are your thoughts on evaluating systems with shared representations? Which evaluation criteria can we use?
