Reflection #1 – [01/18] – [Vartan Kesiz Abnousi]

Danah Boyd & Kate Crawford: Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society. https://doi.org/10.1080/1369118X.2012.678878

David Lazer and Jason Radford: Data ex Machina: Introduction to Big Data. Annual Review of Sociology. https://doi.org/10.1146/annurev-soc-060116-053457

 

Summary

Both articles offer a critical analysis of “Big Data”, along with a brief historical account of how the term arose and how it is defined. The authors stress that Big Data is more than large datasets: it is about the capacity to search, aggregate, and cross-reference large datasets. They underline its great potential for solving problems, but they also warn about the dangers it brings. One danger is that it could be perceived as a panacea for all research problems and diminish the significance of other fields, such as the humanities.

The authors argue that the tools of Big Data do not necessarily offer an “objective” account of reality, as some of its champions boast. For instance, Big Data samples are often not representative of the entire population, so generalizing to all groups of people solely on the basis of Big Data is erroneous. As Boyd and Crawford argue, “Big Data and whole data are also not the same”. More data does not necessarily yield better insight into a particular issue. In addition, many steps in the analysis of raw Big Data, such as “data cleaning”, require human judgement. As a result, research outcomes can differ dramatically depending on the subjective choices made during data analysis.

Concerns about privacy are also raised. Human subjects may be unaware that their data are being collected, or may not have consented to it. The line between private and public information has become blurred. State institutions might use the data to curtail individual civil liberties, a phenomenon known as Big Brother.

A particularly important problem is research inequality, which takes numerous forms. For example, companies do not give public research institutions full access to the data they collect. As a result, the privileged few inside those companies, who have complete access, can reach different, more accurate results. In addition, those companies usually partner with, or provide data access to, specific elite universities, so the students of these schools gain a comparative advantage in skills over everyone else. This sharpens both social and research inequalities.

The very definition of knowledge is changing. Researchers can now obtain large volumes of epidemiological data without even designing an experiment. As Boyd and Crawford argue, “it is changing the objects of knowledge”. Lazer and Radford also argue that Big Data is vast and heterogeneous. They classify the data into three sources: digital life, digital traces, and digitalized life. Digital life refers to platforms such as Twitter, Facebook, and Wikipedia, where behavior takes place entirely online. The authors argue that these platforms can be viewed either as generalizable microcosms of society or as distinctive realms in which much of the human experience now resides. Digital traces include information collected from sources such as phone calls, while an example of digitalized life is video recordings of individuals.

Reflections

Both articles are very well written, and I agree with the points they raise. However, I am particularly cautious about viewing digital life as a microcosm of our society. Whether such a generalization holds is more than an abstract, subjective idea: representativeness is rigorously defined in probability theory, and there are mathematical rules for deciding whether a sample is representative. A famous example is the 1948 US presidential election, which Truman won even though the election polls of the time predicted otherwise because of sampling errors. I am also worried that some of these digital platforms bolster a form of herd behavior that renders individuals less rational. This herd behavior has been studied by many social scientists, Freud and Jung among them, and it has been argued to be one of the causes of the rise of Fascism.
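To make the point about representativeness concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration, not data from either article: a hypothetical population in which 40% of people are "online", with an assumed 60% support for some opinion among online people and 40% among offline people. It compares a very large sample drawn only from the online group with a small simple random sample of the whole population.

```python
import random


def compare_estimates(seed=0, population_size=100_000, random_sample_size=1_000):
    """Compare a large online-only sample with a small simple random sample.

    All rates below are made-up assumptions for illustration only.
    """
    rng = random.Random(seed)

    # Build a hypothetical population of (is_online, supports_opinion) pairs.
    population = []
    for _ in range(population_size):
        is_online = rng.random() < 0.4               # assumed share of online people
        support_rate = 0.6 if is_online else 0.4     # assumed opinion gap between groups
        supports = rng.random() < support_rate
        population.append((is_online, supports))

    # The quantity we actually want to know: support in the whole population.
    true_share = sum(s for _, s in population) / population_size

    # "Big Data" style sample: everyone who is online, and nobody else.
    online_only = [s for is_online, s in population if is_online]
    online_estimate = sum(online_only) / len(online_only)

    # Much smaller sample, but every person is equally likely to be chosen.
    random_sample = rng.sample(population, random_sample_size)
    random_estimate = sum(s for _, s in random_sample) / random_sample_size

    return {
        "true_share": true_share,
        "online_estimate": online_estimate,
        "online_sample_size": len(online_only),
        "random_estimate": random_estimate,
        "random_sample_size": random_sample_size,
    }


if __name__ == "__main__":
    for name, value in compare_estimates().items():
        print(f"{name}: {value}")
```

Under these assumptions the online-only estimate stays far from the population value no matter how many online people are included, while the much smaller random sample lands close to it. The error in the first case comes from who is sampled, not from how many.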

Finally, I have some questions that could develop into research ideas:

  1. Does the nature of a digital platform (e.g., Twitter) change an individual’s behavior? If yes, then how?
  2. Is the increasing polarization in the United States related to these digital platforms?
  3. Does digital anonymity alter someone’s behavior?
  4. Do people behave the same way across different digital platforms?
  5. Can we, as researchers, develop a methodology to make data from digital platforms and digital traces representative of the population?
