[1]. D. Boyd and K. Crawford, “Critical Questions for Big Data”, Information, Communication & Society, vol. 15, no. 5, pp. 662-679, 2012. DOI: 10.1080/1369118X.2012.678878
Summary :
In this paper, the authors describe big data as a ‘capacity to search, aggregate, and cross-reference large datasets’. They argue that the emerging era of big data must be approached critically because the choices made now will influence the future. They also discuss the dangers that big data poses to privacy, along with other concerns. They then examine in detail the assumptions and biases of this phenomenon through six points. The first is that big data has changed the definition of knowledge. Another is that having a large amount of data does not necessarily mean that the data is good. The authors use examples to discuss the ethics of big data research and the lack of regulatory techniques and policies. They also discuss how access to this data is limited to a few organizations and the divide that this creates.
[2]. D. Lazer and J. Radford, “Data ex Machina: Introduction to Big Data”, Annual Review of Sociology, vol. 43, no. 1, pp. 19-39, 2017.
Summary :
In this paper, the authors review the value of big data research and the work that needs to be done for big data in sociology. They first define big data and then discuss the following three big data sources:
- Digital Life: Online behavior from platforms like Facebook, Twitter, Google, etc.
- Digital Traces: Call Detail Records, which are only records of an action and not the action itself.
- Digitalized Life: Google books, phones that identify proximity using Bluetooth.
The authors believe that the availability of these forms of data, along with the tools and techniques required to access them, gives sociologists the opportunity to answer both age-old and new questions. To this end, the authors describe the opportunities available to sociologists in the form of massive behavioral data, data obtained through nowcasting, data obtained through natural and field experiments, and data available on social systems. The authors then discuss the pros and cons of sampling the available big data. They also mention the vulnerabilities that exist, such as the excessive volume of data, the limited generalizability of data, the platform dependence of data, the failure of the ideal user assumption, and ethical issues in big data research. In conclusion, the authors mention a few future trends that sociologists will need to understand in order to succeed in big data research.
Reflections :
In [1], the authors ask various questions around the theme ‘Will Big Data and the research that surrounds it help society?’ I like the definition of big data as a ‘Socio-technical’ phenomenon. I also like the thought that is provoked by the use of the term ‘Mythology’ in the formal big data definition. The big data paradigm and its rise to fame do somewhat revolve around the belief that the volume of the data provides new, true, and accurate insights. This gives rise to the question ‘Do we sometimes try to find or justify false trends just because we have big data?’ I like the example the authors use to illustrate the platform dependence of social data. A person's social network on Facebook may not be the same as on Twitter because the data on each platform is different. This could be for many reasons, the most basic being that some users may not be present on both sites. This gives rise to another question: ‘What about the population who is not on any social site?’ That part of the population is not considered in any of these studies. Also, the fact that ease of access to data is sometimes prioritized over the quality of the data raises concerns. I also like that the authors address the quantitative nature of big data research and the importance of context. I appreciate the section in which they discuss how this big data is available to only a few organizations and the ‘Big Data Rich’ and ‘Big Data Poor’ divide that this creates. This has to be addressed to facilitate successful big data research. In [2], I appreciate the definition of big data that the authors provide. Big data is indeed a mix of computer science tools and social science questions. The authors mention that sociologists need to learn how to leverage the tools and techniques provided by computer scientists to make breakthroughs in their research.
This makes for an excellent collaboration in which computer scientists leverage the questions and research expertise of social scientists, and social scientists leverage the tools and techniques developed for extracting insights from big data. I like the way the authors describe big data archives as depicting actual behavior “in principle”. Although some studies of behavior based on such big data have shown positive results, the question that arises is ‘How genuine is this online behavior?’ Many factors play a role in these studies. The biases present in the data have to be considered. If data from social networks is being used, one of the most basic examples of bias is the ideal user assumption highlighted in the paper. Moreover, the veracity of the data has to be considered as well. Another important bias mentioned in the paper arises from incorrect sampling of data. I realize that samples drawn from big data can provide valuable insights. However, this raises the question ‘What methods can be applied to sample data without bias?’ I appreciate the effort that the authors have invested in providing many case study examples to support the points they make in the review. The paper provokes thoughts about the vulnerabilities of big data research and the work that has to be done to make it as ethical and methodical as possible.