Reflection #1 – [1/24] – Hamza Manzoor

[1]. Danah Boyd & Kate Crawford (2012), Critical Questions for Big Data

Summary:

In this paper, the authors describe big data as a cultural, technological, and scholarly phenomenon. They explain that how we handle the emergence of the era of Big Data is critical, because the decisions we make now about how big data is defined and used will shape the future. They also describe several pitfalls and discuss six provocations about the issues surrounding Big Data. In these six points, they argue that big data has created a radical shift in how we think about research and has changed the definition of knowledge. They also challenge the common myth, held by many researchers, that data solves all problems, and they point out that limiting data access to a privileged few is creating a new divide. Furthermore, they explain that big data, especially social media data, can sometimes be misleading because it does not necessarily represent the entire population. Finally, they discuss the ethics of using big data in research and the lack of regulation of ethical research practices.

[2]. D. Lazer and J. Radford, Data ex Machina: Introduction to Big Data

Summary:

In this paper, the authors define big data and the institutional challenges it presents to sociology. They discuss three types of big data sources and enumerate the promises and pitfalls common to all of them. The authors argue that what cuts across these three types of big data is the opportunity for sociologists to study human behavior. They also discuss the opportunities that the huge amount of data available through various social systems, natural and field experiments, and other digital traces offers to sociologists. They explain how targeted samples drawn from a huge body of data can be used to study the behavior of minorities. Finally, they discuss the vulnerabilities of big data, including the assumption that the data represents the entire population, fake data generated by bots, and the fact that different data sources have different levels of accessibility, along with the issues these vulnerabilities present.

Reflections:

From both Boyd & Crawford’s and Lazer & Radford’s descriptions, I took away that big data should be used carefully, with ethical issues in mind. Furthermore, the key takeaway from these papers for me is that big data is not just about size but also about how we manipulate the data to generate insights about human behavior.

I particularly liked Boyd & Crawford’s provocation #3: bigger data is not necessarily better data. We computer scientists commonly believe that more data can solve all problems, but in reality this is not necessarily true, because the data at hand, no matter how big it is, might not be representative at all. For example, a trillion rows of Twitter data will still represent only a small portion of Twitter users, so generalizing from it and making claims about behaviors and trends can be misleading. Predictions made using this data will therefore have inherent biases. Since social media is the biggest source of big data, the question that comes to mind is: how do we know whether data is truly representative? And if it is not, where do we get data that truly represents the entire population?

I have concerns about Lazer & Radford’s solution to the generalizability problem, namely that data from different systems should be merged. Is this even possible for an ordinary sociology researcher? Will companies provide access to their entire datasets? Boyd & Crawford’s paper explains that people with different privileges have different levels of access to data. Even if we imagine an ideal world where we have access to data from all sources, how will we link data across them? For example, how would we link a Twitter handle to a Facebook profile and a Snapchat username, given that the Facebook users in an available dataset might not be the same users who appear in the Twitter data? And would Facebook even provide access to its entire dataset?

Nonetheless, the papers made me think about how big data can be used in the context of social science and what ethical vulnerabilities are associated with it.


Questions:


How do we know whether data is truly representative? Where can we get data that truly represents the entire population?

Is it possible to link data from different sources?

How do we know whether what companies are doing at the back end is ethical?

Do people behave in the same way on different digital platforms?

Can computational social science correctly explain human behavior with the data we currently have, given that the papers suggest this data is not truly representative until it is merged with data from other sources?
