Paper: Data ex Machina: Introduction to Big Data
The main theme of the paper is a critical examination of the potential of big data in sociology. The authors start by defining big data and its types (digital life, digital traces, digitalized life). They review several big data projects and research papers to highlight the opportunities and challenges of using big data in computational social science. They conclude by discussing six trends that will affect the use of big data in the future.
The paper for this week is a perfect way to conclude the course. It summarizes almost everything that we have learnt so far in the semester. The need for the social computing field is described perfectly in the paper in the line “Big data thus requires a computational social science—a mash-up of computer science for inventing new tools to deal with complex data, and social science, because the substantive substrate of the data is the collective behavior of humans (Lazer et al. 2009).” [Lazer et al]
The authors discuss biases in self-reported behavior. Qualitative analysis is an important part of this research field, and it sometimes relies heavily on surveys and interviews. Understanding and reducing the biases in surveys and interviews is therefore very important.
Nowcasting was a new term for me. It was interesting to see the impact of projects like the “Billion Prices Project” [how the project was used to infer inflation in Argentina after the country stopped publishing inflation numbers].
The authors also review projects in which researchers have studied underrepresented populations, e.g., people suffering from depression or experiencing suicidal ideation. But not every population is well represented in every kind of online dataset: internet access is still limited in developing countries.
The authors discuss the core issues of big data, a prominent one being that the scale of the data can create the illusion that it contains all the relevant information about all kinds of people. So it’s important to understand what your data actually represents. But how much data is needed to make “general claims” is a question no one has an answer to.
A line in the paper, “Twitter has become to social media scholars what the fruit fly is to biologists—a model organism,” points to the overuse of Twitter data in research because it is easily available. The authors argue that relying on a single platform can limit the generalizability of findings.
At the end, the authors discuss future trends, including how the volume of data will only increase because of several digitization initiatives (almost everything is moving online, and paper records are diminishing). The popularity of text-based platforms is decreasing, while platforms like Snapchat and Instagram are rising in popularity. It seems that in the future the bulk of data will consist of images and videos. It will be interesting to see different fields (computer vision + data analytics + sociology) come together to analyse this data.