Reflection #13 – [11/29] – [Dhruva Sahasrabudhe]

Paper-

Data ex machina: Introduction to big data – Lazer et al.

Summary-

This is a survey article about the potential of big data in sociology and computational social science. It provides many examples of, and references to, interesting research that leverages data to gain insight into social science questions, and it discusses how the same data can be interpreted differently depending on the knowledge we wish to gain. The article opens by pointing out that what is captured in the data is not necessarily what the social scientist wants, so mining what we want from the dump of recorded interactions is a difficult task in itself. This is an important theme throughout the article.

Reflection-

I liked the phrase “substantive substrate of the data is the collective behavior of humans”, which the article uses to describe the applicability of data to the social sciences. It paints a good picture of the task of distilling an understanding of human behavior from a large number of interactions.

I also found the two interpretations of social media platforms interesting: as microcosms of all of society, or as realms in themselves. The second interpretation holds that not only do these platforms fail to capture all of human experience, but they also modify human behavior in their own right.

The tools and results of the Copenhagen Networks Study were interesting because the study tried to obtain meaningful data about interactions from diverse sources, i.e. mobile phone exchanges and Facebook. It found that participants were actually using those two communication media for different tasks, and to interact with largely distinct groups of friends, which reinforces the second interpretation of social/communications platforms (from the previous paragraph).

Another fascinating insight I got from this article was on how big data can be used to cheaply analyze and interpret politically relevant information on a national level, e.g. the research on predicting inflation rates using goods prices, or the research on estimating the impact of a hurricane.

Making big data small, i.e. identifying subgroups of interest within the dataset, such as the leaders of a revolution or people with PTSD or other psychological disorders, makes it possible to study these phenomena retrospectively and unobtrusively.

I also found the term “big data hubris” used in the article interesting, since it helped me understand that the volume of data can be misleading if sampling is not done properly, or if you do not understand the data you have. For example, the spike in usage of the word “fuck” in books from the 1800s in Google Ngrams was found to be an artifact of OCR systems misreading the archaic long s (ſ) as the letter “f”. The presence of a large number of fake accounts and bots on certain platforms also makes it important to ensure that the data obtained is genuine.
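
To make the OCR example concrete for myself, here is a minimal Python sketch of how a systematic misreading can inflate a word count. Everything in it is an assumption for illustration: `naive_ocr` and `word_counts` are hypothetical helpers, the sample sentence is invented rather than taken from any corpus, and real OCR errors are far messier than a single character substitution. I also use the pair “wiſe”/“wife” instead of the word from the article, since it shows the same effect.

```python
from collections import Counter


def naive_ocr(text: str) -> str:
    # Hypothetical OCR behaviour: every long s (ſ) is misread as "f".
    return text.replace("ſ", "f")


def word_counts(text: str) -> Counter:
    # Lowercase, replace anything that is not a letter or whitespace
    # with a space, then count whitespace-separated words.
    cleaned = "".join(
        ch if ch.isalpha() or ch.isspace() else " " for ch in text.lower()
    )
    return Counter(cleaned.split())


# Invented sample in an older typographic style; not real corpus data.
printed_page = "A wiſe man and his wife ſaw the caſe was loſt."

# What a careful human transcriber would produce (ſ read as s).
true_counts = word_counts(printed_page.replace("ſ", "s"))

# What the naive OCR model produces (ſ read as f).
ocr_counts = word_counts(naive_ocr(printed_page))

print(true_counts["wife"], ocr_counts["wife"])  # prints: 1 2
```

Even in this toy case the count for “wife” doubles and “wise” disappears entirely, without a single change in what the author actually wrote. Scaled up to millions of scanned books, the same kind of error could plausibly look like a real trend.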

This article was a thought-provoking and fascinating read. It was a wonderful way to conclude the reflections we did in this course, as it gave a high-level but broad insight into research areas in the field of social computing.
