Summary
Computational social science is still in its nascent stages. Until recently, research in this domain was conducted by computationally minded social scientists or socially inclined computer scientists. The latter has been driven by the emergence of the Internet and social media platforms such as Facebook, Twitter, or Reddit. The digitization of our social life presents itself as Big Data. The two papers present a view of Big Data, its collection and its analysis, from a social science perspective. Social scientists have relied on surveys to collect data, but have had to continually question the biases in people’s answers. Both papers value Big Data’s contribution to studying the behavior of a populace without the need for surveys. The challenge in Big Data lies more in organizing and categorizing the “data” than in its size. The main focus of these two papers, however, is less on Big Data itself than on the vulnerabilities of such an approach.
Danah Boyd’s paper raises six concerns about the use of Big Data in the social sciences. It argues that not all questions in social science can be answered using numbers: the whys need a deeper understanding of social behavior. More importantly, the mere scale of Big Data does not automatically make it more reliable, since inherent biases may still be present. Finally, it raises concerns about data handling and access: who owns digital data, and how limited access can create divides in society similar to those created by education.
Lazer’s paper identifies almost identical vulnerabilities in Big Data. As he very aptly puts it:
The scale of big data sets creates the illusion that they contain all relevant information on all relevant people.
Reflections
As a student and researcher in computer science, it is important to constantly remember that computational social science is not an extension of data science or machine learning. We often fall into the trap of thinking about methods before questions. It is best to think of computational social science as a derivative of social science rather than of computer science.
I enjoyed reading Boyd’s provocations #3 and #4. Big Data is often heralded as a messiah that can answer important behavioral questions about society. Even if this were true, it would not be because of the scale of the data, but because of the ability to process and analyse this data. As researchers, it feels increasingly important to consider multiple sources and to ask broader questions in social science than ones like “will a user make a purchase?”. For this, one can’t merely look for patterns in data, but must study why the patterns exist. Do these patterns arise from the dataset in question, or do they reflect true societal behavior? While the two papers mention a trend of generalization, especially in the machine learning field, I also see a trend of increased specialization. Methods in computer science have decreasing applicability, even to a large enough data sample.
Finally, a major concern about Big Data is privacy and ethics. Unlike the challenges in data analysis, this concern does not have a single correct answer. Universities, labs and industries will have to work more closely with IRBs to develop a good framework for the governance of Big Data.
Questions
- To reiterate, what are the big questions in social science, and where do we strike a balance between quantitative and qualitative analysis?
- While computational social science helps debunk incorrect beliefs about group behavior, can it truly help us understand the cause of such behavior?
- If it were possible to change societal behavior, should we?
- While predictive analysis is non-intrusive, what constitutes ethical behavior when social science has more intrusive effects?
- Finally, does research in computational social science encourage large-scale surveillance?