David Lazer and Jason Radford, “Data ex Machina: Introduction to Big Data”
Summary:
In this paper, the authors explain big data to the sociology community and the opportunities it offers. They classify big data into three categories based on how social data is digitized: digital life, digital traces, and digitalized life. They also explain related problems and issues: 1) Generalizability, where researchers draw general conclusions without paying attention to how the data was collected, or to the fact that each platform has its own census and its own definitions of social phenomena such as friendship. 2) Too many Big Data, where the data needed to study a social behavior is spread across different platforms, so studying a dataset from only one of them is not sufficient. 3) Artifacts and Reactivity, where big data cannot simply be trusted because of the various anomalies and errors caused by technical changes. 4) The ideal user assumption, where researchers assume that the user being studied is a real and authentic person (the authors call them “ideal users”), while in the digital world this is not always true because of fake users (bots and puppets) and manipulated user data.
Reflection:
- I like the authors' suggestion to address the generalization issue by combining data from different sources. That makes me think about studying the same phenomena on different social platforms and how closely they relate to each other. For example, when studying anti-social behavior on different social platforms: do users behave similarly? Do they use the same language, or does each platform have its own? If we studied bots/puppets on different systems, would they have similar characteristics, so that we could generalize their models to other systems? I think these are interesting studies to conduct.
- The authors assume that big data gathered from the digital social world will represent and facilitate the study of social phenomena. This might be true for some phenomena but untrue for others, since people do not behave the same way in both worlds. Digital/virtual worlds protect their residents from many consequences that might follow in the real world, where people are more conservative, discreet, and insecure. Those virtual worlds also promote new behaviors and create new phenomena that would not exist in real social life, such as anonymity, bots, and puppets. That is why big data offers more opportunities and challenges to social scientists than data gathered from the real world using conventional methods such as field studies and surveys.
- Another issue that came to mind when thinking about big data for social research is: what is the best format for representing social data? Is it the traditional tabular format (spreadsheets and relational databases), graphs (graph databases and the formats used by social network analysis tools), or raw files (images, text, video)? I think how we represent our data would definitely help with and facilitate many tasks in our studies.
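
To make the tabular-versus-graph question concrete, here is a minimal Python sketch of my own (not from the paper). It stores the same made-up friendship ties first as table rows and then as an adjacency structure. The user names, the platform label, and the helper functions are all hypothetical and only for illustration.

```python
from collections import defaultdict

# Hypothetical friendship ties in tabular form: one row per tie,
# as they might appear in a spreadsheet or relational table.
friendship_rows = [
    {"user_a": "ann", "user_b": "bob", "platform": "platform_1"},
    {"user_a": "ann", "user_b": "cat", "platform": "platform_1"},
    {"user_a": "bob", "user_b": "dan", "platform": "platform_1"},
]


def rows_to_graph(rows):
    """Convert tie rows into an undirected adjacency mapping (user -> set of friends)."""
    graph = defaultdict(set)
    for row in rows:
        a, b = row["user_a"], row["user_b"]
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_from_rows(rows, user):
    """Count a user's ties by scanning every row (the tabular approach)."""
    return sum(1 for row in rows if user in (row["user_a"], row["user_b"]))


def friends_of_friends(graph, user):
    """Users two steps away, excluding the user and their direct friends."""
    direct = graph.get(user, set())
    two_step = set()
    for friend in direct:
        two_step |= graph.get(friend, set())
    return two_step - direct - {user}


graph = rows_to_graph(friendship_rows)
print(degree_from_rows(friendship_rows, "ann"))
print(sorted(friends_of_friends(graph, "ann")))
```

In this toy case, the table keeps per-tie attributes such as the platform easy to filter on, while the graph form makes neighborhood questions (like friends of friends) a direct lookup instead of repeated scans. This supports the idea that the representation should follow the task.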