- D. Boyd et al. Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon.
- D. Lazer et al. Data ex Machina: Introduction to Big Data.
Summary:
These two papers summarize and discuss critical questions about big data, ranging from its definition to its societal controversies, and demonstrate that social science researchers need to consider many easily overlooked questions before conducting their research. In Critical Questions for Big Data, the authors summarize six crucial points regarding big data:
1) Big data today will produce a new understanding of knowledge.
2) All researchers are interpreters of data; where is the line between being subjective and being objective?
3) Larger quantities of data do not necessarily mean better data, because data sources and data size tend to offer researchers a skewed view.
4) The context of big data cannot be neglected. For example, today's social networks do not necessarily reflect the sociograms and kinship networks among people.
5) Research ethics must always be considered when using data.
6) Access to data is not equal across research groups, which entails a digital divide in the research community.

In Data ex Machina, the authors explicitly illustrate the definition, sources, opportunities, and vulnerabilities of big data. By reviewing particular projects in the literature, e.g., The Billion Prices Project, EventRegistry, and GDELT, the paper offers a convincing view of the existing problems in big data. For example, it illustrates the vulnerabilities of big data research from three aspects: data generation, data sources, and data validation. The authors conclude by discussing future trends: larger quantities of data, in more standardized forms, will enable new research approaches.
Reflection:
In social science, researchers utilize large amounts of data from different platforms and analyze it to test their hypotheses or explore underlying patterns. Just as Fordism produced a new understanding of labor and of human relationships at work in the twentieth century, big data today changes the way people understand knowledge and human networks and communities. These two papers cover many viewpoints I had never considered, even though I was already familiar with big data and had done some simple related tasks. Examples like the "firehose" and "bots on social media" trigger my interest in how to improve the scientific environment around big data. They also prompt readers to think deeply, and with a dialectical perspective, about the research data they are using. Data collection and preprocessing are more fundamental and critical than I had ever thought. Does quantity necessarily imply objectivity? Can a large dataset give us all the data we need to analyze in our specific context? Are data platforms themselves unbiased? The truth is that there are data controllers in the world: certain authorities, organizations, and companies have the power to control data subjectivity and accessibility. There are data interpreters: all researchers can be considered interpreters in some way. And there are booming data platforms and sources for researchers to choose from.
In general, the papers enlighten me about big data in the context of social science in two ways. 1) Researchers should always avoid using data in ways that would obviously undermine the rigor of their research, e.g., relying on a single platform such as Twitter to analyze kinship networks. Researchers need to step outside their individual subjectivity when interpreting data. 2) Both organizations and researchers should work to build a healthy and harmonious big-data community, improving the accessibility and validation of data and formulating scientific usage standards and sampling frames for big data. Whether as authorities, networks, or individuals, we should dedicate ourselves to work that can benefit the whole big-data community. In this way, scientific researchers will have more faith and courage to face the coming era of big data, with more challenges but also more valuable knowledge.
Questions:
- What was the definition of knowledge in the twentieth century? How about now?
- How can we analyze people’s relationship networks without distortion? How many data platforms do we need to use, e.g., email, Twitter, Facebook, Instagram? What combination is the best choice?
- To what extent do we have to consider the vulnerabilities of accessible data? For example, if we can use currently available datasets to solve a practical problem, can we overlook some of their vulnerabilities and limitations?
- How much can systematic sampling frames help us in testing a specific hypothesis?
- What are the most important questions for researchers to consider when collecting and processing data?
- Which situations should researchers avoid when collecting and preprocessing data?