Reflection #1 – [1/18] – MD MOMEN BHUIYAN

Paper #1: Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon
Paper #2: Data ex Machina: Introduction to Big Data

Summary #1:
This paper discusses what the phenomenon of big data entails from the perspective of socio-technical research. The authors describe the Big Data phenomenon as an interplay between technology, analysis, and mythology, where mythology represents the uncertainty of inferring knowledge from big data. The theme of the paper is six provocations about issues in big data: the definition of knowledge, claims to objectivity, methodological problems, the importance of context, ethics, and the digital divide created by access restrictions. Big data introduces new tools and research methods for transforming information, which in turn changes the definition of knowledge. While quantifying information in big data might seem to yield objective claims, it is still a subjective observation that initiates the analysis: the design decisions made in interpreting the data introduce subjectivity and its biases. One caveat of interpreting big data is seeing correlations where none exist (p. 11). The methodological limitations of big data lie in the source and the collection procedure. It is necessary to understand how much the data was filtered and whether it generalizes to the research domain. One solution to this problem is combining information from heterogeneous sources, although that also amplifies noise. Such heterogeneous combination is also helpful in modeling the big picture. For example, Facebook alone can provide a relationship graph of a user, but that graph is not necessarily representative of the user's overall relationships, because each communication channel, such as social networks, email, and phone, provides a different representation. So context is very important in describing such information. Big data also raises questions of research ethics. The sheer volume of big data could provide enough information to de-anonymize individuals, so information should be published carefully to protect the privacy of the entities involved.
Finally, unequal access to big data divides the research community into two groups, one of which is more privileged. Differences in computational background create a similar division among big data researchers.

Summary #2:
This paper is similar to the previous one in that it discusses many of the same issues. The authors first discuss the data sources for big data by dividing them into three typological categories: digital life, digital traces, and digitalized life. Digital life refers to digitally mediated interactions such as tweeting and searching. Digital traces are records that indirectly provide information about digital life, such as call detail records. Finally, digitalized life represents the capture of a nondigital portion of life in digital form, such as constant video recording at an ATM. There is also the possibility of collecting specific behaviors, such as certain types of tweets or visits to certain webpages. These data provide several opportunities for behavioral research. Big data provides large data sets from different sources, and combining these sources yields important insights, as in the Copenhagen Networks Study. Big data also provides cost-effective approaches to some studies, such as unemployment detection and disease propagation. The effects of external shocks, such as hurricanes or price hikes, can also be captured with big data. By covering underrepresented populations, big data is used to study problems such as PTSD and suicidal ideation. The vulnerabilities of big data include the generalizability of hypotheses, heterogeneous sources, errors in the source systems, and the ideal-user assumption. Research on big data also raises ethical issues in both the acquisition and the publishing of data. Finally, recent big data trends include new sources of data, generic models for research, and qualitative approaches in big data research.

Reflection:
Both papers discuss issues and applications of big data in identifying social phenomena. This reflection focuses on the generalizability issue in big data. The authors suggest that combining multiple sources for validation can address the generalizability issue. This is interesting given that the deep learning community has recently found that a model can be made to generalize better both by using more data and by using transfer learning. A similar approach could be used in finding social phenomena in big data. For example, data from Twitter can provide information about the spread of rumors by people with certain attributes. Although Facebook is quite different from Twitter, it should be possible to use the hypothesis and the results from Twitter to initialize a learning model that is then applied to Facebook. What do you think?
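The transfer idea above can be sketched in code. The snippet below is a toy illustration only: the data are synthetic one-dimensional features standing in for "Twitter" (source) and "Facebook" (target) signals, and the model is a plain logistic regression, since neither paper prescribes a concrete method. "Transfer" here simply means warm-starting the target-domain training from the source-domain weights.

```python
import math
import random

def train_logreg(data, w=None, lr=0.1, epochs=200):
    """Plain logistic regression via stochastic gradient descent.
    data: list of (feature_vector, label) pairs, label in {0, 1}.
    Passing in an existing weight vector `w` warm-starts training,
    which is the transfer-learning step in this toy example."""
    dim = len(data[0][0])
    if w is None:
        w = [0.0] * dim
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            z = max(-30.0, min(30.0, z))      # clamp to avoid math.exp overflow
            p = 1.0 / (1.0 + math.exp(-z))    # predicted probability of label 1
            for i in range(dim):
                w[i] += lr * (y - p) * x[i]   # gradient step on log-loss
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

random.seed(0)

def make_data(n_per_class, shift):
    """One informative Gaussian feature plus a constant bias term."""
    return [([random.gauss(mean + shift, 1.0), 1.0], label)
            for label, mean in [(0, -2.0), (1, 2.0)]
            for _ in range(n_per_class)]

source = make_data(200, shift=0.0)   # plentiful "Twitter-like" labels
target = make_data(10, shift=0.5)    # scarce, shifted "Facebook-like" labels

w_source = train_logreg(source)                                 # learn on source
w_transfer = train_logreg(target, w=list(w_source), epochs=50)  # fine-tune

held_out = make_data(100, shift=0.5)
acc = sum(predict(w_transfer, x) == y for x, y in held_out) / len(held_out)
```

Because the target sample is tiny, training from scratch on it alone would be fragile; the warm start lets the few target labels merely adjust what was already learned from the larger source set, which is the same intuition as reusing a Twitter-derived hypothesis on Facebook.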
