Reflection #13 – [11/29] – [Karim Youssef]

In the last decade, the evolution of digital systems and connectivity has created a gushing stream of data that contains latent treasures. Multiple domains, such as biology, astronomy, and social science, are facing an unprecedented challenge: how to deal with big data? "Big data" is perhaps one of the most resonant scientific terms of the last decade, and a specific field of study in computer science has been established to develop hardware and software systems that scale with it.

In social science, big data gushing out of online social platforms represents an invaluable gold mine. In their work titled Data ex Machina: Introduction to Big Data, David Lazer and Jason Radford explore the opportunities and challenges of using big data to study and analyze different social phenomena. From their perspective, big social data comes from three sources: digital life (e.g. online social platforms), digital traces (e.g. call records), and digitalized life (e.g. digitizing old books and newspapers).

Many opportunities exist in analyzing data generated from the aforementioned sources. These data could reflect actual patterns of social activity that are hard to extract from research surveys and questionnaires. They also create the opportunity to analyze social interactions around breaking events as they are happening, rather than performing a retrospective analysis of past events.

From my perspective, archived data from social platforms could also serve as a treasure for retrospectively analyzing some special events. An example that always influences my ideas is the Arab Spring uprisings: the Egyptian people lived through a very precious experience between 2011 and 2013. In some places in the world, online social platforms could be the best places to record the traces of such events.

Online social platforms could also serve as a natural experiment field, a great opportunity but one with great ethical concerns. An example is Facebook’s experiment on social influence and emotional contagion, a very promising experiment that raised a huge ethical debate.

Lazer and Radford also explore a set of challenges and vulnerabilities. Those challenges include the generalizability of analysis performed on an individual data source (e.g. Twitter). On this point, they shed light on the problem that the social activities of individuals are spread across multiple social platforms. They also talk about the credibility and legitimacy of data given how widespread bots and fake identities are.

From my view, I believe that no matter how hard the aforementioned challenges are, the continuous research effort to understand and solve these problems will likely converge at some point. The harder challenge that is likely to linger is the ethical one. Lazer and Radford shed light on research ethics: how can we guarantee the rights of human research subjects and have informed consent in place, while still preserving the quality and benefits of collecting and analyzing their online social data? Some people have commented on social platforms being free by saying, “if it is offered to you for free, then probably you are the goods being sold.” From my perspective, establishing and applying rules that preserve the rights of every user, and being transparent about everything, is a lingering challenge.

Reflection #13 – [11/29] – [Eslam Hussein]

David Lazer and Jason Radford, “Data ex Machina: Introduction to Big Data”

Summary:

In this paper, the authors focus on explaining big data to the sociology community and the opportunities it offers. They classify big data into three categories based on how social data is digitized: digital life, digital traces, and digitalized life. They also explain related problems and issues: 1) Generalizability, where researchers draw generalized conclusions without paying attention to how the data was collected, or to the fact that each platform has its own census and its own definitions of social phenomena such as friendship. 2) Too many Big Data, where the data needed to study a social behavior is spread across different platforms, so studying a dataset from only one of them is not sufficient. 3) Artifacts and Reactivity, where big data cannot be fully trusted because of various anomalies and errors caused by technical changes. 4) The ideal user assumption, which takes the user being studied to be a real and authentic person; in the digital world this is not always true because of false users (“bots and puppets”) and manipulated user data.

Reflection:

  • I like the authors' suggestion to address the generalization issue by combining data from different sources. That makes me think about studying a phenomenon on different social platforms and how closely the results relate to each other. For example, if we study anti-social behavior on different social platforms, do users behave similarly? Do they use the same language, or does each platform have its own? If we study bots/puppets on different systems, do they have similar characteristics, so that we can generalize their models to other systems? I think those are interesting studies to conduct.
  • The authors assume that big data gathered from the digital social world will represent social phenomena and facilitate their study. This might be true for some phenomena but untrue for others, since people do not behave the same in both worlds. Digital/virtual worlds protect their residents from many consequences that might follow in the real world, where people are more conservative, discreet, and insecure. Those virtual worlds also promote new behaviors and create new phenomena that would not exist in real social life, such as anonymity, bots, and puppets. That is why big data offers social scientists more opportunities and challenges than data gathered from the real world using conventional methods such as field studies and surveys.
  • Another issue that came to my mind when studying big data for social purposes is the best format for representing social data. Is it the traditional tabular format (spreadsheets and relational databases), graphs (graph databases and the formats used by social network analysis tools), or raw files (images, text, video)? I think how we represent our data would definitely help and facilitate lots of tasks in our studies.
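To make the last point more concrete, the following is a minimal sketch, not taken from the paper, of the same friendship data held in two of the formats mentioned above: a table of rows and a graph (adjacency mapping). The names `rows`, `to_adjacency`, and `degree_from_rows`, as well as the sample users, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical friendship records in tabular form: one row per tie.
rows = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "carol"),
    ("dave", "alice"),
]


def to_adjacency(edge_rows):
    """Convert an edge table into an undirected adjacency mapping."""
    graph = defaultdict(set)
    for a, b in edge_rows:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_from_rows(edge_rows, user):
    """Count a user's ties by scanning every row of the table."""
    return sum(1 for a, b in edge_rows if user in (a, b))


graph = to_adjacency(rows)

# Both representations answer "how many friends does alice have?",
# but the graph form answers it with a single lookup.
print(degree_from_rows(rows, "alice"), len(graph["alice"]))
```

In this sketch the table is convenient for storage and filtering, while the graph form makes neighbor-based questions cheap, which is one way the choice of representation can shape the analysis.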

Reflection #13 – [11/29] – [Neelma Bhatti]

Reference:
Lazer, D., & Radford, J. (2017). Data ex machina: Introduction to big data. Annual Review of Sociology, 43, 19-39.
Reflection:
This article reminded me of two things: why I love sociology/psychology and why I like literature surveys. Literature surveys, sociology, and big data seem like a good amalgamation of things that reveal interesting findings about literature, groups of people, and patterns, respectively.
The article puts the semester-long social computing class in one place by discussing big data, social systems, bots and sock puppets creating the illusion of a perfect user, and social contagion, to name a few. The authors emphasize the strength of combining sociological studies and computer science to make the best use of big data. They explore the opportunities and threats and also provide suggestions for addressing future challenges in big data research.
The authors did a great job of summarizing almost every aspect of the opportunities and vulnerabilities associated with big data. The role of big data in transforming learning and higher education seemed to be missing, though. Big data is still a niche topic in the field of education, but governments have started to produce reports about the potential of big data in education [1].
Although, at a glance, most of the studies designed to gather or examine big data to extract useful patterns look intrusive (like the behavioral study in which college students were provided with cell phones so that their usage data could be investigated), I believe there is a need to educate people about the greater good of collaborating by providing passive, non-sensitive data for scientific and behavioral studies. We are generally skeptical about possible privacy invasions through our phones and social media accounts, yet we also want to remain first in line to benefit from scientific advancements.
Just as people are willing to help with money in an emergency, they should also be educated to contribute by providing data to investigate the situation. This would greatly reduce the vulnerabilities associated with big data, such as errors and misinterpretations of data due to self-reporting.
[1] Eynon, R. (2013). The rise of Big Data: what does it mean for education, technology, and media research?.

Reflection #13 – [11/29] – [Prerna Juneja]

Paper: Data ex Machina: Introduction to Big Data

The main theme of the paper is a critical examination of the potential of big data in the field of sociology. The authors start by defining big data and its types (digital life, digital traces, digitalized life). They review several big data projects and research papers to highlight the opportunities and challenges of using big data in computational social science. They conclude by discussing six trends that will affect the use of big data in the future.

The paper for this week is a perfect way to conclude the course. It summarizes almost everything that we have learnt so far in the semester. The need for the social computing field is described perfectly in the paper in the line “Big data thus requires a computational social science—a mash-up of computer science for inventing new tools to deal with complex data, and social science, because the substantive substrate of the data is the collective behavior of humans (Lazer et al. 2009).” [Lazer et al]

The authors talk about biases in the self-reported behavior. Qualitative analysis is an important part of this research field that sometimes heavily relies on surveys and interviews. Thus, understanding and reducing biases from surveys and interviews is very important.

Nowcasting was a new term for me. It was interesting to see the impact of projects like the “Billion Prices Project” [which was used to infer inflation in Argentina after the country stopped publishing inflation numbers].

The authors also reviewed projects where researchers have studied underrepresented populations, e.g. people suffering from depression and having suicidal ideation. But not every population is represented well in all kinds of online datasets: internet access is still limited in developing countries.

The authors talk about the core issues of big data, a prominent one being that the scale of the data can lead to the illusion that it contains all the relevant information about all kinds of people. So it’s important to understand what your data is. But how much data is needed to make “general claims” is a question no one has an answer to.

A line in the paper, “Twitter has become to social media scholars what the fruit fly is to biologists—a model organism,” indicates the overuse of Twitter data for research due to its easy availability. The authors argue that relying on a single platform can cause problems for generalizability.

In the end, the authors discuss future trends, including how the volume of data is only going to increase because of several digitization initiatives (almost everything is moving online, and paper records are diminishing). The popularity of text-based platforms is decreasing, while platforms like Snapchat and Instagram are rising in popularity. It seems that, in the future, the bulk of data will consist of images and videos. It will be interesting to see different fields (computer vision + data analytics + sociology) coming together to analyse this data.

Reflection #13 – [11/29] – [Deepika Rama Subramanian]

This article, surprisingly originating from a sociology journal, seems to succinctly cover everything we spoke about this semester in our class. The authors describe big data’s enormous usefulness in understanding human behaviour, especially behaviours and behaviour patterns that are difficult to record or are often misreported in self-reports.

As befits a sociology journal, they cast the world of big data into various manifestations: digital life, digital traces, digitalized life, and the instrumentation of human behaviour. When the authors say that digital life can be viewed as generalizable microcosms of society, I wonder if this is appropriate. Given the fluid and incremental way in which platforms grow, does it seem appropriate to give and keep a label?

As we’ve seen several times before in this class, one worries about the ethical conundrum in mining big data to gain insights in sociology. In the study that tracked the phones of several students to understand the ties between friends (from Facebook), the students were at least aware of the tracking. In contrast, we previously read a paper about Facebook tweaking its feed, without the knowledge of its users, to learn about their emotional reactions to the posts in their feed.

We are also surrounding ourselves with gadgets and objects that constantly provide information that could be exploited for things we would not consent to. In a sense, we are creating our own surveillance state. Dr. Michael Nelson, during a recent seminar here, spoke about this very problem. By using fitness tracking devices and virtual assistants, we are unknowingly and unwittingly providing valuable data for analysis. The fitness app Strava faced some flak for this when US soldiers in Afghanistan used the app to track their runs around army bases there. The soldiers, unbeknownst to themselves, were clearly giving away the locations of secret army bases. Obviously, we are still having some trouble keeping all this data from being misused.

Another issue with big data mentioned by the authors that seems prevalent is inclusivity. During the talk given by Dr. Rajan Vaish, I remember him mentioning that even during their study (Whodunit?), they found it difficult to involve the rural population in India. Of course, this is in part due to expensive, metered internet connections, but it is also a question of how much interest the community has (in the case of other major online platforms).

The authors themselves have outlined several other issues that come with big data analytics in sociology. One of the more familiar ones is the issue of bots, puppets, and manipulation. Over the duration of this course, we have seen several ways to curb this behaviour and make the data available to us more meaningful. But for now the problem persists, skewing a lot of the ongoing analyses.

Finally the authors talk about qualitative approaches to big data and I almost laughed out loud! We were battling this issue with a much much much smaller data set until even a few days ago. The authors promise computationally enabled qualitative analyses that will help us analyse data that is beyond the capacity of armies of grad students to read. To this we say thank you!

This article was a fitting way to end the semester and the course. It was in itself a summary of sorts making it slightly difficult to summarise and reflect on.

Reflection #13 – [11/29] – [Mohammad Hashemian]

Data ex machina: Introduction to big data – David Lazer and Jason Radford

Although the dynamic features of social media and Big Data provide a great research opportunity for social network researchers to analyze human behavior, the analysis of Big Data comes with many challenges. This paper reviews a number of Big Data projects in light of those challenges.

Big Data in this review paper is broken down into three domains: digital life, digital traces, and digitalized life. What I took from this categorization is that the digitization of human life has made it much easier to analyze people's social behavior than before. As mentioned in the paper, not only the owners of social network platforms but also third parties can harvest data from these platforms. One might think that this is always in the interest of users, because researchers can use this valuable information to study human behavior and the results of that research are ultimately useful to the users. Unfortunately, that is not always the case.

Nowadays, one of the most common ways of finding people is through social media sites (digital life), which is a valuable feature for debt collectors because they can use social networks to find people who owe money. This practice is called skip tracing: tracking down a debtor when there is no information about their current address, phone number, or place of employment. When they want to find someone in order to collect a debt, analysts at these companies apply many state-of-the-art data analytics techniques to the information they can obtain, including freely available information on social media sites. This is one application of the Big Data produced by digital life that is not accepted by the public. But focusing just on Big Data and the challenges researchers are dealing with, I think some other challenges can be pointed out that are not mentioned in this paper.

In the section “The Ideal User Assumption: Bots, Puppets, and Manipulation,” the authors demonstrate the importance of users’ true identities by focusing only on the vulnerability of data caused by bots, puppets, and manipulation. However, there is another identity-related research challenge in working with Big Data.

Unlike traditional data collection methods, which result in comprehensive user profiles, many Big Data sources do not contain detailed demographic information. Without this information about users, Big Data research may be biased. For instance, the actual users of social media services generally come from the younger generation [1]. Thus, data collected from social media represents only a small sample of the whole population. It seems that more research is needed to understand the user profiles in Big Data sources [2].

Another Big Data problem is that researchers cannot re-run published Big Data research. Generally, published scientific research can be verified by other researchers using the same or different methods on the same data (which is one of the defining features of scientific research). But many Big Data sets, such as social media messages and mobile phone records, are proprietary. Therefore, researchers are unable to re-test most of the recently published social media research. As an example, consider the Twitter API. Under the Twitter API agreement, researchers cannot distribute the original raw tweets to anyone outside their research groups. Many researchers can only retrieve a 1% random sample of the data via public APIs for their research. This problem could be an important obstacle to the advancement of social media research in the future.

[1] 40% of Twitter users are between the ages of 18 and 29, and 25% of users are 30-49 years old (https://www.omnicoreagency.com/twitter-statistics/)

[2] https://www.tandfonline.com/doi/full/10.1080/15230406.2015.1059251

Reflection #13 – [11/29] – [Dhruva Sahasrabudhe]

Paper-

Data ex machina: Introduction to big data – Lazer and Radford

Summary-

This was a survey article about the potential of big data in sociology and computational social science. It served as a very good survey, providing many examples of and references to interesting research that leverages data to glean insight in social science, and discussing the different interpretations of data depending on the knowledge we wish to gain. The article began by stating that what the data captures is not necessarily what the social scientist wants, so mining what we want from the dump of interactions is a difficult task in itself; this is an important theme in the article.

Reflection-

I liked the phrase “substantive substrate of the data is the collective behavior of humans,” used to describe the applicability of data to the social sciences, as it paints a good picture of the task of distilling an understanding of human behavior from a lot of interactions.

I also found the two interpretations of social media platforms, either as microcosms of all of society or as realms in themselves, interesting. The second interpretation holds that not only do these platforms fail to capture all of human experience, but they also modify human behavior in their own right.

The tools and results of the Copenhagen Network Study were interesting because the researchers tried to obtain meaningful data about interaction from diverse sources, i.e. mobile phone exchanges and Facebook, and found that participants were actually using those two communication media for different tasks and to interact with largely distinct groups of friends. This reinforces the second interpretation of social/communication platforms (from the previous paragraph).

Another fascinating insight I got from this article was on how big data can be used to cheaply analyze and interpret politically relevant information on a national level, e.g. the research on predicting inflation rates using goods prices, or the research on estimating the impact of a hurricane.

Making big data small, i.e. identifying subgroups of interest within the dataset, such as the leaders of a revolution or people with PTSD or other psychological disorders, makes it possible to study these phenomena retrospectively in an unobtrusive manner.

I also found the term “big data hubris” used in the article interesting, since it helped me understand that the volume of data can be misleading if sampling is not done properly, or if you do not understand the data you have. For example, the spike in usage of the word “fuck” in books from the 1800s in Google Ngrams was found to be due to a failure of OCR systems to read an archaic form of the letter “s”. The presence of a large number of fake accounts and bots on certain platforms also makes it important to ensure that the data obtained is genuine.

This article was a thought-provoking and fascinating read. It was a wonderful way to conclude the reflections we did in this course, as it gave a high level, but broad insight into research areas in the field of social computing.

Reflection #13 – [11/29] – [Subil Abraham]

Lazer, David, and Jason Radford. “Data ex machina: Introduction to big data.” Annual Review of Sociology 43 (2017): 19-39.

This article provides an introduction to the world of big data to sociologists. It talks about the possibilities of using big data to identify new phenomena in human behavior. It also goes over the potential pitfalls of relying on big data and makes the case that big data works best when used in combination with other methods rather than in isolation. It concludes with what the future could hold for using big data for Sociology.

This article is an appropriate bookend for this semester. It goes over the major themes that we covered in detail in our classes and provides a good summary of the things we’ve learned. On noticing that this article was published in a Sociology journal, I was reminded that despite all the computing related things we’re doing, the ultimate goal is to further the study of humans in this connected world. All the machine learning and data analysis was a means to an end, which is obvious in retrospect but is not really in the forefront of your mind when you are deep in the throes of writing code.

The authors made an interesting point about scientists fixating on single platforms like Twitter, making the comparison to ‘model organisms’ in Biology. Behavior on Twitter is exclusive to Twitter and doesn’t necessarily reflect the wide range of human behavior. But even within so-called model organisms, there is such a rich research potential for studying human behavior. I don’t believe that findings need to be generalizable to be useful. Interesting observations can be made on single platforms and there is no need to constantly consider how generalizable the information is. Consider Finstagrams [1], a phenomenon where users are creating secondary accounts to be viewable by only select people. Regular Instagram accounts are often curated to be perfect and public facing. Finstagrams provide an outlet where users can just ‘be themselves’. I believe this could be a fascinating study, looking at what causes a user to make a Finstagram account, when did they first start appearing, what are the real world analogues, and so on. Looking at single platforms exclusively should not be dismissed for lack of generalizability for sometimes it is that lack of generalizability that makes findings interesting.

I think there is an interesting future ahead for this combination of Sociology and Computer Science. With everything that is happening with Facebook and the problems that arise from social media in general, I think this field holds the future in figuring out how to help solve the problem of humans in the online world, just like Sociology is trying to solve the problem of humans in the real world. It warrants keeping an eye out and seeing where things go from here.

[1] https://medium.com/bits-pixels/finstagram-the-instagram-revolution-737999d40014
