Reflection #1 – [1/29] – Bright Zheng

Journalists and Twitter: A Multidimensional Quantitative Description of Usage Patterns – Mossaab Bagdouri

Summary

Twitter is a great platform for journalists and news organizations to reach out to and interact with their audiences. In this paper, Bagdouri studies the interaction of journalists and news organizations with their audiences by surveying 18 features across three account categories (journalists, news organizations, and news consumers), two languages and cultural backgrounds (English-speaking and Arabic-speaking countries), and three types of news media (print, radio, and television). By performing Welch's t-test and the Kolmogorov-Smirnov test on these features across the different categories of Twitter accounts, Bagdouri found that journalists tend to target and engage personally with their audience, whereas news organizations prefer broadcasting their posts and are more official. A similar pattern appears when comparing Arab and English journalists: Arab journalists broadcast more tweets and are more distinguishable from their audience than English journalists. The paper also finds that journalists across different media types differ considerably.
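
As a rough, purely illustrative sketch of the statistical machinery behind these comparisons (this is not the paper's code; the feature values and group sizes below are invented), Welch's t-test and the Kolmogorov-Smirnov test can be applied to a single per-account feature like this:

```python
# Hypothetical sketch: compare one per-account feature (e.g., the fraction of tweets
# containing a link) between journalists and news organizations, in the spirit of
# the paper's Welch's t-test and Kolmogorov-Smirnov comparisons.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
journalists = rng.beta(2, 8, size=500)     # made-up link ratios for journalist accounts
organizations = rng.beta(5, 5, size=500)   # made-up link ratios for organization accounts

# Welch's t-test compares the means without assuming equal variances.
t_stat, t_p = stats.ttest_ind(journalists, organizations, equal_var=False)

# The Kolmogorov-Smirnov test compares the full distributions.
ks_stat, ks_p = stats.ks_2samp(journalists, organizations)

print(f"Welch's t-test: t = {t_stat:.2f}, p = {t_p:.3g}")
print(f"KS test:        D = {ks_stat:.2f}, p = {ks_p:.3g}")
```

In the study, this kind of comparison is carried out feature by feature for each pair of account groups.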

Reflection

This is, overall, a very interesting paper. I really like how Bagdouri compares journalists from different cultural backgrounds and how he compares journalists and news organizations with their audiences. These comparisons were probably not made in previous work, since this research surveys the largest set of Twitter accounts and tweets to date and is more focused on journalists. However, the paper's definition of "audience" intrigues me. The "audience" in this paper consists of accounts that "have a bidirectional follower / friend relationship" with selected journalists. The limitation the paper acknowledges is that some of these audience accounts might themselves be journalists, but since the number of such journalists is statistically insignificant compared to the total, this isn't really a limitation. More importantly, this definition of "audience" omits Twitter accounts that follow these journalists without a bidirectional relationship, and in my opinion those accounts are the true audience of these journalists. The people journalists follow back may include other journalists and their real-life friends, who are not representative of the actual audience's perceptions and reactions. I think it would be better to survey all followers, instead of just the ones with a bidirectional relationship with the journalists.

Another thing that intrigues me is that the journalists are not separated into news categories. Sports journalists and news organizations might tweet differently and have different interactions with their audiences than those that cover politics. Audiences might also react to different categories of news differently. For example, I often see sports fans tweeting out their excitement or disappointment in original tweets, which means less interaction with the journalists, while people react to political figures' tweets or news by retweeting the original tweet extensively. Analyzing journalists and news organizations by news category could definitely be developed as part of the future work of this research.

This research could also consider a third type of user: opinion writers (columnists who write op-eds). Opinion writers are not journalists, but they still talk about the news and are often quite influential. It would be interesting to compare journalists and opinion writers to see how similar they are and how personal and targeted their engagement is, since the research already showed that journalists are more targeted and relatable than news organization accounts.

Since this research can be seen as an extension of De Choudhury, Diakopoulos, and Naaman (2012)'s classifier research, an account classifier based on Bagdouri's data would be interesting. Such a classifier would make the verification process easier and could suggest more targeted accounts to new users; it should at least be able to identify whether an account belongs to a journalist, a news organization, or a news consumer, and what language it primarily uses. If more analysis is done with different news categories, the classifier should also be able to categorize accounts by news category.
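
A minimal sketch of what such a classifier could look like, assuming a per-account feature table in the spirit of the paper's 18 features (the feature columns, labels, and data below are entirely invented for illustration):

```python
# Hypothetical sketch of an account-type classifier built on per-account features
# similar in spirit to the paper's (link ratio, retweet ratio, question marks, etc.).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_accounts = 900

# Made-up feature matrix: one row per account, one column per numeric feature.
X = np.column_stack([
    rng.random(n_accounts),   # fraction of tweets containing links
    rng.random(n_accounts),   # fraction of retweets
    rng.random(n_accounts),   # fraction of tweets containing question marks
    rng.random(n_accounts),   # fraction of tweets posted from a mobile device
])
# Made-up labels: 0 = journalist, 1 = news organization, 2 = news consumer.
y = rng.integers(0, 3, size=n_accounts)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.2f}")  # ~chance here, since the labels are random
```

With real labeled accounts in place of the random data, the cross-validated accuracy and the model's feature importances would indicate how separable the three account types (and their primary languages) actually are.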


Reflection #1 – [01/29] – [Liz Dao]

Mossaab Bagdouri. “Journalists and Twitter: A Multidimensional Quantitative Description of Usage Patterns”

Summary:

The last couple of years have witnessed a major shift in people's sources of news. Due to its convenient, instant, and interactive nature, Twitter is emerging as a major news platform, especially among the younger generations. As a result, there is a demand for understanding the behaviors and strategies of journalists on Twitter. This research aims to answer several questions:

  1. Do journalists build more personal engagement with their audience than news organizations, which mostly broadcast?
  2. Can English journalists’ behaviors be generalized to a broader population with a different cultural, linguistic, and regional background?
  3. Can news consumers be distinguished from journalists by their behavior on Twitter?
  4. Do journalists working for different types of news outlets act differently?
  5. Do journalists who speak the same language but live in different regions share similar behaviors on Twitter?

Thirteen million tweets from 5,358 Twitter accounts of journalists and news organizations, along with two billion posts from more than one million of their connections, were collected. The author extracts eighteen numerical features from these data and conducts Welch's and Kolmogorov-Smirnov statistical tests to analyze their distributions across the different groups.

The author discovered a few interesting patterns of behavior across the groups:

  1. News organizations' broadcasts use more formal language and share more links than journalists' tweets, while journalists prefer to build more engaging, targeted communication with their audiences.
  2. The higher frequency of question mark usage also suggests that journalists use Twitter as a source of information.
  3. Some journalists (Arabic) are more distinguishable from news consumers than others (English).
  4. Journalists’ behaviors differ across media types.

Reflection:

To begin with, the author mentions in the introduction that he strives to provide observations that can be generalized to the larger population of journalists rather than to a particular group, as in previous studies. Even though the dataset is massive, the majority of it still belongs to English-speaking journalists; only two tests involve Arabic-speaking journalists. Furthermore, he never explains why he chose Arabic instead of other languages, or Ireland instead of other British-English-speaking regions. Hence, an interesting question is how journalists from other regions act differently from those studied in this research.

In addition, the journalists' preference of medium varies significantly across the groups. While journalists use mobile 54.95% of the time, news organizations use desktop and special Twitter applications more often. Meanwhile, Arabic journalists use desktop twice as frequently as news consumers, and print journalists are not fond of the idea of posting articles via mobile phones. It might be interesting to investigate which factors affect the journalists' preference of medium for publishing their tweets. Do younger and on-site journalists tend to post via mobile devices? What types of posts are mostly posted via mobile devices: breaking news, discussion, questions, etc.? Is there a correlation between the credibility of the account and its medium preference? Since posting via mobile devices is more convenient and instant, it might suggest a lack of consideration and time investment in the tweets. A perfect example is the President; most of his tweets are posted from his phone. However, fake news is possibly generated and posted by algorithms, so tweets posted via desktop might have a higher chance of being fake news.

Despite the fact that the behaviors of Arabic- and English-speaking journalists are mostly similar, their audiences' reactions to their tweets diverge significantly. Arabic journalists' tweets receive many more retweets and favorites than those of news consumers. On the other hand, English journalists receive fewer reactions to their tweets than news consumers do. What are the causes of this pattern? Could it be that there are more internet figures (celebrities, vloggers, etc.) in English-speaking regions than in the Arab world?

Finally, one of the biggest pitfalls of this research is that it detects answer-seeking question tweets based solely on the presence of question marks. Even though the author recognizes the potential for misclassification, he decides to go with the simple, naïve approach without providing a justification. Since the use of rhetorical questions as clickbait has been growing in recent years, the rate of false positives might be quite high, especially with less formal tweets by journalists and news consumers. Future research could therefore provide more accurate insight by using natural language processing techniques to classify whether a tweet is an answer-seeking question. Moreover, it might be interesting to see whether there is a correlation between the use of rhetorical questions, the validity of the news, and the reaction of the audience.
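
As a rough illustration of the NLP direction suggested above (nothing here comes from the paper; the example tweets and labels are invented), even a tiny supervised text classifier goes beyond the question-mark heuristic:

```python
# Hypothetical sketch: separate answer-seeking questions from rhetorical ones with a
# simple supervised text classifier instead of relying on question marks alone.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "Does anyone know when the city council vote is scheduled?",    # answer-seeking
    "Can someone confirm if the road closure is still in effect?",  # answer-seeking
    "What time does the press conference start today?",             # answer-seeking
    "Could this season get any worse for the home team?",           # rhetorical
    "Who even believes these poll numbers anymore?",                 # rhetorical
    "Is this really the best our leaders can do?",                   # rhetorical
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = answer-seeking, 0 = rhetorical

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(tweets, labels)
print(model.predict(["Anyone know where the fire broke out?"]))
```

With a realistically sized labeled sample, held-out evaluation would show whether such a model actually reduces the misclassification that the question-mark rule suffers from.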


Reading Reflection #1 – [1/28/2019] – [Tucker, Crull]

Journalists and Twitter: A Multidimensional Quantitative Description of Usage Patterns

Summary:

This paper describes a study of Twitter in the news. The author did extensive research into the Twitter accounts held by journalists, news organizations, and consumers. Using data from these accounts, he was able to use Welch's and Kolmogorov-Smirnov tests to show statistical differences in eight comparisons across two regions, the Arab world and the English-speaking world. He found that journalists use a direct form of communication designed to engage their audience and use Twitter as a source of information. On the other hand, news outlets use a professional style in their tweets and are more likely to include links in their posts. He also found that Arab journalists are more distinguishable from news consumers than English ones.

Reflection:

  • Quote: “Journalists and organizations also differ in the medium used to publish their tweets. In fact, while they both use a desktop in about 30% of the time, mobile is the preferred medium for journalists (54.95%), and organizations tend to use special Twitter applications for posting more than 28% of the tweets”
  • Honestly, this makes total sense to me, because I think that most of the journalists in this study are younger journalists who would be more inclined to be on their phones all the time. I know that my generation is always on our phones, and it would make sense that journalists of my generation or the one before would also be on their phones all the time. I would love to see an age breakdown of the journalists in this study.
  • Overall, I didn't find the results of the study surprising, because Bagdouri's results show that news organizations are less likely to retweet posts or engage with users. This makes sense because when news organization accounts post, they are posting for the whole company and thus can't risk endorsing a tweet that would draw hate from their Twitter community. A journalist, on the other hand, is only speaking for themselves, so they can post more of what they believe in.

Additional Questions:

  • How does "fake news" change the way we view tweets from journalists and news organizations?
  • How does a journalist's or news organization's political leaning affect how the public views and engages with their tweets?


Reflection #13 – [11/29] – [Karim Youssef]

In the last decade, the evolution of digital systems and connectivity has created a gushing stream of data that contains latent treasures. Multiple domains such as biology, astronomy, and social science are facing an unprecedented challenge: how to deal with big data? Big data is perhaps one of the most resonant scientific terms of the last decade, and a specific field of study in computer science has been established to develop hardware and software systems that scale with it.

In social science, the big data gushing out of online social platforms represents an invaluable gold mine. In their work titled Data ex Machina: Introduction to Big Data, David Lazer and Jason Radford explore the opportunities and challenges of using big data to study and analyze different social phenomena. From their perspective, big social data comes from three sources: digital life (e.g., online social platforms), digital traces (e.g., call records), and digitalized life (e.g., digitized old books and newspapers).

Many opportunities exist in analyzing data generated from the aforementioned sources. These data could reflect actual patterns of social activity that are hard to extract from research surveys and questionnaires. They also create the opportunity to analyze social interactions around breaking events as they happen, rather than performing a retrospective analysis on past events.

From my perspective, archived data from social platforms could also serve as a treasure for retrospectively analyzing special events. An example that always influences my ideas is the Arab Spring uprisings: the Egyptian people lived through a unique experience between 2011 and 2013. In some places in the world, online social platforms could be the best places to record the traces of such events.

Online social platforms could also serve as a natural experiment field, which is a great opportunity but one with great ethical concerns. An example is Facebook's experiment on social influence and emotional contagion, a very promising experiment that raised a huge ethical debate.

David Lazer et al. also explore a set of challenges and vulnerabilities. These include the generalizability of analyses performed on a single data source (e.g., Twitter); here, they shed light on the problem that individuals' social activities are spread across multiple social platforms. They also discuss the credibility and legitimacy of data given the widespread presence of bots and fake identities.

From my view, I believe that no matter how hard the aforementioned challenges are, the continuous research effort to understand and solve these problems will likely converge at some point. The harder challenge that is likely to linger is the ethical one. David Lazer et al. shed light on research ethics: how can we guarantee the rights of human research subjects, have informed consent in place, and still preserve the quality and benefits of collecting and analyzing their online social data? Some people have commented on social platforms being free by saying, "if it is offered to you for free, then probably you are the goods being sold." From my perspective, establishing and applying rules that preserve the rights of every user, and being transparent about everything, is a lingering challenge.


Reflection #13 – [11/29] – [Eslam Hussein]

David Lazer and Jason Radford, “Data ex Machina: Introduction to Big Data”

Summary:

The authors in this paper focus on explaining big data to the sociology community and the opportunities it offers them. They classify big data into three categories based on how social data is digitized: digital life, digital traces, and digitalized life. They also explain related problems and issues:

  1. Generalizability: researchers draw generalized conclusions without paying attention to how the data was collected, even though each platform has its own census and its own definitions of social phenomena such as friendship.
  2. Too many big data: the data needed to study a social behavior exist on different platforms, so studying a dataset from just one of them is not sufficient.
  3. Artifacts and reactivity: big data cannot be blindly trusted because of the various anomalies and errors introduced by technical changes.
  4. The ideal user assumption: the human/user being studied is assumed to be a true, actual, and authentic user (an "ideal user"), while in the digital world this is not always true due to false users (bots and puppets) and manipulated user data.

Reflection:

  • I like the suggestion proposed by the authors to solve the generalization issue by combining data from different sources. That makes me think about studying the same phenomena on different social platforms and how closely they relate to each other. For example, when studying anti-social behaviors on different social platforms, do users behave similarly? Do they use the same language, or does each platform have its own? If we studied bots/puppets on different systems, would they have similar characteristics, so that we could generalize their models to other systems? I think those are interesting studies to conduct.
  • The authors assume that big data gathered from the digital social world will represent and facilitate the study of social phenomena. This might be true for some phenomena but untrue for others, since people do not behave the same in both worlds. Digital/virtual worlds protect their residents from many consequences that might occur in the real world, where people are more conservative, discreet, and insecure. Those virtual worlds also promote new behaviors and create new phenomena that would not exist in real social life, such as anonymity, bots, and puppets. That is why big data offers more opportunities and challenges to social scientists than data gathered from the real world using conventional methods such as field studies and surveys.
  • Another issue that came to my mind when studying big data for social purposes is what the best format to represent social data is: the traditional tabular format (spreadsheets and relational databases), graphs (graph databases and the formats used by social network analysis tools), or raw files (images, text, video)? I think how we represent our data would definitely help and facilitate many of the tasks in our studies; a small sketch of the tabular-versus-graph trade-off follows this list.
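
As a toy illustration of that trade-off (the interaction data below is entirely made up, and these are only two of the possible representations), the same set of interactions can be held as a flat table or as a graph, and each makes different questions easy to ask:

```python
# Toy example: the same (made-up) interaction data in tabular vs. graph form.
import pandas as pd
import networkx as nx

# Tabular representation: convenient for filtering, grouping, and summary statistics.
interactions = pd.DataFrame([
    {"source": "alice", "target": "bob",   "kind": "retweet"},
    {"source": "alice", "target": "carol", "kind": "reply"},
    {"source": "bob",   "target": "carol", "kind": "mention"},
])
print(interactions.groupby("kind").size())   # counts per interaction type

# Graph representation: convenient for structural questions (degree, paths, communities).
g = nx.from_pandas_edgelist(interactions, "source", "target",
                            edge_attr="kind", create_using=nx.DiGraph)
print(dict(g.out_degree()))                  # who initiates the most interactions
```

Neither form is "best" in general: the tabular view supports the statistical comparisons discussed in this course, while the graph view supports network analysis, which is why many studies end up maintaining both.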

 

 


Reflection #13 – [11/29] – [Neelma Bhatti]

Reference:

Lazer, D., & Radford, J. (2017). Data ex machina: Introduction to big data. Annual Review of Sociology, 43, 19-39.

Reflection:

This article reminded me of two things: why I love sociology/psychology and why I like literature surveys. Literature surveys, sociology, and big data seem like a good amalgamation of things that reveal interesting findings about literature, groups of people, and patterns, respectively.

The article kind of puts the semester-long social computing class in one place by discussing big data, social systems, bots and sock puppets creating the illusion of the ideal user, and social contagion, to name a few. The authors emphasize the strength of combining sociological studies and computer science to make the best use of big data. They explore the opportunities and threats and also provide suggestions for addressing future challenges in big data research.

The authors did a great job of summarizing almost every aspect of the opportunities and vulnerabilities associated with big data. The role of big data in transforming learning and higher education seemed to be missing, though. Big data is still a niche topic in the field of education, but governments have started to produce reports about its potential in education [1].

Although at a glance most studies designed to gather or examine big data to extract useful patterns look intrusive (like the behavioral study involving college students who were provided with cell phones for the investigation of user data), I believe there is a need to educate people about the greater good of collaborating by providing passive, non-sensitive data for scientific and behavioral studies. We are generally skeptical about possible privacy invasions through our phones and social media accounts, yet we also want to remain foremost at the receiving end of scientific advancements.

Just as people are willing to offer monetary help in an emergency, they should also be educated to contribute by providing data to investigate the situation. This would greatly reduce the vulnerabilities associated with big data, such as errors and misinterpretations of data due to self-reporting.

[1] Eynon, R. (2013). The rise of Big Data: what does it mean for education, technology, and media research?


Reflection #13 – [11/29] – [Prerna Juneja]

Paper: Data ex Machina: Introduction to Big Data

The main theme of the paper is to critically examine the potential of big data in the field of sociology. The authors start by defining big data and its types (digital life, digital traces, digitalized life). They review several big data projects and research papers to highlight the opportunities and challenges of using big data in computational social science. They conclude by discussing six trends that will affect the use of big data in the future.

The paper for this week is a perfect way to conclude the course. It summarizes almost everything that we have learnt so far in the semester. The need for the social computing field is described perfectly in the paper in the line “Big data thus requires a computational social science—a mash-up of computer science for inventing new tools to deal with complex data, and social science, because the substantive substrate of the data is the collective behavior of humans (Lazer et al. 2009).” [Lazer et al]

The authors talk about biases in self-reported behavior. Qualitative analysis is an important part of this research field, which sometimes relies heavily on surveys and interviews. Thus, understanding and reducing biases from surveys and interviews is very important.

Nowcasting was a new term for me. It was interesting to see the impact of projects like the "Billion Prices Project" (for example, how the project was used to infer inflation after Argentina stopped publishing inflation numbers).

The authors also reviewed projects where researchers have studied underrepresented populations, e.g., people suffering from depression and suicidal ideation. But not every population is well represented in online datasets; internet access is still limited in developing countries.

The authors talk about the core issues of big data, a prominent one being that the scale of the data can lead to the illusion that it contains all the relevant information about all kinds of people. So it's important to understand what your data actually is. But how much data is needed to make "general claims" is a question no one has an answer to.

A line in the paper, "Twitter has become to social media scholars what the fruit fly is to biologists—a model organism," indicates the overuse of Twitter data for research due to its easy availability. The authors argue that relying on a single platform can produce issues for generalizability.

The authors end by discussing future trends, such as how data is only going to increase because of several digitization initiatives (almost everything is moving online, and paper records are diminishing). The popularity of text-based platforms is decreasing, while platforms like Snapchat and Instagram are rising in popularity. It seems that in the future, the bulk of the data will consist of images and videos. It will be interesting to see different fields (computer vision + data analytics + sociology) coming together to analyse this data.

 

 


Reflection #13 – [11/29] – [Deepika Rama Subramanian]

This article, surprisingly originating from a sociology journal, seems to succinctly cover everything we spoke about this semester in our class. The authors speak about big data's enormous usefulness in understanding human behaviour, especially behaviours and behaviour patterns that are difficult to record or are often misreported in self-reports.

As befits a sociology journal, they cast the world of big data into various manifestations: digital life, digital traces, digitalized life, and the instrumentation of human behaviour. When the authors say that digital life can be viewed as a generalizable microcosm of society, I wonder if this is appropriate. Given the fluid and incremental way in which platforms grow, does it seem appropriate to assign and keep such a label?

As we've seen several times before in this class, one worries about the ethical conundrum in mining big data to gauge insights in sociology. In the study that tracked the phones of several students to understand the ties between friends (from Facebook), the students were at least aware of this. In contrast, we previously read a paper about Facebook tweaking its feed (without the knowledge of its users) to gain knowledge about emotional reactions to the posts on the feed.

We are also surrounding ourselves with gadgets and objects that constantly provide information that can be exploited, possibly for things we would not consent to. In a sense, we are creating our own surveillance state. Dr. Michael Nelson, during a recent seminar here, spoke about this very problem. By using fitness tracking devices and virtual assistants, we are unknowingly and unwittingly providing valuable data for analysis. The fitness app Strava faced some flak for this when US soldiers in Afghanistan used the app for tracking while running around army bases there. The soldiers, unbeknownst to themselves, were clearly giving away the locations of secret army bases. Obviously, we are still having some trouble keeping all this data from being misused.

Another issue with big data that the authors mention, and that seems prevalent, is the issue of inclusivity. During the talk given by Dr. Rajan Vaish, I remember him mentioning that even during their study (Whodunit?), they found it difficult to involve the rural population in India. Of course this is in part due to expensive, metered internet connections, but, in the case of other major online platforms, it is also a question of the community's interest.

The authors themselves have outlined several other issues that come with big data analytics in sociology. One of the more familiar ones is the issue of bots, puppets, and manipulation. We have, over the course of this class, seen several ways to curb this behaviour, making the data available to us more meaningful. But for now the problem remains, skewing a lot of the ongoing analyses.

Finally, the authors talk about qualitative approaches to big data, and I almost laughed out loud! We were battling this issue with a much, much smaller data set until even a few days ago. The authors promise computationally enabled qualitative analyses that will help us analyse data that is beyond the capacity of armies of grad students to read. To this we say: thank you!

This article was a fitting way to end the semester and the course. It was in itself a summary of sorts, making it slightly difficult to summarise and reflect on.


Reflection #13 – [11/29] – [Mohammad Hashemian]

Data ex machina: Introduction to big data – David Lazer and Jason Radford

 

Although the dynamic features of social media and big data provide a great research opportunity for social network researchers to analyze human behaviors, there are many challenges associated with the analysis of big data. In this study, several big data projects are reviewed with their big data challenges in mind.

Big data in this review paper is broken down into three domains: digital life, digital traces, and digitalized life. What I take from this categorization is that the digitization of human life has made it much easier to analyze social behavior than before. As mentioned in this paper, it is possible not only for social network platform owners but also for third parties to harvest data from these platforms. It may be thought that this is always in the interest of users, because researchers can use this valuable information to study human behavior, and the results of this research are ultimately useful for users. But unfortunately, that is not always the case.

Nowadays, one of the most common ways of finding people is through social media sites (digital life), which is a valuable feature for debt collectors because they can use social networks to find people who owe money. They call this skip tracing, which means tracking down a debtor when there is no information about their current address, phone number, or place of employment. Analysts at these kinds of companies use many state-of-the-art data analytics techniques, including freely available information on social media sites, when they want to find someone in order to collect a debt. This is one application of big data, a product of digital life, that is not accepted by the public. But if I focus just on big data and the challenges researchers are dealing with, I think some other challenges can be pointed out that are not mentioned in this research.

In the section "The Ideal User Assumption: Bots, Puppets, and Manipulation," the authors demonstrate the importance of users' true identities by focusing on the vulnerability of the data caused by the existence of bots, puppets, and manipulation. However, there is another research challenge related to the identity of users in big data.

Unlike traditional data collection methods that result in comprehensive user profiles, many big data sources do not contain detailed demographic information. Without knowing users' information, big data research may be biased. For instance, the actual users of social media services are generally from the younger generation [1]. Thus, data collected from social media represent a small sample of the whole population. It seems more research is needed to understand the user profiles in big data sources [2].

Another big data problem is that researchers cannot re-run published big data research. Generally, published scientific research can be verified by other researchers using the same or different methods with the same data (which is one of the significant features of scientific research). But many big data sets, like social media messages and mobile phone records, are proprietary. Therefore, researchers are unable to re-test most of the recently published social media research. As an example, consider the Twitter API. Based on the Twitter API agreement, the original raw tweets cannot be distributed by researchers to anyone outside their research groups. Many researchers can only retrieve 1% of randomly sampled data via public APIs for their research. This problem can be an important obstacle to the advancement of social media research in the future.

 

 

[1] 40% of Twitter users are between the ages of 18 and 29, and 25% of users are 30-49 years old (https://www.omnicoreagency.com/twitter-statistics/)

[2] https://www.tandfonline.com/doi/full/10.1080/15230406.2015.1059251


Reflection #13 – [11/29] – [Dhruva Sahasrabudhe]

Paper-

Data ex machina: Introduction to big data – Lazer et al.

Summary-

This was a survey article about the potential of big data in sociology and computational social science. It serves that purpose very well, providing many examples of and references to interesting research that leverages data to glean insight in social science, and discussing the different interpretations of data depending on the knowledge we wish to gain. The article begins with the statement that what is captured in data isn't necessarily what the social scientist wants, which makes mining what we want from the dump of interactions a difficult task in itself; this is an important theme of the article.

Reflection-

I liked the phrase "substantive substrate of the data is the collective behavior of humans," used to describe the applicability of data to the social sciences, as it paints a good picture of the task of distilling an understanding of human behavior from a mass of interactions.

I also found interesting the two interpretations of social media platforms: as microcosms of all of society, or as realms unto themselves. The second interpretation holds that not only are these platforms incomplete at capturing all human experience, but they also modify human behavior in their own right.

The tools and results obtained by the Copenhagen Networks Study were interesting because the researchers tried to obtain meaningful interaction data from diverse sources, i.e., mobile phone exchanges and Facebook, and found that participants were actually using those two communication media for different tasks and to interact with largely distinct groups of friends, reinforcing the second interpretation of social/communication platforms from the previous paragraph.

Another fascinating insight I got from this article was on how big data can be used to cheaply analyze and interpret politically relevant information on a national level, e.g. the research on predicting inflation rates using goods prices, or the research on estimating the impact of a hurricane.

Making big data small, i.e. identifying subgroups of interest within the dataset, like the leaders of a revolution, people with PTSD/other psychological disorders, etc. can be used to study these phenomena retrospectively in an unobtrusive manner.

I also found the term "big data hubris" used in the article interesting, since it helped me understand that volume of data can be misleading if sampling is not done properly or if you do not understand the data you have. For example, the spike in usage of the word "fuck" in books from the 1800s in Google Ngrams was found to be due to OCR systems failing to read the archaic long "s". The presence of a large number of fake accounts and bots on certain platforms also makes it important to ensure that the data obtained is genuine.

This article was a thought-provoking and fascinating read. It was a wonderful way to conclude the reflections we did in this course, as it gave a high level, but broad insight into research areas in the field of social computing.
