Reflection #13 – [11/29] – [Mohammad Hashemian]

Data ex machina: Introduction to big data – David Lazer and Jason Radford

 

Although the dynamic nature of social media and Big Data offers social network researchers a great opportunity to analyze human behavior, the analysis of Big Data comes with many challenges. This paper reviews several Big Data projects in light of those challenges.

Big Data in this review paper is broken down into three domains: digital life, digital traces, and digitalized life. What I took away from this categorization is that the digitization of human life has made it much easier than before to analyze people's social behavior. As the paper notes, not only the owners of social network platforms but also third parties can harvest data from these platforms. One might think this is always in the interest of users, because researchers can use this valuable information to study human behavior and the results of that research are ultimately useful for users. Unfortunately, that is not always the case.

Nowadays, one of the most common ways of finding people is through social media sites (digital life), which is a valuable feature for debt collectors because they can use social networks to locate people who owe money. They call this practice skip tracing: tracking down a debtor when there is no information about their current address, phone number, or place of employment. Analysts at these companies use many state-of-the-art data analytics techniques, including freely available information on social media sites, when they want to find someone in order to collect a debt. This is one application of Big Data, a byproduct of digital life, that is not accepted by the public. But if I focus only on Big Data and the challenges researchers are dealing with, I think some other challenges can be pointed out that are not mentioned in this paper.

In the section “The Ideal User Assumption: Bots, Puppets, and Manipulation,” the authors demonstrate the importance of users’ true identities, but they focus only on the vulnerability of data caused by bots, puppets, and manipulation. However, there is another research challenge in working with Big Data that is related to the identity of users.

Many Big Data sources do not contain detailed demographic information, unlike traditional data collection methods that result in comprehensive user profiles. Without knowing users’ demographics, Big Data research may be biased. For instance, the actual users of social media services are generally from the younger generation[1]. Thus, data collected from social media represent a skewed, unrepresentative sample of the whole population. It seems that more research is needed to understand the user profiles behind Big Data sources.[2]
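
One common remedy for this kind of demographic skew (not discussed in the paper, but standard in survey research) is post-stratification: reweighting the observed sample so that its demographic mix matches the population. Here is a minimal sketch, where all age-group proportions and per-group estimates are made up purely for illustration:

```python
# Post-stratification sketch: reweight platform users so age groups match the population.
# All proportions and estimates below are made up purely for illustration.
sample_share = {"18-29": 0.40, "30-49": 0.25, "50+": 0.35}      # age mix observed on the platform
population_share = {"18-29": 0.20, "30-49": 0.33, "50+": 0.47}  # age mix in the real population

# Weight for each group = how over- or under-represented it is in the sample.
weights = {g: population_share[g] / sample_share[g] for g in sample_share}

# Some per-group estimate (e.g., share holding an opinion), again made up.
group_estimate = {"18-29": 0.62, "30-49": 0.51, "50+": 0.44}

unweighted = sum(group_estimate[g] * sample_share[g] for g in group_estimate)
reweighted = sum(group_estimate[g] * sample_share[g] * weights[g] for g in group_estimate)
print(f"unweighted estimate: {unweighted:.3f}, reweighted estimate: {reweighted:.3f}")
```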

Another Big Data problem is that researchers cannot reproduce published Big Data research. Generally, published scientific findings can be verified by other researchers using the same or different methods on the same data, which is one of the defining features of scientific research. But many Big Data sources, such as social media messages and mobile phone data sets, are proprietary. Therefore, researchers are unable to re-test most of the recently published social media research. As an example, consider the Twitter API. Under the Twitter API agreement, researchers cannot distribute the original raw tweets to anyone outside their own research group. Many researchers can only retrieve a roughly 1% random sample of tweets via the public APIs for their research. This problem can be an important obstacle to the advancement of social media research in the future.
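
In practice, many groups work around the redistribution restriction by sharing only tweet IDs and letting others re-fetch ("rehydrate") the full tweets through the API themselves. The sketch below illustrates this idea of keeping a roughly 1% sample and exporting only IDs; the file names are hypothetical and this is not Twitter's own sampling mechanism, just an illustration:

```python
import json
import random

def sample_stream(tweets, rate=0.01, seed=0):
    """Keep roughly `rate` of the incoming tweets, mimicking a 1% random sample."""
    rng = random.Random(seed)
    return [t for t in tweets if rng.random() < rate]

def export_tweet_ids(tweets, path="tweet_ids.txt"):
    """Write only tweet IDs to disk; other researchers must re-fetch ("rehydrate")
    the full tweets via the API, which is how Twitter datasets are usually shared."""
    with open(path, "w") as f:
        for t in tweets:
            f.write(f"{t['id']}\n")

if __name__ == "__main__":
    # "collected_tweets.json" is a hypothetical local dump of tweet objects.
    with open("collected_tweets.json") as f:
        collected = json.load(f)
    export_tweet_ids(sample_stream(collected))
```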

 

 

[1] 40% of Twitter users are between the ages of 18 and 29, and 25% are 30–49 years old (https://www.omnicoreagency.com/twitter-statistics/).

[2] https://www.tandfonline.com/doi/full/10.1080/15230406.2015.1059251


Reflection #12 – [10/23] – Mohammad Hashemian

  1. A 61-million-person experiment in social influence and political mobilization
  2. Experimental evidence of massive-scale emotional contagion through social networks

On the day of the 2010 U.S. congressional elections, 61 million Facebook users received a special “social message” at the top of their News Feed. It included a polling-place link; an “I Voted” button with a counter showing how many other Facebook users had already reported voting; and pictures of the user’s friends who had reported voting. There were also two other randomly selected groups of about 600,000 users each: one group received an informational message that was like the social message but without the pictures of friends, and the other received no voting message at all.

The results of this study showed that users who received the informational message voted at the same rate as those who saw no message at all. The authors estimated that the social message directly increased voter turnout by about 60,000 votes and indirectly by another 280,000 votes (340,000 votes in total).

In the second paper, the researchers performed an experiment with Facebook users (manipulating users’ emotions[1]) to test whether emotional contagion occurs between users, by reducing the amount of emotional content in the News Feed. They showed that when positive expressions were reduced, users produced fewer positive posts and more negative posts, and vice versa. These results indicate that users’ emotions can be influenced by other users on social networks (emotional contagion).

The upcoming political election is a good example for both papers because it can be a potent source of emotional contagion. These days we see posts, memes, and many articles on every social network about the negatives and positives of each candidate, which can stir mixed emotions. Advertisements focus on the negative behaviors of candidates and on high-conflict topics like taxes and guns, which can increase anger, distress, and sadness and spread unknowingly between people.

The authors of the first paper also recently published another study[2] with the same research topic and approach, this time for the 2012 presidential election. Facebook also ran another experiment to see whether it could influence voter behavior. The results showed that Facebook could play a considerable role in influencing how people vote: they changed the News Feeds of 1.9 million users and studied how they behaved, which led to a roughly 3% increase in the number of people who voted[3]. I think the results of all of these studies show the considerable impact social networks have on users’ emotions, and they depict how social networks benefit from the existence of social contagion.

As mentioned, the first paper focused on the effect of social messages in congressional elections (the effect of social contagion on voting). I think if we study social influence in social networks, we should also consider the effect of influencers, especially in important events such as elections. Influencers play a critical role in information dissemination during election days. They drive huge amounts of engagement and discussion, and their millions of followers can make or break a campaign with just one video, so they can influence voters’ behavior. For example, in the 2017 Iranian presidential election, I remember an influencer who simply used the color of one of the candidates (each candidate had chosen a specific color for their campaign) in her profile picture. Many people then started to use that color in their profile pictures, and the behavior spread. Many people did not even know who had started using this color, but because they saw their friends, families, or others using it, they wanted to show they belonged too.

[1] https://www.businessinsider.com.au/facebook-study-emotional-states-transfer-2014-6#ixzz3HkSqIikX

[2] Jason J. Jones et al., “Social influence and political mobilization: Further evidence from a randomized experiment in the 2012 U.S. presidential election,” April 26, 2017. https://doi.org/10.1371/journal.pone.0173851

[3] https://www.businessinsider.com.au/facebooks-news-feed-voting-experiment-2012-2014-10


Reflection #11 – [10/16] – [Mohammad Hashemian]

  • Danescu-Niculescu-Mizil, C., Sudhof, M., Jurafsky, D., Leskovec, J., & Potts, C. (2013). A computational approach to politeness with application to social factors.

 

In this research a politeness classifier is constructed that performs with good accuracy (near-human performance). The authors then use this classifier to explore the relation between politeness and social factors in three categories: “relation to social outcome” (they show that eventually successful candidates are more polite than failed candidates and non-admins), “politeness and power” (they show that requests posted by the question-asker are more polite than other requests), and “prediction-based interactions” (results such as female Wikipedians being more polite).

Although building a politeness classifier seems fairly simple and is similar to other text classification tasks (such as spam-ham classification), as the paper notes it can be useful for exploring politeness levels across several factors, which is an interesting line of research in social networks. Some applications of the classifier, such as analyzing the relation between politeness and social status, are mentioned in the paper; however, I am interested in using this classifier to predict social status.

As Subsection 5.1 concludes, polite Wikipedia editors are more likely to reach high status through elections. I think other features, such as the number of edits a candidate has made so far or even the candidate’s registration date on the platform, can also play significant roles in a candidate’s success. But how significant is the politeness factor in the success of eventually successful candidates? I am also curious whether we can model Wikipedia admin elections (or other elections like this) with several features, including the candidates’ politeness levels. Can we use the classifier introduced in this research to produce a politeness score for each candidate and then employ it as a feature for predicting Wikipedia administrator elections?
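
To make this idea concrete, here is a minimal sketch of such a prediction model: a logistic regression over a politeness score (as produced by a classifier like the paper's), log edit count, and account age. The numbers, labels, and feature choices are hypothetical, purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical candidates: [politeness score, log(edit count), account age in years]
X = np.array([
    [0.72, 9.1, 4.0],
    [0.55, 7.3, 1.5],
    [0.81, 8.7, 6.2],
    [0.40, 6.9, 0.8],
    [0.68, 9.5, 3.1],
    [0.35, 7.8, 2.0],
])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = won the admin election, 0 = lost (made-up labels)

model = LogisticRegression()
print("cross-validated accuracy:", cross_val_score(model, X, y, cv=3).mean())

# The coefficient on the politeness column hints at how much politeness
# contributes once edit count and tenure are also in the model.
model.fit(X, y)
print("politeness coefficient:", model.coef_[0][0])
```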

Also, in this research Amazon Mechanical Turk is used to label a portion of the data, which comes from the Wikipedia community of editors and the Stack Exchange question-answer community. The authors only employ annotators who are U.S. residents (along with some other filtering). However, is this a good approach for labeling the request data? What about residents of other countries? There are differences in politeness judgments between groups of users from different cultures, even when they share the same language[1]. How can we use labels based only on U.S. residents’ judgments for Wikipedia admin elections, where many active editors write in English but come from other countries with different perceptions of politeness? This makes me curious about another question: this kind of research studies the politeness of users, but what if we studied the perception of politeness in social media?

I remember that when I came to the U.S., I had some difficulties writing emails to professors or even other students. Although I had sent and received many emails to and from professors before coming to the U.S., I only realized after arriving here that politeness in this country is completely different. In my country, we address each other by last name plus Mrs/Ms/Miss/Mr in emails, and using many expressions of gratitude is also very common. This way of writing email sometimes even looks odd in the U.S., while for people in my country the style that is prevalent here seems impolite or even rude. Overall, a behavioral study of perceptions of politeness in social media seems useful. The results of such research could prevent misunderstandings and their consequences (such as the conflicts that happen every day on social network platforms).

 

[1] Yu-Cheng Lee, “Cultural Expectations and Perceptions of Politeness: The ‘Rude Chinese’?”, DOI: 10.5539/ass.v7n10p11


Reflection #7 – [09/18] – [Mohammad Hashemian]

  • Social Translucence: An Approach to Designing Systems that Support Social Processes
  • The Chat Circles Series: Explorations in Designing Abstract Graphical Communication Interfaces

The first paper focuses on designing digital systems based on human-human communication in the real world. The authors state that these kinds of digital systems for communication and collaboration between users can be created by allowing users to observe each other’s activities. They explain three principles of socially translucent systems, namely Visibility, Awareness, and Accountability, using an example (a door with a glass window), and then discuss their chat program Babble, a social translucence system.

The role of Awareness and Accountability in a socially translucent system is undeniable, but I see them as two effects of one cause: Visibility. I think we can consider Visibility the central principle of translucent systems: by making users’ activities visible to one another, we inject the Awareness and Accountability properties into the system.

I have also been thinking about Identity as a principle in designing translucent systems. In the third part of the example provided by the authors, where they explain Accountability, they say:

“Suppose that I do not care whether I hurt others: nevertheless, I will open the door slowly because I know that you know that I know you are there, and therefore I will be held accountable for my actions”

This shows the importance of Accountability. However, in social systems, if people do not know each other (anonymity), they most likely do not care whether their actions hurt others. So in an anonymous system, considering the above example, even though I know that you know that I know you are there, because you do not know who I am, it is likely that I will do whatever I want. Thus, in my opinion, the role of Identity should not be neglected in socially translucent systems. Anyway, do today’s socially translucent systems follow these principles?

Although almost two decades have passed since the papers that formed the basis of social networking sites were published, the spirit of social translucence has remained unchanged, because we still use social cues to make our decisions and socially translucent systems facilitate that. However, the authors talk about similarities between digital and physical spaces. In their program Babble, as in other early social systems, you face a digital room that resembles a physical room; the program shows users’ presence and their activities. You can see these properties in Chat Circles too. The space of Chat Circles looks bigger and more flexible, but the spirit of both programs seems the same. In today’s social networks like Facebook, users’ presence is not depicted, and on Twitter users follow each other based on the interesting things they share, so the space is no longer tied to the social translucence principles. In my opinion, treating a spatial scope as a dimension of a social network is not a good idea. What if many users want to join a chat room? Imagine Twitter’s 68 million active users wanting to join a single chat room; even a much smaller number of users in one room would be unthinkable. I still found the Chat Circles idea interesting, though, as I said, this program also suffers from the space problem.

Chat Circles is an abstract graphical interface for synchronous conversation. Unlike Babble and its social proxy, here presence and activity are shown by changes in color and form, and proximity-based filtering intuitively breaks large groups into conversational clusters. I think adding another dimension (3D instead of 2D) to the program would make it even more interesting. The authors also introduced three new elements in Chat Circles II: images in the background, action traces, and a map of the entire space. They claim that the background images can introduce a topic for conversation. This was a good idea for the technology of that time, but I think online topic modeling for each group could be useful here.
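
As a rough illustration of what proximity-based filtering could look like computationally (this is not the Chat Circles implementation, just a sketch under the assumption that each user has an on-screen position and a fixed "hearing range"):

```python
from math import dist

def conversational_clusters(positions, radius=80.0):
    """Group users into clusters where each member is within `radius` pixels of
    at least one other member, a simple single-link grouping loosely analogous
    to Chat Circles' hearing range."""
    clusters = []
    unassigned = set(positions)
    while unassigned:
        seed = unassigned.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            u = frontier.pop()
            near = {v for v in unassigned if dist(positions[u], positions[v]) <= radius}
            unassigned -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(cluster)
    return clusters

# Hypothetical screen positions for five participants.
positions = {"ana": (10, 12), "bo": (40, 30), "cy": (300, 310), "di": (320, 330), "em": (700, 50)}
print(conversational_clusters(positions))  # -> [{'ana', 'bo'}, {'cy', 'di'}, {'em'}] in some order
```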


Reflection #3 – [9/4] – [Mohammad Hashemian]

Mitra, T., Wright, G. P., & Gilbert, E. (2017, February). A parsimonious language model of social media credibility across disparate events. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (pp. 126-145). ACM.

 

Today, people can easily share almost anything they want through social networks, and a huge amount of data is produced every day. But how can we distinguish true information from rumors in social networks? To address the information credibility problem, the authors of this paper built a model to predict perceived credibility from language, using a corpus of Twitter messages called CREDBANK.
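
As a loose illustration of the general idea (not the authors' actual model or feature set), a bare-bones language-based credibility classifier could be trained like this, with made-up tweets and labels standing in for CREDBANK-style annotations:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up tweets with made-up perceived-credibility labels (1 = credible, 0 = not).
tweets = [
    "Officials confirm the bridge has reopened after inspection.",
    "I heard maybe something happened downtown?? not sure",
    "Reuters reports the evacuation order was lifted this morning.",
    "OMG this is definitely a cover-up, share before they delete it!!!",
]
labels = [1, 0, 1, 0]

# Generic TF-IDF word/bigram features stand in for the paper's curated
# linguistic markers (hedges, evidentials, subjectivity, and so on).
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(tweets, labels)
print(clf.predict(["witnesses say police have confirmed the road is open"]))
```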

As the authors mention, three components of information credibility have been defined: message credibility, media credibility, and source credibility. The authors state that their proposed credibility assessment focuses on information quality, and they did not consider source credibility. Referring to several studies, they explain their reasons for emphasizing linguistic markers rather than the source of information. But some questions came to my mind when I read their reasoning. They quote Sundar: “it is next to impossible for an average Internet user to have a well-defined sense of the credibility of various sources and message categories on the Web…”. I have no doubt about what Sundar says, but if it is not possible for an average Internet user to evaluate the credibility of various sources, is it possible for that user to evaluate the content in order to assess the credibility of the information? In my opinion, judging the credibility of sources in social networks can be much easier for an average Internet user than judging the credibility of content.

Users usually trust a social media user who is more popular; for example, the more followers you have (popularity), the more trustworthy you appear. To measure the popularity, or in other words the credibility, of a user, there are several other signals, such as the number of retweets and mentions on Twitter or the number of views and likes on Facebook (and YouTube). I agree with Sundar when he talks about the multiple layers of sourcing in social networks, but I think most of the time popular users share reliable information. So even if, for example, a tweet has been passed along several times, it is possible to assess its credibility by evaluating the users who have retweeted it.

I have also been thinking about using these approaches to spot fake reviews. The existence of fake reviews even on Amazon or Yelp is undeniable. Although Amazon repeatedly claims that more than 99 percent of its reviews are real (written by real users), several reliable studies suggest otherwise.

There are many websites on the Internet where sellers look for shoppers to give positive feedback in exchange for money or other compensation. The existence of paid reviews has made customers suspicious about the credibility of reviews in general. One approach to spotting fake reviews is evaluating the credibility of the sources (the reviewers): does a given reviewer leave only positive reviews? Do they tend to focus on products from unknown companies? Ranking users by their credibility can be considered a solution for evaluating the credibility of reviews. Amazon has taken this approach by awarding badges to customers based on the type of their contributions on Amazon.com, such as sharing reviews frequently. However, these measures do not seem to have been sufficient.
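
For illustration, here is a small sketch of such reviewer-side credibility signals; the field names and thresholds are entirely hypothetical, not drawn from any real review dataset:

```python
from statistics import mean

def reviewer_features(reviews):
    """Simple credibility signals from one reviewer's history. Each review is a
    dict like {"stars": 5, "verified": True, "brand_known": False}."""
    ratings = [r["stars"] for r in reviews]
    return {
        "avg_rating": mean(ratings),
        "share_five_star": sum(r == 5 for r in ratings) / len(ratings),
        "share_verified": sum(r["verified"] for r in reviews) / len(reviews),
        "share_unknown_brands": sum(not r["brand_known"] for r in reviews) / len(reviews),
    }

def looks_suspicious(f):
    # Illustrative thresholds: almost all 5-star, mostly unverified purchases,
    # and concentrated on products from unknown brands.
    return (f["share_five_star"] > 0.9
            and f["share_verified"] < 0.5
            and f["share_unknown_brands"] > 0.7)

history = [
    {"stars": 5, "verified": False, "brand_known": False},
    {"stars": 5, "verified": False, "brand_known": False},
    {"stars": 5, "verified": True,  "brand_known": False},
]
feats = reviewer_features(history)
print(feats, looks_suspicious(feats))
```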

I think employing the approach demonstrated in this paper to spot fake reviews could be useful. I still believe that source credibility plays a very important role in information credibility; however, could we evaluate information better by combining these two approaches?
