Reflection #12 – [10/23] – [Nitin Nair]

[1] R. M. Bond et al., “A 61-million-person experiment in social influence and political mobilization,” Nature, vol. 489, no. 7415, pp. 295–298, 2012.

[2] J. E. Guillory et al., “Editorial Expression of Concern: Experimental evidence of massive scale emotional contagion through social networks,” Proc. Natl. Acad. Sci., vol. 111, no. 29, pp. 10779–10779, 2014.

<Jots>

<1>

Ethical issues abound.

<2>

When we observe (key word) a close friend in a negative mood, we internally pick one of two responses: fight or flight (broader term: arousal).

If the connection is strong, the probability of choosing the former, fight, will be high.

The probability also depends on the psychological character of the actor; still, most of us exhibit a positive attitude towards helping people and would fall into the category that “fights” (an assumption that may be challenged).

Given the above, our minds go into a state that is ideal for helping another combat a certain issue. This state allows one to put oneself in another’s shoes: empathy.

Can being in this state reflect back on oneself?

If so, isn’t it a good characteristic, one that makes us human?

<3>

But how does one separate empathy for a close few from empathy for a group?

Can this empathy turn into allegiance, leading to groupthink?

Is censorship a good way to tackle this? As a proponent of democracy, I don’t think so.

Such effects can be negated through more diverse opinions in the public forum. But isn’t this affected by entities that restrict access to these channels?

<4>

Also, doesn’t this mean there is a lack of forums designed to be available to everyone, designed to be fair? This is definitely a moonshot, as the “human forum” (society), evolving for millennia, hasn’t found the global optimum (solution). But this shouldn’t be a deterrent.

<5>

If one is pushed into a negative mood valence, would one be able to understand another’s state? Study [3] shows we’re less able to resonate with other people’s pain when we’re feeling down. Wouldn’t this mean that the authors of [2] are wrong to assume each person subjected to the experiment shares information with equal probability?

 

[3] X. Li, X. Meng, H. Li, J. Yang, and J. Yuan, “The impact of mood on empathy for pain: Evidence from an EEG study,” Psychophysiology, vol. 54, no. 9, pp. 1311–1322, 2017.


Reflection #11 – [10/16] – [Nitin Nair]

  • Danescu-Niculescu-Mizil, C., Sudhof, M., Jurafsky, D., Leskovec, J., & Potts, C. (2013). A computational approach to politeness with application to social factors.

Human speech is interesting in the sense that it is not only used to convey information from one person to another but also contains other information that helps us understand the dynamics between people. Such encodings are implicitly imbibed through the social behaviors one is surrounded by. [1]

In this paper, the author studies one of these encodings, politeness, through computational models. To build said computational model, the author first creates a dataset from two sources: Wikipedia (feature development) and Stack Overflow (feature transfer). The annotation is done by Amazon Mechanical Turk workers. A classifier is then built that uses domain-independent lexical and syntactic features extracted from this dataset to achieve near-human accuracy in classifying a text as polite or impolite. This model is then used to identify/reaffirm the following:

  • Politeness decreases as one moves up the social-power food chain
  • Politeness varies by gender and community
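As a rough illustration of the kind of domain-independent lexical cues such a classifier relies on, here is a minimal, hedged sketch; the cue lists and feature names are my own illustrative stand-ins, not the paper's actual feature set.

```python
# Sketch: count a few politeness-related lexical cues in a request.
# HEDGES/GRATITUDE lists here are tiny illustrative stand-ins.
HEDGES = {"perhaps", "maybe", "possibly", "might", "could"}
GRATITUDE = {"thanks", "thank", "appreciate"}

def politeness_cues(text: str) -> dict:
    tokens = [t.strip(".,?!") for t in text.lower().split()]
    return {
        "hedges": sum(t in HEDGES for t in tokens),
        "gratitude": sum(t in GRATITUDE for t in tokens),
        "please": "please" in tokens,
        # Sentence-initial "please" is reported as an impoliteness cue.
        "please_start": tokens[:1] == ["please"],
        "direct_question_start": tokens[:1] in (["what"], ["why"], ["who"]),
    }

print(politeness_cues("Could you maybe fix this? Thanks!"))
```

A real pipeline would feed counts like these, alongside syntactic (dependency-based) cues, into an SVM as the paper does.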

One interesting direction to work on after reading the paper is to see how the language itself could affect the measure of politeness. One could examine how different cultures, which in turn shape the language, could have an impact. This could depend on the structure of the society and so forth. [2] [3]

One could also see whether the model described above would fail in domains where the vocabulary is different, say a website where internet slang is rampant, like 4chan.

The above two instances show how a notion of politeness learned in one domain may not transfer to another where the underlying factors that determine politeness are different.

Given the dataset and its content, I believe what the author has actually built is a computational model of likability rather than its superset, politeness, which is more encompassing.

The author used an SVM to learn from features derived separately from sociolinguistic theories of politeness. One could see how a newer deep-learning-based method could be used to extract these features and jointly learn the phenomenon of politeness [4].

 

[1] https://hbr.org/1995/09/the-power-of-talk-who-gets-heard-and-why

[2] Haugh, Michael. “Revisiting the conceptualisation of politeness in English and Japanese.” Multilingua 23.1/2 (2004): 85-110.

[3] Nelson, Emiko Tajikara. “The expression of politeness in Japan: intercultural implications for Americans.” (1987).

[4] Aubakirova, Malika, and Mohit Bansal. “Interpreting neural networks to improve politeness comprehension.” arXiv preprint arXiv:1610.02683 (2016).


Reflection 10 – [10/02] – [Nitin Nair]

  • Kate Starbird, Examining the Alternative Media Ecosystem through the Production of Alternative Narratives of Mass Shooting Events on Twitter

In this paper, the author analyzes Twitter data from a 10-month period to create what they term a “domain network,” which is studied through qualitative analysis to explore a subset of “fake news.” Through this qualitative analysis, the paper finds different groups that propagate this fake news. These groups, as the author shows, do not fit the general left-right political spectrum, owing to overarching commonalities between them.

The shift away from traditional methods of news propagation, which had their own demerits, has put the burden of wading through a lot of information and judging its validity onto people who aren’t trained to do so. This has exacerbated the issue of “fake news” and created a market for alternative news, which is not a new phenomenon. I believe the way to tackle this issue is by stemming the funding of these sources. Censoring by means of deletion, given that we live in places where the right to free speech exists, impinges on the rights of these rumor mongers, but stemming their funding could walk that thin line and achieve the result we want. Creating necessary barriers, like temporary automatic demonetization of content when “clickbait” titles are found along with explicit content, could be a way forward. But systems should be in place for genuine members to appeal this stance, considered on a case-by-case basis by moderators. The moderator logs could then be made public along with the content to increase transparency. Putting up such barriers might hurt genuine content makers, but a well-thought-out design to do so, as discussed, is, I believe, the right way forward.

The finding that believing in one conspiracy theory makes one more likely to believe another, as shown in [2], is an interesting one. This issue, I believe, is exacerbated by selective exposure due to the “filter bubbles” created by these news sources. Creating mechanisms through which users can populate their feed with views different from their own could ameliorate the spread of alternative narratives. It’s necessary to understand that one can only control the spread, not eliminate the purveyors of such narratives. Such narratives, in the right amount, could be part of the entertainment portion of our information diet.

Another design element that could be considered is showing people a report of their “information diversity” at regular intervals. This could be an effective nudge to promote a diversified information diet.

But how does one create barriers for such content, especially in places where moderation by an informed and unbiased party is not possible? IM services are a good example of such grounds. I do not have a solution for the spread of misinformation through such channels, but given its impact, it is an area that deserves focus.

Creating an alternative business model, although a hard task, is, I believe, an essential step forward. Any of the strategies discussed above that might prove effective in the short term are stopgap mechanisms. To put things in perspective, the CPM model we follow at the moment was introduced by the team at HotWired in October 1994, when the number of people on the internet was less than 0.4% of the earth’s population, compared with more than 50% right now.

 

 

[2] Van Prooijen, J.-W., & Acker, M. (2015). The influence of control on belief in conspiracy theories: Conceptual and applied extensions. Applied Cognitive Psychology, 29(5), 753–761.


Video Reflection #9 – [09/27] – [Nitin Nair]

How we humans select information depends on a few known factors that bias the information selection process. This is a well-established phenomenon. Given that such cognitive biases exist, and that we live in a democratic system in an age of information overload, how does this impact social conversation and debate?
As mentioned in the talk, this selective exposure or judgment can be used for good, for example to increase voter turnout. But this gets me thinking: is this nudging sustainable? Relating to the discussions after reading reflection #1, about different kinds of signals, is this nudge an assessment signal or a conventional one? One could imagine users exposed to a barrage of news items that bolster their positions getting desensitized, resulting in neglect of these cues.
The portion of the talk where the speaker discusses the behaviour of people exposed to counter-attitudinal positions is an interesting one. This portion, coupled with one of the project ideas I proposed, got me thinking about a particular news feed design.

Given that we solve the issue of mapping the positions of different news sources on the political spectrum, in order to expose users to sources outside their spectrum, we could design a slider whose position decides the news sources that populate the news feed, as shown above. The lightning symbol next to each article lets one switch to a feed populated by articles on the same topic. The topic tags, found through keyword extraction (Rose et al. (2010)), combined with the article’s time of publishing, could help us suggest news articles discussing the same issue from a source with a different political leaning.
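The keyword extraction step could follow RAKE (Rose et al. (2010)). A minimal sketch, with a tiny stand-in stopword list and an example headline of my own:

```python
import re
from collections import defaultdict

# RAKE-style extraction: candidate phrases are runs of words between
# stopwords/punctuation; each word is scored by degree/frequency and a
# phrase's score is the sum of its word scores.
STOPWORDS = {"a", "an", "the", "of", "to", "and", "in", "is", "for", "on", "after"}

def rake(text):
    words = re.findall(r"[a-z']+", text.lower())
    phrases, current = [], []
    for w in words:
        if w in STOPWORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)

    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for w in phrase:
            freq[w] += 1
            degree[w] += len(phrase) - 1  # co-occurrence within the phrase
    score = {w: (degree[w] + freq[w]) / freq[w] for w in freq}
    return sorted(
        ((" ".join(p), sum(score[w] for w in p)) for p in phrases),
        key=lambda x: -x[1],
    )

print(rake("Senate passes the healthcare reform bill after a long healthcare debate"))
```

The top-scoring phrases would then serve as the topic tags used to match articles across outlets.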
Given such a design, we could identify trends in how and when users enter the sphere of counter-attitudinal positions, which is an idea the speaker mentions in the video.
Do people linger more on comments that go against their beliefs or on ones that suit their beliefs? One could run experiments on consenting users to see which comments they spend more time reading, then pick and analyze the posts that top the list, accounting for post length. My hypothesis is that comments that go against one’s beliefs warrant more time, as one would first take time to comprehend the position, compare and contrast it with one’s own belief system, and then take action, which can be replying or reacting to the comment. If temporal information proves useful, it could pave the way to a method for finding “top comments,” uncivil comments (more time taken), and explicit content (less time taken). During the extraction of top comments, one has to have a human in the loop and account for each reader’s position on the political spectrum.
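The proposed analysis can be sketched in a few lines, assuming logged (dwell seconds, word count, stance) tuples per comment from consenting users; the data and the "stance" labels below are made up for illustration, and dwell time is normalized by length so longer comments don't dominate:

```python
# Compare mean length-normalized dwell time between comment stances.
def mean_normalized_dwell(logs, stance):
    rates = [d / w for d, w, s in logs if s == stance]
    return sum(rates) / len(rates)

logs = [
    (40, 100, "counter"),    # 0.40 s/word
    (30, 60, "counter"),     # 0.50 s/word
    (12, 80, "congruent"),   # 0.15 s/word
    (10, 40, "congruent"),   # 0.25 s/word
]

counter = mean_normalized_dwell(logs, "counter")
congruent = mean_normalized_dwell(logs, "congruent")
# The hypothesis predicts counter-attitudinal comments get more time per word.
print(counter > congruent)
```

A real study would replace the toy comparison with a significance test and control for reader-level covariates.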
The discussion by the speaker on “priming” of users using the stereotype content model is extremely fascinating (Fiske et al. (2018)). Given that priming has a significant impact on how users react to certain information, could it be possible to identify “priming” in news articles or news videos?
One could build an automated tool to detect and identify the kind of priming, be it “like,” “respect,” or other orthogonal primes. The orthogonal prime could be “recommend,” the one the speaker points out in her research (Stroud et al. (2017)). Given such an automated tool, it would be interesting to run it on a large number of sources to identify these nudges.

 

References

Susan T Fiske, Amy JC Cuddy, Peter Glick, and Jun Xu. 2018. A model of (often mixed) stereotype content: Competence and warmth respectively follow from perceived status and competition (2002). In Social Cognition. Routledge, 171–222.

Stuart Rose, Dave Engel, Nick Cramer, and Wendy Cowley. 2010. Automatic keyword extraction from individual documents. Text Mining: Applications and Theory (2010), 1–20.

Natalie Jomini Stroud, Ashley Muddiman, and Joshua M Scacco. 2017. Like, recommend, or respect? Altering political behavior in news comment sections. New Media & Society 19, 11 (2017), 1727–1743.


Reflection #8 – [09/25] – [Nitin Nair]

  1. Garrett, R. Kelly (2009) – “Echo chambers online? Politically motivated selective exposure among Internet news users”
  2. Resnick, Paul (2013) – “Bursting Your (Filter) Bubble: Strategies for Promoting Diverse Exposure” – Proceedings of CSCW ’13 Companion, Feb. 2013.

One of the essential elements of a democracy is the presence of a free press. This right to unrestricted information, albeit with some exceptions, has given us, the people, the ability to make informed decisions. But in recent years, such news has been delivered through channels that aren’t fair, owing to the use of personalized recommendation systems. The papers discussed below try to answer pressing questions in this domain.

In [1], the author looks into selective exposure among news readers and tries to see whether it is colored by one’s political opinion, through a web-administered behavior-tracking study. To gain better insights, the author lays out five hypotheses, listed below.

  1. The more opinion-reinforcing information an individual expects a news story to contain, the more likely he or she is to look at it.
  2. The more opinion-reinforcing information a news story contains, the more time an individual will spend viewing it.
  3. The more opinion-challenging information the reader expects a news story to contain, the less likely he or she is to look at it.
  4. The influence of opinion-challenging information on the decision to look at a news story will be smaller than the influence of opinion-reinforcing information.
  5. The more opinion-challenging information a news story contains, the more time the individual will spend viewing it.

Given how dated the publication is, I wonder whether the conclusions of the paper are still relevant. The major channel through which news is delivered has shifted to social media. Here the options are limited, given that content is prefiltered and delivered only if the chance of one clicking it is high. Also, the content you are exposed to depends on your “network.” Given these features of the mode of delivery, the author's conclusions, I believe, would definitely be challenged.

Also, one might question the validity of the author's claims due to the lack of diversity in the sample group and how it was selected. Given how the survey was promoted across different news sources, the people willing to participate may not have been true representatives of their groups.

It would have been an interesting experiment if the author had chosen a wider variety of groups from diverse political backgrounds, analyzed the group behaviour of each, and compared them with one another.

Another experiment that would have been interesting is to see how the behaviour of the user group changes when reading about a particular topic after exposure. Do they stick with the opinions of the first article, or do they venture out to challenge it? Given that we are exposed to topics of interest every day, I believe a long-term exposure study is needed to track the echo chamber effect, which is missing in paper [1].

Paper [2] gives reasons for the need to develop products that promote diversity and expose users to many opinions, fostering deliberative discussion. The paper then goes on to discuss a few examples of such products.

Can user groups be nudged towards good behaviour? I believe that is definitely possible. But how can one achieve that in an equitable manner? Could it be that certain users are more vulnerable to nudging than others? Would the data obtained to do so by private entities like social media companies be put to use in the right manner?

I believe some oversight of the above by a third party is necessary to ensure this.


Reflection #7 – [09/18] – [Nitin Nair]

[1]        T. Erickson and W. A. Kellogg, “Social translucence: an approach to designing systems that support social processes,” ACM Trans. Comput. Interact., vol. 7, no. 1, pp. 59–83, 2000.

[2]        J. Donath and F. Viégas, “The chat circles series,” Proc. Conf. Des. Interact. Syst. Process. Pract. methods, Tech. – DIS ’02, p. 359, 2002.

Human beings are social animals, and communication is vital to that. We use speech as a mode through which we share information about our world and ourselves, construct agreed-upon myths and legends to create shared realities, and even warn one another. But in recent years, this physical phenomenon is increasingly being substituted by a virtual equivalent. These virtual tools, unlike their physical counterpart, are crude in nature. How can one create a tool that is not crude and is as functional as physical speech? This is the question [1] tries to tackle.

[1] first defines the term social translucence. It identifies visibility, awareness, and accountability as the building blocks of social interaction. It also identifies how constraints naturally come into the picture and the importance of a shared understanding of these constraints. The author then goes on to describe various systems that facilitate this functionality.

The privacy concerns associated with a system like Babble are warranted. But could one bring the notion of privacy into such systems and implement it? One could enable functionality through which a user can peruse another’s post history or enter circles anonymously. But how do you prevent misuse of these features? One could create social pressure by notifying users of who viewed their profile, as LinkedIn does, or who viewed their post history.

The need for “windows” rather than “walls” in digital ecosystems would require transporting the data that constitutes social information along with the actual information. Although such a system is important, would such a bandwidth-heavy requirement create barriers for users who do not have access? Although internet access and speed are improving, the transition to a fully digital world will only take place when universal access to broadband becomes a reality.

Mimetic platforms could become reality through the use of AR/VR technology. Given how rapidly average compute power, especially on mobile devices, is increasing, the barriers to entering the market with such a “mimetic” platform are being removed. How one integrates such functionality, giving users an actual benefit to being on the platform, will determine the success of such platforms.

It’s interesting how rare “abstract systems” are in the wild. What could be the reason? One could be the upfront cost associated with learning the mechanics of the system.

Paper [2] is concerned with the process of designing a system that is legible and engaging. It progresses from a barebones system to more feature-laden systems, giving the reason for each functional upgrade. The “socially translucent” systems [2] builds are Chat Circles, Chat Circles II, Talking in Circles, and Tele-Directions.

Given how systems like Chat Circles work, how can one accommodate people having multiple accounts or online personas?

Chat Circles could also show users groups having discussions on similar topics, extending the functionality of the “hearing range.” These groups may be located far away geographically, and the topics could be found in real time using state-of-the-art NLP systems.

Given that [1] and [2] try to make online communication channels mimic how physical communication works, one could also ask whether such a push is needed. I believe a hybrid of the current system with more socially translucent features is what is necessary; being able to use a “legacy” mode would be one way forward. Online communication should be given the space to evolve naturally, as physical conversation has.

Given how many people we interact with online versus how many we actually converse with offline, it is necessary that different people be put in different “circles,” accounting for the pressure to include people one wouldn’t normally want in one’s innermost circle. Such a tiered approach, although not new, could help address the elephant in the room: the privacy issues associated with online social communication platforms.


Reflection #6 – [09/13] – [Nitin Nair]

  1. Sandvig et al., “Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms”
  2. Hannak et al., “Measuring Personalization of Web Search”

Our lives have become so intertwined with systems like web search that we fail to see their potential foibles. The trust we place in such systems is not questioned until a major flaw is uncovered. But given the black-box nature of these systems, how does one even understand their shortcomings? One way this could be achieved is through a process called an “algorithmic audit.” Such audits are ever more important given the way of the world at the moment.
In paper [1], the author first talks about “screen science,” a term coined by the American Airlines team that created the SABRE flight reservation system to refer to human biases in the way we choose items from a list. He then rightly concludes, while introducing the concept, how important “algorithmic audits” are. He argues that understanding the systems we interact with daily is paramount, given how firms have acted in the past. The author points out various types of audits, namely code audits, noninvasive user audits, scraping audits, sock puppet audits, and collaborative crowdsourced audits. He finally examines what would be needed, in terms of legal and financial support, to conduct such audits, while advocating a shift of perspective to “accountability by auditing.”
In paper [2], the author tries to measure the basis and extent of personalization in modern search engines through benign sock-puppet auditing. The major contributions are as follows: a methodology is created to measure personalization in web searches, this methodology is used to measure personalization on Google, and the causes behind the personalization are investigated. A couple of issues the author's methodology mitigates are temporal changes in the search index, consistency issues arising from distributed search indices, and the search provider's use of A/B testing.
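The core comparison in such an audit can be sketched simply: two synchronized sock-puppet accounts issue the same query, and their ranked result lists are compared. Jaccard overlap ignores rank; the paper also uses an edit-distance-style metric, which I approximate here by counting rank displacements over shared results. The URLs below are made-up examples.

```python
# Compare two ranked result lists from paired sock-puppet queries.
def jaccard(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def rank_displacement(a, b):
    # Sum of |rank difference| over shared results (a rough proxy
    # for the edit-distance metric used in the paper).
    pos_b = {url: i for i, url in enumerate(b)}
    return sum(abs(i - pos_b[url]) for i, url in enumerate(a) if url in pos_b)

control = ["cnn.com", "bbc.com", "nytimes.com", "reuters.com"]
treated = ["bbc.com", "cnn.com", "nytimes.com", "foxnews.com"]

print(jaccard(control, treated))            # one result swapped in
print(rank_displacement(control, treated))  # two results traded places
```

Averaging these metrics over many query pairs, against a control pair to subtract measurement noise, gives the personalization estimate.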
One of the first thoughts that popped into my mind is how the results of the audit might differ if conducted today. The push towards more personalization through machine learning models by all major internet firms, in this case Google, might yield different results. This would be an interesting project to explore.
Are certain language constructs, favored by certain groups and found crudely by optimizing on click rate, being used in search results? This is an interesting question one could try to answer by extending the insights [2] gives. It would also be interesting to compare the description of a person constructed by the system with the actual person, to see how optimizing on different factors affects the system’s ability to create a true-to-life description.
Although the author shows that the carry-over effect becomes negligible after 10 minutes, the long-term effects of profiling, from the system thoroughly understanding a user’s behaviour and preferences, are not explored in the work. The challenge in identifying this would be the same issues the author pointed out: changing search indices and not being able to switch off personalization during searches.
Given how important understanding these systems is, and their impact on our understanding of the world, it would be worthwhile to have unbiased agencies conduct such audits of algorithms, and of the data used to build ML models, to track the reliability and biases of these systems, while leaving enough room to keep the providers’ algorithms private. Having checks on these systems will ultimately ground our expectations of them. If any malevolent actions are found, legal action could be called for against the service providers to foster accountability.


Reflection #4 – [09/06] – [Nitin Nair]

  1. S. Kumar, J. Cheng, J. Leskovec, and V. S. Subrahmanian, “An Army of Me: Sockpuppets in Online Discussion Communities,” 2017.

Discourse has been one of the modes through which we humans use our collective knowledge to further our individual understanding of the world. These conversations have helped one look at the world through the lens of another, an outlook with which we may or may not agree. In the past few decades, this discourse has moved to the online world. This movement, although opening pulpits to the masses and giving them opportunities to express their opinions, has created a few issues of its own. One issue that plagues these discussion forums, owing to how identity functions in online settings, is sock puppeteering. Although variations of the same have existed in the past, the scale and reach of online discussion forums make the dangers more profound.

The author in this paper tries to understand the behaviour and characteristics of these “sockpuppets” and uses the findings for two tasks: first, to differentiate pairs of sockpuppets from pairs of ordinary users, and second, to predict whether an individual is a sockpuppet. The author uses data from nine communities on the online commenting platform Disqus as the dataset, and identifies a sock puppeteer using a couple of factors, like IP addresses, the length of posts, and the time between posts.

As indicated by the author, writing style tends to persist because the content is written by the same person. To use this as a feature, the author employs LIWC and ARI, which, even though shown to be effective here, could, I believe, be improved by replacing them with better-quality vectors that look not only at vocabulary but also at semantic and structural elements, like the construction of sentences, to identify the “master.” Building feature vectors in this fashion would, I believe, help identify these actors more robustly.
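ARI, one of the stylometric signals used here, depends only on character, word, and sentence counts, which is why it is cheap but shallow. A minimal sketch of the standard formula:

```python
import re

# Automated Readability Index:
# ARI = 4.71 * (chars/words) + 0.5 * (words/sentences) - 21.43
def ari(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43

# Accounts whose posts score similarly (alongside LIWC categories)
# become candidates for sharing the same "master".
print(round(ari("This is a short post. It reads simply."), 2))
```

Because such scores collapse a text to a single number, two stylistically different writers can tie, which is exactly the weakness richer semantic vectors would address.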

Once the master is identified, it would be interesting to analyze the characteristics of the puppet accounts. Given that some accounts might elicit more responses than others, it would be a worthwhile study to see how they achieve this. One could see if there is any temporal aspect to it: identify the best time to probe for a response, and how these actors optimize their response strategies to achieve it.

One could also look into behaviors by these sock puppeteers that warrant bans from moderators of these online communities. These features could be recorded and given out as guidelines for identifying the same. How long such observations remain valid is a different question altogether.

Given that some communities with human moderators have been addressing this issue using “mod bans,” one could try to build supervised models for identifying sock puppet accounts using this ban information as the ground truth or label.

Also, on a different note, it would be worthwhile to see how these actors respond to such bans. Do they create more accounts immediately, or do they wait it out? An interesting thought to look into, for sure.

Given that uncovering or profiling the identity of users is the way forward to counteract sock puppeteering, there is a valid concern for users whose identities need to be kept under wraps for legitimate reasons. Given that even allegations about a particular person have led to violence directed at them, how can one ensure these people are protected? This is one issue the author's method, which uses IP addresses to identify sock puppetry, needs to address. How can one differentiate users with legitimate reasons for creating multiple online identities from those without?


Reflection #3 – [9/4] – [Nitin Nair]

  1. T. Mitra, G. P. Wright, and E. Gilbert, “A Parsimonious Language Model of Social Media Credibility Across Disparate Events”

Written language, evolving from word of mouth, has for centuries been the primary mode of discourse and the transportation of ideas. Due to its structure and capacity, it has shaped our view of the world. But due to changing social landscapes, written language’s efficacy is being tested. The emergence of social media, the preference for short blobs of text, citizen journalism, the emergence of the cable reality show, I mean, the NEWS, and various other related occurrences are driving a change in the way we are informed of our surroundings. These are not only affecting our ability to quantify credibility but are also inundating us with more information than one can wade through. In this paper, the author explores whether language from the social media website Twitter can be a good indicator of the perceived credibility of the text written.

The author tries to predict the credibility of news by creating a parsimonious model (one with a low number of input parameters) using penalized ordinal regression with the scores “low,” “medium,” “high,” and “perfect.” The author uses the CREDBANK corpus along with other linguistic repositories and tools to build the model. The linguistic features are modality, subjectivity, hedges, evidentiality, negation, exclusion and conjunction, anxiety, positive and negative emotion, boosters and capitalization, quotations, questions, and hashtags, while the control variables are the number of original tweets, retweets, and replies, the average length of original tweets, retweets, and replies, and the number of words in original tweets, retweets, and replies. Measures such as the use of a penalized version of ordered logistic regression were also taken to handle multicollinearity and sparsity issues. The author then goes on to rank and compare the different input variables by their explanatory power.
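The cumulative-logit ("proportional odds") link underlying such ordinal regression maps a feature vector to probabilities over the ordered levels. A sketch of just the link function, with made-up thresholds and coefficients (the real model learns these under an elastic-net penalty):

```python
import math

# P(y <= j) = sigmoid(theta_j - x . beta); per-level probabilities are
# differences of adjacent cumulative probabilities.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def class_probs(x, beta, thresholds):
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    cum = [sigmoid(t - eta) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical features: (hedge count, booster count, negative-emotion count)
x = (2.0, 0.0, 1.0)
beta = (-0.8, 0.4, -0.5)        # illustrative coefficients, not the paper's
thresholds = (-2.0, 0.0, 2.0)   # theta_1 < theta_2 < theta_3 => 4 levels

p = class_probs(x, beta, thresholds)  # probabilities for low/medium/high/perfect
print([round(v, 3) for v in p])
```

The penalty matters because many of the lexicon-derived features are sparse and correlated; shrinking coefficients toward zero is what keeps the model parsimonious.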

One thing I was unsure of after reading the paper is whether the author accounted for long tweets where the writer uses replies to extend a tweet. Eliminating these could make the number of replies a more credible feature. One could also note that the author does not accommodate spelling mistakes and so forth; this preprocessing step could improve the performance and reliability of the model.
It would be interesting to test whether the method the author describes can be translated to other languages, especially ones that are linguistically distant.
Language has been evolving ever since its inception. New slang and dialects add to this evolution. Certain social struggles and changes also have an impact on language use, and vice versa. Given such a setting, is understanding credibility from language use a reliable method? It would be an interesting project to check whether these underlying lingual features have remained the same across time. One could pick out texts involving discourse from the past and see how the reliability of the model built by the author changes, if it does. But this method would need to account for data imbalance.
When a certain behaviour is penalized, the repressed always find a way back. This also applies to the purveyors of fake news: they could game the system by using certain language constructs and words to evade it. Due to the way the system is built by the author, it could be susceptible to such acts. To guard against this, one could automate the feature selection: the model could routinely recalculate the importance of its features while also adding new words to its dictionary.
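The periodic re-ranking idea can be sketched as follows. Everything here is hypothetical (feature names, data, and the refit schedule); it simply refits a penalized model on a fresh batch and ranks features by the magnitude of their coefficients, which is one crude proxy for importance.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["hedges", "boosters", "negations", "hashtags", "questions"]
rng = np.random.default_rng(1)

def refit_and_rank(X, y):
    """Refit the penalized model on a fresh batch of labelled events and
    return feature names ordered by coefficient magnitude (descending)."""
    model = LogisticRegression(penalty="l2", C=1.0).fit(X, y)
    importance = np.abs(model.coef_).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [feature_names[i] for i in order]

# Simulated "monthly" batch: credibility here depends on features 0 and 3.
X = rng.normal(size=(300, 5))
y = (X[:, 0] - X[:, 3] + rng.normal(scale=0.3, size=300) > 0).astype(int)
ranking = refit_and_rank(X, y)
print(ranking)
```

Running such a refit on a schedule would let the system notice when purveyors of fake news shift away from previously predictive constructs.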
Could a deep learning model be built to improve the performance of credibility measurement? One could try building a sequential model, be it an LSTM or, even better, a TCN [2], fed vectors of the words in a tweet generated using word2vec [3], along with some attention mechanism, or even [4], to make the model interpretable. Care has to be taken that models, especially in this area, remain interpretable so that accountability in the system is not lost.
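As a minimal illustration of the sequential-model idea, here is a tiny NumPy-only LSTM cell rolled over a sequence of word vectors. This is a pedagogical sketch, not a trainable credibility model: the word vectors are random stand-ins for word2vec embeddings, and the dimensions are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyLSTMCell:
    """Minimal LSTM forward pass over a sequence of word vectors."""
    def __init__(self, d_in, d_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the input, forget, cell and output gates.
        self.W = rng.normal(scale=0.1, size=(4 * d_hidden, d_in + d_hidden))
        self.b = np.zeros(4 * d_hidden)
        self.d_hidden = d_hidden

    def forward(self, xs):
        h = np.zeros(self.d_hidden)
        c = np.zeros(self.d_hidden)
        for x in xs:  # xs: sequence of word vectors, e.g. word2vec embeddings
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        return h  # final hidden state summarises the tweet

# A "tweet" of 8 hypothetical 16-dimensional word vectors.
rng = np.random.default_rng(42)
tweet = rng.normal(size=(8, 16))
cell = TinyLSTMCell(d_in=16, d_hidden=32)
summary = cell.forward(tweet)
print(summary.shape)
```

A linear layer mapping `summary` to the four credibility classes would complete the classifier; in practice one would reach for PyTorch or TensorFlow rather than hand-rolling the cell.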

[2] C. Lea et al., “Temporal Convolutional Networks for Action Segmentation and Detection.”
[3] T. Mikolov et al., “Distributed Representations of Words and Phrases and their Compositionality.”
[4] T. Guo et al., “An interpretable LSTM neural network for autoregressive exogenous model.”


Reflection #2 – 08/30 – [Nitin Nair]

  1. Justin Cheng, Cristian Danescu-Niculescu-Mizil, Jure Leskovec. “Antisocial Behavior in Online Discussion Communities.”

In this day and age, when “information overload” is widespread, the commodity everyone is eager to capture is attention. Users with the ability to do so are sought after by companies trying to tout their next revolutionary product. But there is one group of users with a particular ability to capture attention whose methods make them, thankfully, undesirable to these establishments. Through vile and provocative mechanisms, these users can send even the most civil of netizens off the rails. But who are these rogue actors and how do they function? Can their behaviour be profiled at scale and used to nip such bad actors in the bud early? These are the questions [1] tries to answer.

To start with, the paper divides users from three websites, namely CNN, IGN and Breitbart, observed over a period of 18 months, into two categories: Future Banned Users (FBUs) and Never Banned Users (NBUs). The FBUs are observed to have two subgroups: those who concentrate their efforts on a few threads or groups, and those who distribute their efforts across multiple forums. The author then measures the readability of the posts by these categories of users, observing that FBUs tend to have higher Automated Readability Index (ARI) scores and display more negative emotion than NBUs. The author also measures the trend of users' behaviour over time to note any shift in their category label. The author later uses four different feature sets, namely post, activity, community and moderator features, to build a model that predicts whether a user will be banned.
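For reference, the ARI mentioned above is a simple surface-level readability formula; a quick sketch of it is below. The tokenization is deliberately naive (splitting on whitespace and sentence punctuation), and the sample text is my own, so the exact score should not be read as the paper's measurement procedure.

```python
import re

def automated_readability_index(text):
    """ARI = 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    # Count letters/digits only, ignoring trailing punctuation on words.
    chars = sum(len(w.strip(".,!?;:")) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43

sample = ("The quick brown fox jumps over the lazy dog. "
          "Pack my box with five dozen liquor jugs.")
print(round(automated_readability_index(sample), 2))
```

Higher scores correspond to text that demands a higher grade level to read.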

To start with, the dataset is annotated by 131 workers on AMT. But due to the way the workers were selected, nothing is known about their race, educational background or even political alignment, any of which can change the definition of “anti-social.” The diversity of opinion on what constitutes “anti-social” behaviour is extremely important, and the author hasn't given it much credence.

Given the use of post deletions as a metric, the effectiveness of such a model in forums where this feedback mechanism is absent, or where such behaviour is the norm and rampant, would, I believe, be extremely low. What metrics could be adopted in forums like these? This could be an interesting avenue to explore.

Also, could these anti-social elements mount a coordinated attack in order to take control of a platform? A group could bench members who have accumulated many reports and use members with fewer of them; individuals could even create new accounts to help steer a conversation toward their cause. These are strategies such groups could adopt which the methods described in the paper would fail to detect. Could profiling these users' content in order to ascertain their true identities create a slightly more robust model? This is something one could definitely work on in the future.

Another interesting piece of work would be to identify the different methods through which these trolls try to elicit inflammatory behaviour from their targets. One could also try to see how these mechanisms evolve over time, if they do, as old ones tend to lose their ability to elicit such behaviour.

Could identifying users’ susceptibility in different forums or networks be used to take preventive steps against anti-social behaviour? If one were to do that, what features could be used to predict such susceptibility? A couple of features, without much deliberation, could be the number of replies the user gives to these trolls, the timespan the user has been active in the network, and the length of their replies along with the sentiment. This, if done, could also be used to identify trolls who have more sway over people.
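The susceptibility features listed above could be extracted per user roughly as follows. The post records, troll list, and sentiment values are all hypothetical stand-ins; a real pipeline would pull these from platform data and a sentiment model.

```python
from datetime import datetime

# Hypothetical post records for one user:
# (timestamp, parent_author, text, sentiment in [-1, 1]).
posts = [
    (datetime(2018, 1, 3),  "troll_a",  "Why would you even say that?", -0.6),
    (datetime(2018, 2, 10), "friend_b", "Good point, thanks!",           0.7),
    (datetime(2018, 5, 21), "troll_a",  "Stop twisting my words.",      -0.8),
]
known_trolls = {"troll_a"}

def susceptibility_features(posts, known_trolls):
    """Features sketched above: replies to trolls, account timespan,
    average reply length, and average sentiment."""
    troll_replies = [p for p in posts if p[1] in known_trolls]
    timespan_days = (max(p[0] for p in posts) - min(p[0] for p in posts)).days
    return {
        "n_troll_replies": len(troll_replies),
        "timespan_days": timespan_days,
        "avg_reply_len": sum(len(p[2].split()) for p in posts) / len(posts),
        "avg_sentiment": sum(p[3] for p in posts) / len(posts),
    }

feats = susceptibility_features(posts, known_trolls)
print(feats)
```

Aggregating `n_troll_replies` across many users, grouped by which troll they reply to, would also surface the trolls with the most sway.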

Although the intent and motivation of the paper were excellent, its content left much to be desired.
