Reflection #12 – 04/05 – Pratik Anand

1) Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm

The paper aims to use emojis to detect a diverse set of emotions in text. The authors acknowledge previous work in this field that used positive and negative emojis to determine the emotion of a text, and they extend it with a more diverse set of classes. A good example is the emotional distinction between “this is shit” and “this is the shit”: the former has a negative connotation, whereas the latter has a strongly positive one. Their model, DeepMoji, is a variant of the LSTM architecture. The authors trained it on a Twitter dataset of tweets containing emojis. For tweets with multiple emojis, the authors add a separate copy of the tweet to the dataset for each emoji.

Sometimes, however, a group of emojis collectively conveys a message. Won’t such a message be lost if the emojis are analyzed separately rather than as a group? The real difficulty is distinguishing such cases from those where the emojis are unrelated to each other. Later in the paper there is an attempt to cluster emojis into groups, and I believe that work could be extended to address this point. Could a similar approach be applied to memes?
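
To make the per-emoji expansion concrete, here is a minimal Python sketch that contrasts it with the group-preserving alternative raised above. Everything in it is an assumption for illustration and not the authors' preprocessing code. This includes the tiny emoji vocabulary `EMOJI_VOCAB`, the function names, and the choices to strip emojis from the text and to count each distinct emoji only once.

```python
from typing import List, Tuple

# Hypothetical label vocabulary: a tiny subset of single-code-point emojis.
# Multi-code-point emojis (flags, skin tones, variation selectors) would need
# grapheme-aware parsing, which this sketch deliberately avoids.
EMOJI_VOCAB = {"😂", "😍", "😭", "🙏", "🔥"}


def find_emojis(tweet: str) -> List[str]:
    """Return the distinct label emojis in order of first appearance."""
    seen: List[str] = []
    for ch in tweet:
        if ch in EMOJI_VOCAB and ch not in seen:
            seen.append(ch)
    return seen


def strip_emojis(tweet: str) -> str:
    """Remove label emojis (assumption) so the input text does not contain its own label."""
    no_emoji = "".join(ch for ch in tweet if ch not in EMOJI_VOCAB)
    return " ".join(no_emoji.split())  # collapse leftover whitespace


def expand_per_emoji(tweet: str) -> List[Tuple[str, str]]:
    """One (text, label) pair per distinct emoji, as described in the reflection."""
    text = strip_emojis(tweet)
    return [(text, emoji) for emoji in find_emojis(tweet)]


def keep_emoji_group(tweet: str) -> List[Tuple[str, Tuple[str, ...]]]:
    """Hypothetical alternative: keep co-occurring emojis together as one group label."""
    emojis = find_emojis(tweet)
    return [(strip_emojis(tweet), tuple(emojis))] if emojis else []


if __name__ == "__main__":
    tweet = "what a game 🔥🔥😂"
    print(expand_per_emoji(tweet))
    print(keep_emoji_group(tweet))
```

For the example tweet, the per-emoji expansion produces two training pairs that share the same text and each carry one label. The grouped variant keeps a single pair labeled with the tuple (🔥, 😂), which preserves the combined message at the cost of a much larger label space.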

Overall, it is a unique technical paper that even has a live demo at deepmoji.mit.edu to play around with. It is good to see work on emojis, as they are arguably the future of human communication.

2) Using linguistic and topic analysis to classify sub-groups of online depression communities

The paper uses linguistic features to identify sub-communities within online depression communities. Despite being fairly recent, the paper uses a LiveJournal dataset for its studies. The authors might have obtained better results with data from more popular platforms such as Reddit or Facebook groups, since LiveJournal as a blogging website has been in decline since the 2010s. Nevertheless, if the dataset is representative, it is good enough. The authors identified various communities for depressed people, which they grouped into categories such as Depression, Self-Harm, Grief/Bereavement, Bipolar Disorder and Suicide. What is the reasoning behind this categorisation?
The authors use linguistic features as well as topic modelling to extract feature sets. The word clouds generated from the topics contain some interesting keywords. Notably, no distinctive keywords were found for the Depression community apart from filler words. What could be the reason for this? Would a different classification methodology change it? And what should serve as the ground truth in cases related to mental health?
The authors posit that the latent features are representative of the sub-groups and can be used to identify them. Beyond that, the features could serve as a starting point to be correlated with data from other social networks to build a mental health profile of an individual. Different linguistic profiles can help in understanding such communities better.
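
The idea of identifying a sub-group from its linguistic profile can be illustrated with a minimal Python sketch. It is not the authors' method. The word lists in `CATEGORIES` and the toy posts in `TRAIN` are hypothetical and made up for illustration. A simple nearest-centroid classifier is swapped in for whatever classifier the paper used. Topic-model features are omitted here, although topic proportions could be appended to the same feature vector.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set

# Hypothetical word lists standing in for curated psycholinguistic categories.
CATEGORIES: Dict[str, Set[str]] = {
    "first_person": {"i", "me", "my", "myself"},
    "death_loss": {"died", "funeral", "gone", "loss", "grief"},
    "mood": {"manic", "mood", "meds", "swings", "episode"},
}

# Made-up toy posts per sub-group; real data would come from the communities.
TRAIN: Dict[str, List[str]] = {
    "Grief/Bereavement": [
        "my dad is gone and the funeral was today",
        "grief after the loss of my mother",
    ],
    "Bipolar Disorder": [
        "my mood swings are back",
        "manic episode again so i called about my meds",
    ],
}


def linguistic_features(post: str) -> List[float]:
    """Fraction of tokens falling into each category, in a fixed category order."""
    tokens = re.findall(r"[a-z']+", post.lower())
    total = len(tokens) or 1  # avoid division by zero on empty posts
    counts = Counter(tokens)
    return [sum(counts[w] for w in words) / total for words in CATEGORIES.values()]


def centroids(labeled: Dict[str, List[str]]) -> Dict[str, List[float]]:
    """Mean feature vector per sub-group (the 'training' step of nearest centroid)."""
    result: Dict[str, List[float]] = {}
    for group, posts in labeled.items():
        vecs = [linguistic_features(p) for p in posts]
        result[group] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return result


def classify(post: str, cents: Dict[str, List[float]]) -> str:
    """Assign the sub-group whose centroid is closest in Euclidean distance."""
    x = linguistic_features(post)
    return min(cents, key=lambda g: math.dist(x, cents[g]))


if __name__ == "__main__":
    cents = centroids(TRAIN)
    print(classify("the funeral brought back so much grief", cents))
    print(classify("my meds changed my mood", cents))
```

Nearest centroid is used only because it is short and transparent. With word lists this small, a community without distinctive vocabulary would end up near every centroid. That mirrors the Depression result noted above, where only filler words stood out.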
