Felbo, Bjarke, et al. “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm.” arXiv preprint arXiv:1708.00524 (2017).
Nguyen, Thin, et al. “Using linguistic and topic analysis to classify sub-groups of online depression communities.” Multimedia Tools and Applications 76.8 (2017): 10653-10676.
Reflection:
Felbo et al. used a raw dataset of 56.6 billion tweets, filtered down to 1.2 billion relevant tweets, to build a classifier that accurately detects the emotional content of text. In my opinion, this is one of the best-conceived and best-executed papers we have read so far. I was impressed by the size of the dataset, which clearly helped in building a more accurate classifier. Although the paper dealt mostly with ML techniques, most of which were unfamiliar to me, I found the ‘chain-thaw’ transfer learning technique quite intriguing. It was also fascinating to read how this approach helped avoid possible overfitting. The authors have also built a website, ‘DeepMoji’, to demonstrate their model, and it is available for anyone to use. The website gave a good sense of which words were given more weight when converting text to its equivalent emotion. Some users write their messages using only emojis. Can this study be extended to interpret the context behind such messages?
Paper 2, by Nguyen et al., explores the textual cues of online communities interested in depression. For the study, the authors randomly selected 5000 posts from 24 online communities and identified five subgroups of online communities: Depression, Bipolar Disorder, Self-Harm, Grief, and Suicide. To characterize these communities, psycholinguistic features and content topics were extracted and analyzed. The paper also applied ML techniques to build a classifier for depression vs. the other subgroups. There are certain aspects of this paper that I didn’t like; for example, the authors used a small dataset drawn from an online forum. How did they handle possible bias, and how did they validate the authenticity of the posts? Do depressed people actually go online to discuss their issues or look for solutions? Also, the reason for comparing depression with the other subgroups remains unclear. Aren’t those subgroups a part of depression? I feel a disconnect between the problem the authors started by stating and the direction the paper then took.
Apart from these points, there are aspects of this paper that I liked; for example, Nguyen et al. implemented and compared results from various classifiers. One direction for future work that I can think of is for psychiatrists to use this method to detect the type and severity of depression a person is suffering from by analysing their posts or writing behaviour.