Reflection #11 – [04/10] – [Vartan Kesiz-Abnousi]

[1] Pryzant, Reid, Young-joo Chung and Dan Jurafsky. “Predicting Sales from the Language of Product Descriptions.” (2017).

[2] Hu, N., Liu, L. & Zhang, J.J. “Do online reviews affect product sales? The role of reviewer characteristics and temporal effects”.  Inf Technol Manage (2008) 9: 201. https://doi.org/10.1007/s10799-008-0041-2

Summary [1]

As the title suggests, the authors’ main goal is to predict sales by examining which linguistic features are most effective. The hypothesis is that product descriptions have a significant impact on consumer behavior. To test this hypothesis, they mine 93,591 product descriptions and sales records from a Japanese e-commerce website. Subsequently, they build models that can explain how the textual content of product descriptions impacts sales. In the next step they use these models to conduct an explanatory analysis, identifying which linguistic aspects of product descriptions are the most important determinants of success. An important aspect of this paper is that they try to identify the linguistic features while controlling for the effects of pricing strategies, brand loyalty and product identity. To this end, they propose a new feature selection algorithm, the RNN+GF. Their results suggest that lexicons produced by the neural model are both less correlated with confounding factors and the most powerful predictors of sales.

Reflections [1]

This has been a very interesting paper. Trying to control for confounding factors in an RNN setting is something that I haven’t seen before. The subsequent analysis using a mixed model was also a very good choice. Technically speaking, this paper is not that easy to follow because it requires a wide range of knowledge from different disciplines: deep learning, a solid grasp of statistical modeling (e.g. mixed models and all their tests and assumptions), and knowledge of consumer theory.

It should be stressed that when the textual features are treated as fixed effects, this implies that their effects on the log of sales, which is the dependent variable, are assumed to be invariant across groups. This makes sense for the authors because they believe that by adding the brand and the product as random effects they control for every other effect. Perhaps they could have also controlled for seasonal effects, although since the data is a snapshot of only one month, it is understandable why the authors didn’t do it.
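For concreteness, here is a minimal sketch of this kind of mixed-effects specification using statsmodels; the column names, the keyword indicators used as fixed effects, and the brand-only grouping are my own illustrative assumptions, not the authors’ exact setup (the paper nests product within brand):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data frame: one row per product listing.
# 'log_sales' is the dependent variable, 'kw_*' are indicators for
# selected textual features (fixed effects), and 'brand' is the
# grouping variable used for random intercepts.
df = pd.read_csv("listings.csv")

model = smf.mixedlm(
    "log_sales ~ kw_free_shipping + kw_gift + kw_limited",
    data=df,
    groups=df["brand"],
)
result = model.fit()
print(result.summary())
```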

I am not certain how the authors ensure that the random effects, brand and product, are not correlated with the rest of the variables.

Questions [1]

  1. Comparison of the results to a baseline RNN model? I wonder how that would work.
  2. Chocolate and Health products have different transaction and information costs compared to the clothing industry. Would the linguistic results for E-commerce hold for other industries?

 

Summary [2]

In this study, the authors investigate how consumers utilize online reviews to reduce the uncertainties associated with online purchases. They adopt a portfolio approach to investigate whether customers of Amazon.com understand the difference between favorable news and unfavorable news and respond accordingly. The portfolio comprises products and events (favorable and unfavorable) that share similar characteristics. The authors find that changes in online reviews are associated with changes in sales. They also find that, besides the quantitative measurement of online reviews, consumers pay attention to other qualitative aspects of online reviews such as reviewer quality and reviewer exposure. In addition, they find that the review signal moderates the impact of reviewer exposure and product coverage on product sales.

Reflections [2]

Another interesting paper. The authors use the concept of “transaction costs”. In addition to the transaction costs that are mentioned in the theory, there is another one known as “search cost”, although they touch on the subject indirectly when they introduce the uncertainty theory. The amount of time consumers will dedicate to searching for more information about a product is bounded, and it is related to their “opportunity cost”. In general, this cost is lower in E-commerce than in other markets (for instance, going from one grocery store to another induces large search costs). In general, an E-commerce market is considered to be closer to the “ideal” perfect market than traditional markets, because all the transaction costs mentioned in the paper that distort the market are lower. Also, as far as I am aware, Amazon.com tags user reviews with a “verified purchase” label. I wonder if the authors took this into account. In addition, I have the same criticism as for the previous paper. In this case, the authors used books, DVDs, and videos. These are entertainment products with very similar transaction costs attached to them. It is unclear whether these results hold for other products.

Question [2]

Extension: See if the results hold for products that belong to a category other than entertainment.


Reflection #13 – [04/10] – [Ashish Baghudana]

Pryzant, Reid, Young-joo Chung and Dan Jurafsky. “Predicting Sales from the Language of Product Descriptions.” (2017).

Summary

Pryzant’s paper focuses on predicting sales of products from product descriptions. The problem is not straightforward because sales are often impacted by brand loyalty and pricing strategies, and are not merely a function of the language of product descriptions. The authors solve this problem by using a neural network with an adversarial objective – a good predictor of sales, but a bad predictor of brand and price. They use the adversarial neural network to mine textual features to feed into their mixed-effects model, where the textual features form the fixed effects and the product and the brand form the random effects.
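A minimal sketch of what such an adversarial objective can look like, using a gradient-reversal layer in PyTorch; the layer sizes, vocabulary, and confound heads here are illustrative assumptions, not the authors’ exact architecture:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign on backward."""
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

class AdversarialSalesModel(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=64, hidden=128, n_brands=500):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.sales_head = nn.Linear(hidden, 1)         # trained to be accurate
        self.brand_head = nn.Linear(hidden, n_brands)  # sees reversed gradients
        self.price_head = nn.Linear(hidden, 1)         # sees reversed gradients

    def forward(self, tokens):
        _, (h, _) = self.rnn(self.emb(tokens))
        feats = h[-1]
        rev = GradReverse.apply(feats)
        return self.sales_head(feats), self.brand_head(rev), self.price_head(rev)

# Training would combine a loss on sales with losses on brand and price;
# because of the reversed gradients, the encoder is pushed to be predictive
# of sales while remaining uninformative about the confounds.
```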

The authors use the Japanese e-commerce website Rakuten as their data source and choose two categories – chocolate and health. The motivation for choosing these two is that:

  • chocolate products show high variability,
  • pharmaceutical (health) goods are often sold wholesale, and
  • the two categories sit at opposite ends of a spectrum.

Performance metrics showed that R² values with textual features and random effects were consistently better than without the textual features. On further analysis, the authors find that influential keywords had the following properties – informativeness, authority, seasonal relevance, and politeness.

Critique

I enjoyed reading the paper for multiple reasons. Firstly, the focus of the paper, despite being neural network-based, was not on the model itself, but on the final goal of finding influential keywords and why they were influential. Secondly, the use of the neural network in this model was not just to do a bunch of predictions (though I’m sure it would be good at that), it was to do feature selection. As some of the other critiques already mentioned, this is counter-intuitive but seems to work really well. Thirdly, adversarial objectives haven’t worked very well for text previously, but the authors were able to find good use of the technique in textual feature selection.

A few points that I had comments about:

  • The authors chose chocolates and health products as the two categories. Health products (in my opinion) do not have any brand loyalty associated with them, especially if you have only one choice, or if the drug is generic. In such a category, why does one need to control for the brand?
  • On a related note, could the authors have done this analysis on a product category like shoes or electronics?
  • Thirdly, the results that arose out of the influential keyword analysis reflect the Japanese culture. The keywords are likely to differ in a country like the US (where product descriptions might play a significantly more important role) or India (where the price would play a very important role).
  • Finally, images are becoming key to selling products in the online space. I have no study to prove this, but increasingly users have decreased attention spans and focus on the images and the reviews to decide if they want a product. It would be interesting to incorporate those features as well in future work.

 

 


Reflection #13 – 04/10 – Pratik Anand

Paper : Do online reviews affect product sales? The role of reviewer characteristics and temporal effects

The paper provides an interesting take on impact of online reviews on product sales.
The authors posit that apart from the general review rating of a product, additional information like the reviewer’s standing, review quality, and product coverage plays a big role in the sale of a product. They use the seminal work on TCE (transaction cost economics) and Uncertainty Reduction Theory to support their hypothesis. The latter suggests that if a user doesn’t have enough information about a product, they will try to reduce the uncertainty by looking at product reviews, popularity, the reputation of reviewers, etc. What factors play a role in reducing uncertainty? Social features can also play a big role, especially on e-commerce sites.
If a friend has recommended a product, it will outweigh the impact of other factors like review score and quality.
Why didn’t the authors take review score + quality + reviewer standing as part of a hierarchical process to reduce uncertainty? A typical user may look for higher review scores to reduce the sample space. Then, she may use the quality, reviewer standing, and other factors and repeat this process till she is satisfied with the final results.
The paper doesn’t take into account the language of the review text for determining its quality. How is it determining quality otherwise? Amazon has a review helpfulness score. Are they using those?
Even though it is intuitive that product coverage plays a big role, the rise of discovery services in online e-commerce as well as food review portals shows that users might not always go for the most featured products. Many people want to seek out new products, and thus coverage may have some negative impact in such cases too.
Lastly, the authors have mentioned that the impact of negative reviews on product sales diminishes with time. What could be the reasoning behind it? Is it bound to the type of recommendation algorithm used by the e-commerce portal? If such an algorithm takes into account all the reviews of an item since the beginning, will such an argument hold?

It is a good paper which establishes various hypotheses about online products and tests them against a given dataset. A future direction of research may be to try it in different markets and take the language of the review as a factor of quality.


Reflection #13 – [04/10] – [John Wenskovitch]

This pair of papers evaluates the prediction of product sales based on both linguistic and quantitative aspects of online content.  In the Pryzant et al. paper, the authors looked specifically at product descriptions.  They obtained more than 93,000 health and chocolate product descriptions from the website Rakuten in order to evaluate these product descriptions linguistically, expanding on previous studies that examined summary stats.  Using a neural network that controls for confounding features (pricing, brand loyalty, etc.), they identify a set of words and writing styles that have high impact on sales outcomes.  In contrast, the Hu et al. paper examines the influence of online reviews.  Rather than performing a linguistic analysis, they instead examine features such as the quality of reviewers and the age of an item (number of reviews).

I did really enjoy reading through the Pryzant paper.  The thorough explanation of the neural network mathematics really helped to make it clear what the authors were doing, and the experiments section was clear and well-written.  I think my biggest criticism of the paper is that, if you strip away all of this explanation, it doesn’t feel like the authors did all that much.  They extended a neural network to meet their feature selection goals, tokenized two different datasets, ran the model, and reported a few results.  This area of research is certainly not my area of expertise, but this feels like a single-research-question workshop paper or class project.  The class project my group is building has 3 (arguably 4) distinct research goals.

Beyond that, the authors don’t spend much time discussing the lack of general cultural applicability of their findings.  They note the extensibility of the project to a general lexicon near the very end of the conclusion, and that’s about it.  There is no indication of how these results are applicable to any language/culture outside of Japanese/Japan.  Additionally, their “seasonality” result seems to me to be too close to some of the confounding variables that the authors wanted to eliminate.  Is there really that big of a difference between marketing a product with “free shipping!” in the description and marketing a seasonal item with “great Christmas gift!” in the same place?

Two stylistic criticisms for the Hu paper:  (1) I think it could have been better organized by grouping the hypothesis and result of each research question together, rather than having separate hypothesis and result sections (and I feel the same way about our class project final report).  I frequently found myself paging back and forward between results, hypotheses, and background that led to those hypotheses.  (2) I was intrigued by the tabular related work approach.  I can see it being useful in well-developed fields and for survey papers. However, in more recent and novel research, this approach makes it difficult to understand the novelty of the work performed by the authors.  It’s more of a list of past results rather than an explanation of the authors’ contributions.


Reflection #12 – [04/05] – [Meghendra Singh]

  1. Felbo, Bjarke, et al. “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm.” arXiv preprint arXiv:1708.00524(2017).
  2. Nguyen, Thin, et al. “Using linguistic and topic analysis to classify sub-groups of online depression communities.” Multimedia tools and applications 76.8 (2017): 10653-10676.

The first paper presents DeepMoji, an emoji prediction deep neural network trained using 1.2 billion tweets. I found the paper and the DeepMoji demo available at https://deepmoji.mit.edu/ very compelling and fascinating. The key contribution of this work is to show how emoji occurrences in tweets can be used to learn richer emotional representations of text. To this end the authors construct a deep neural network model (DeepMoji) using an embedding layer, two bidirectional LSTM layers, a Bahdanau attention mechanism, and a softmax classifier at the end. The authors detail the pretraining process and three transfer learning approaches (full, last and chain-thaw), and evaluate the resulting models on 3 NLP tasks over 8 benchmark datasets across 5 domains. Results of the evaluation suggest that the model trained using the chain-thaw transfer learning procedure beats the state of the art on all the benchmark datasets.
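A rough sketch of the kind of architecture described above (embedding, two bidirectional LSTM layers, attention, softmax), written in PyTorch; the dimensions and the simple learned attention below are my own assumptions rather than the paper’s exact specification, which uses attention over the outputs of all layers:

```python
import torch
import torch.nn as nn

class DeepMojiLike(nn.Module):
    """Sketch: embedding -> 2x BiLSTM -> attention pooling -> emoji classifier."""
    def __init__(self, vocab_size=50000, emb_dim=256, hidden=512, n_emojis=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm1 = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.lstm2 = nn.LSTM(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)        # one attention score per time step
        self.out = nn.Linear(2 * hidden, n_emojis)  # one logit per emoji class

    def forward(self, tokens):                      # tokens: (batch, seq_len)
        x = self.emb(tokens)
        h1, _ = self.lstm1(x)
        h2, _ = self.lstm2(h1)
        weights = torch.softmax(self.attn(h2), dim=1)  # (batch, seq_len, 1)
        context = (weights * h2).sum(dim=1)            # weighted sum over time
        return self.out(context)                       # softmax applied in the loss
```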

I am not really sure how upsampling works, and the authors do not discuss the upsampling technique used in the paper. Also, it would have been interesting to know if the authors experimented with different architectures of the network before settling on the one presented here. How did they arrive at this particular architecture? Additionally, whether increasing the number of BiLSTM layers improves performance on the benchmark datasets, and whether such an architectural change would be compatible with the chain-thaw transfer learning technique, are questions that can be explored. Moreover, since tweet length is limited to 280 characters, it is not possible to analyze longer texts with high confidence using this technique, unless the study is repeated on a dataset with longer texts mapped to specific emojis. It might be difficult to replicate this study for languages other than English and Mandarin. This is because large Twitter/Weibo-like data sources that contain distant supervision labels in the form of emojis may not exist for other languages. Therefore, it will be interesting to see what other distant supervision techniques can be used to predict emotional labels for texts on social media in other languages.
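Since the upsampling step is left undiscussed, here is one common interpretation of what it could mean in this setting: resampling tweets from rare emoji classes so that each class appears with roughly equal frequency during training. This is only my own guess at the technique, sketched with scikit-learn on toy data:

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical frame of (tweet, emoji_label) pairs with imbalanced labels.
df = pd.DataFrame({
    "text":  ["so happy!!", "worst day ever", "lol", "crying rn", "love this"],
    "label": ["joy", "sad", "joy", "sad", "joy"],
})

target = df["label"].value_counts().max()
balanced = pd.concat(
    [
        resample(group, replace=True, n_samples=target, random_state=0)
        for _, group in df.groupby("label")
    ],
    ignore_index=True,
)
print(balanced["label"].value_counts())  # each class now has the same count
```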

In Table 2, we see that most of the emojis on Twitter are positive (laughter, sarcasm, love) rather than negative (sad face, crying face, heartbreak); I wonder if the same trend would be observed on other social media websites. Nevertheless, given the proliferation of emojis in computer-mediated communication, it would be interesting to repeat this study with data such as Facebook posts and comments, or posts and comments on any social website. Additionally, since one can use this approach to effectively determine the various emotions associated with any text at a very granular level, it could be used to filter content or news for a user. For example, if a user only wants to read content that is optimistic and cheerful, this approach can filter out all the content that does not fall in that bucket. One can also think of using this approach to detect the psychological state of an author. It might be interesting to see whether, if the emotional content of an author’s posts remains consistently pessimistic, this predicts clinical conditions like depression, anxiety, or self-harm events.

This brings us to the second paper, which analyzes 38K posts in 24 Live Journal communities to discover the psycholinguistic features and content topics present in online communities discussing Depression, Bipolar Disorder, Self-Harm, Grief/Bereavement and Suicide. The study generated 68 psycholinguistic features using LIWC and 50 topics using LDA for text from 5K posts (title and content text). The authors subsequently use these topics and LIWC features as predictors with LASSO regression for the 5 subgroups of communities interested in the 5 disorders/conditions (Depression, Bipolar Disorder, Self-Harm, Grief/Bereavement and Suicide). The authors find that latent topics had greater predictive power than linguistic features for the bipolar disorder, grief/bereavement, and self-harm subgroups. The most interesting fact for me was that help-seeking was not a topic in any of the subgroups and only the Bipolar Disorder subgroup discussed treatments. This seems very strange for communities dedicated to discussing psychological illness.
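A compact sketch of the general pipeline described above (LDA topics as predictors for an L1-penalized classifier); the toy corpus, the number of topics, and the use of logistic regression with an L1 penalty as the “lasso-style” classifier are my own simplifications, not the paper’s exact setup:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

# Hypothetical posts and binary labels (e.g., depression vs. another subgroup).
posts = [
    "feeling empty again, no energy to get out of bed",
    "manic episode last night, could not stop talking",
    "missing my mother so much since she passed",
    "urges are back, trying to stay safe tonight",
]
labels = [1, 0, 0, 0]

counts = CountVectorizer(stop_words="english").fit_transform(posts)
topics = LatentDirichletAllocation(n_components=3, random_state=0).fit_transform(counts)

# L1-penalized ("lasso-style") logistic regression over the topic proportions.
clf = LogisticRegression(penalty="l1", solver="liblinear").fit(topics, labels)
print(clf.coef_)  # sparse weights indicate which topics drive the prediction
```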

It would be interesting to repeat this experiment and see if the results remain consistent. I say this because the authors do a random sampling of 5K posts, and this may have missed certain topics or LIWC categories. It would also be interesting to know the statistics on the lengths of these posts and whether this was taken into consideration when sampling them. Another aspect to point out is that the Bipolar Disorder subgroup had a larger number of communities (7 out of 24); did this somehow affect the diversity of topics extracted? Perhaps it might be a good idea to use all the posts from the 24 communities. We also see that Lasso outperformed the other three classifiers, and it would be interesting to see whether ensemble classifiers would outperform Lasso. Overall the second paper was an excellent read and presented some very interesting results.


Reflection #12 – [04/05] – [Hamza Manzoor]

[1] Nguyen, Thin, et al. “Using linguistic and topic analysis to classify sub-groups of online depression communities.” Multimedia tools and applications 76.8 (2017): 10653-10676.

In this paper, Nguyen et al. use linguistic features to classify sub-groups of online depression communities. The dataset they use is from Live Journal and comprises 24 communities and 38,401 posts. These communities were grouped into 5 subgroups: bipolar, depression, self-harm, grief, and suicide. The authors built 4 different classifiers, and Lasso performed the best. First of all, the size of the dataset is very small, and secondly, while I don’t mind the use of Live Journal, most of the papers on similar topics performed their studies on multiple platforms, because it is possible that the results are due to certain specific characteristics of the Live Journal platform.

I am pretty sure that this paper was a class project, given the size of the data and the way they performed the modeling. First, the authors labeled the data themselves, which can induce some bias, and secondly, the major put-off was that they used 4 different classifiers instead of a single multi-class classifier. I wish they had a continuous variable 😉

Finally, my biggest criticism is the way the 5 subgroups were created, because self-harm, grief, suicide, etc. are a result or cause of depression; the claim in the paper that “no unique identifiers were found for the depression community” verifies my argument. The subgroups, which are the basis of the entire paper, do not make sense to me at all.

[2] Felbo, Bjarke, et al. “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion, and sarcasm.” arXiv preprint arXiv:1708.00524 (2017).

In this paper, Felbo et al. build an emoji predictor named DeepMoji. The authors build a supervised learning model on an extremely large dataset of 1.2 billion tweets to classify the emotional sentiment conveyed in tweets with embedded emoji. The experiments showed that DeepMoji outperformed state-of-the-art techniques on the datasets considered.

My first reaction after looking at the size of the dataset was “Woah!! They must have supercomputers”. I wonder how long it took for their model to train. The thing that I liked the most about this paper was that they provided a link to a working demo (https://deepmoji.mit.edu). Given that I had a hard time understanding the paper, I spent a lot of time playing around with their website and I can’t help but appreciate how accurate their model is. Here is one example that I tried for the phrase “I just burned you hahahahaha”, and it showed me the most accurate associated emojis.

Now, when I removed the word ‘burned’, it showed me the following

 

Due to my limited knowledge of deep-learning, I cannot say if there were flaws in their modeling or not, but it seemed pretty robust and the results of their tool show that robustness.

Anyway, I believe that this paper was very Machine Learning centric and had less to do with psychology.

Finally, the authors say in the video (https://youtu.be/u_JwYxtjzUs) that this can be used to detect racism and bullying. I would like to know how emojis can help with that.


Reflection #12 – 04/05 – [Jiameng Pu]

Using linguistic and topic analysis to classify sub-groups of online depression communities

The world is in a period of high incidence of many kinds of mental health problems. The complex nature of depression, e.g., its different presentation among depressed people, makes treatment and prevention difficult. Therefore, the paper focuses on online communities to explore depression, since more and more information, support, and advice is exchanged online. Machine learning can help identify topics or issues relevant to those with depression and characterize the linguistic styles of depressed individuals. The paper utilizes machine learning techniques on linguistic features to identify sub-communities within online depression communities.

Reflection

They mentioned “In this work we have used Live Journal data as a single source of online networks. In fact, Live Journal users could also join other networking services, such as Facebook or Twitter”, which exactly corresponds to my concern: the dataset could be better and more up-to-date, like data from popular social media, e.g., Reddit, Twitter and Facebook. In fact, companies like Facebook have conducted such research to detect accounts that might be depressed and offer necessary psychological support. Live Journal data is obviously old for this task. Another point I am confused about is the five subgroups of online communities that were identified: Depression, Bipolar Disorder, Self-Harm, Grief/Bereavement, and Suicide. I don’t see the inner logic of how these five subgroups can fully represent downhearted psycholinguistic features; for instance, what is the difference between self-harm and suicide? What impresses me about this paper is the comparison of different classifiers, from SVM to Lasso. In practice, I sometimes feel that researchers have to try different machine learning models, and it’s not that we can always correctly guess which would work better, even with some prior experience. I haven’t used LIWC so far, but it’s apparently one of the most widely used tools across all the papers we’ve read. I look forward to trying it out in my future research.


Reflection #12 – 04/05 – [Aparna Gupta]

Felbo, Bjarke, et al. “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm.” arXiv preprint arXiv:1708.00524 (2017).

Nguyen, Thin, et al. “Using linguistic and topic analysis to classify sub-groups of online depression communities.” Multimedia tools and applications 76.8 (2017): 10653-10676.

Reflection:

Felbo et al. used a raw dataset of 56.6 billion tweets, filtered down to 1.2 billion relevant tweets, to build a classifier that accurately classifies the emotional content of texts. In my opinion, this is one of the best-thought-out and well-executed papers we have read so far. I was impressed by the size of the dataset used by the authors, which obviously helped in building a better-predicting classifier. Although this paper spoke mostly about ML techniques, most of which I was unfamiliar with, I found their ‘chain-thaw’ transfer learning technique quite intriguing. It was also quite fascinating to read how this approach helped avoid possible overfitting. The authors have also built a website, ‘DeepMoji’, to demonstrate their model, which is available for anyone to use. The website provided a good understanding of which words were given more weight while converting the text to its equivalent emotion. There are certain users who only use emojis to write their messages. Can this study be extended to actually interpret the context behind such messages?

Paper 2, by Nguyen et al., talks about exploring the textual cues of online communities interested in depression. For the study, the authors randomly selected 5000 posts from 24 online communities and identified five subgroups of online communities: Depression, Bipolar Disorder, Self-Harm, Grief, and Suicide. To characterize these communities, psycholinguistic features and content topics were extracted and analyzed. This paper also implemented ML techniques to build a classifier for depression vs. the other subgroups. There are certain aspects of this paper I didn’t like; for example, the authors used a small dataset from a single online forum. How did they handle the possible bias, and how did they validate the authenticity of the posts? Do depressed people actually go online and discuss or look for solutions regarding their issues? Also, what remains unclear is the reason behind comparing depression with the other subgroups. Aren’t those subgroups a part of depression? I feel a disconnect in terms of how the authors started by stating a problem and then diverged away from it.

Apart from these points, there are certain aspects of this paper I liked: Nguyen et al. implemented and compared results from various classifiers. One piece of future work I can think of is this method being used by psychiatrists to actually detect the type and severity of depression a person is suffering from by analysing their posts or writing behaviour.

 


Reflection #12 – [04/05] – [Jamal A. Khan]

  1. Felbo, Bjarke, et al. “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm.”
  2. Nguyen, Thin, et al. “Using linguistic and topic analysis to classify sub-groups of online depression communities.”

The first paper, regarding emojis, is an intriguing one. There are things I really like about the paper and things that I don’t. Starting with the things I like, the dataset size is massive, which these days is rare to see. The way the authors pre-process the text is somewhat questionable and might introduce artifacts, but given the massiveness of the dataset that shouldn’t be the case. I would dare say that the actual novelty of the paper is the processed dataset, and it probably is also the reason why the model performs very well.

Coming to the next logical step, i.e. the model architecture, I feel that there is nothing novel here; bidirectional LSTMs with an attention layer or two on top aren’t new. Furthermore, the explanation of the model isn’t clear. Are the LSTMs stacked, or are they in sequence? If they are in sequence, then why (this isn’t a seq-to-seq, i.e. encoder-decoder, model)? It also doesn’t make sense to have 2 LSTMs in sequence, because the same could be achieved by replacing them with a single one whose recurrence combines both. I realize this might be an involved question, but I would like to know if someone else in the class understood this part, because I most certainly didn’t.

Now, the authors claim that this “chain-thaw” transfer learning is novel, and my opinion of this claim may be polarizing. The ethics of the claim are something I would like to discuss in class as well. To me chain-thaw is not anything new or innovative, and it’s something I’ve already done in the ML course I took at VT. The reason why I say it’s not novel isn’t because I was able to come up with it on my own, but because it is something so commonplace that people consider it trivial. The ability to freeze layers and re-train the layer of choice has been present in Keras (a deep learning API) since its inception, which dates back to mid-2015. Has anyone else claimed this “chain-thawing” as their own? Probably not. Does that make the authors the first ones to claim it? Probably yes. Is it a contribution to the scientific community in any way? Probably not 🙁 . Hence, this long rant brings me to my actual question: is claiming something as novel, when the technique/information is common sense or trivial, academically misleading or a false claim? To me it seems that the claim was made as a means to give the paper more selling power, which it didn’t need because it was a good paper to begin with.
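For reference, a minimal sketch of the kind of layer freezing being referred to, in Keras; the chain-thaw procedure (as I understand it from the paper) fine-tunes one layer at a time with the rest frozen and then unfreezes everything, and the model below is an illustrative stand-in, not the authors’ code:

```python
from tensorflow import keras

# Illustrative pretrained-style model: embedding -> LSTM -> classifier head.
model = keras.Sequential([
    keras.layers.Embedding(10000, 64),
    keras.layers.LSTM(64),
    keras.layers.Dense(2, activation="softmax"),
])

def fit_with_only(model, trainable_index, x, y):
    """Freeze every layer except one, recompile, and fine-tune briefly."""
    for i, layer in enumerate(model.layers):
        layer.trainable = (i == trainable_index)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.fit(x, y, epochs=1, verbose=0)

# Chain-thaw-style schedule: new head first, then each earlier layer, then all.
# (x_train, y_train are assumed to be a tokenized target-task dataset.)
# for idx in [2, 0, 1]:
#     fit_with_only(model, idx, x_train, y_train)
# for layer in model.layers:
#     layer.trainable = True
# model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(x_train, y_train, epochs=1)
```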

So, while reading this paper I ran into a random reddit-like snippet. Might be a tad bit …….. NSFW

Since the words themselves are so close to each other (talking about the vector space embedding from the first layer of the network), would the network be able to extract these “connotations”? These connotations might exist in emojis as well; from personal usage and experience, I believe they do.

A direct follow-up question this paper raises is whether sentences that are written purely in emoji (no text) can be translated to normal text, or whether meaning, rather than just emotion, can be inferred from them. I think an encoder can be readily built using the pretrained model, but the decoder may be a whole different animal, primarily because of the lack of translated sentences, i.e. a curated translation dataset. For folks who are new to seq-to-seq/encoder-decoder models, I would recommend reading up on NMT; this repo is a good primer and has practical examples and code.

 

The second paper focuses on a different topic compared to the first. Outright, I would like to raise the question of why the paper uses such an old dataset. Reddit should have depression-related subreddits; in fact, a quick Google search shows some already.

A crawler (using the results from Google as the seeds) and a scraper to pull the posts could have proved to be a very effective approach to building a much more recent and perhaps more representative dataset.  Another confusing aspect was the choice of the 5 categories: depression, self-harm, grief, bipolar disorder, and suicide. Why these 5? Wasn’t the original goal to study depression? So why didn’t they focus on types of depression instead of the categories listed above?

The classification methodology chosen for the paper is questionable. Instead of a multi-class classifier which would be able to classify into depression, self-harm, grief, bipolar disorder, and suicide, the authors have chosen to build 4 binary classifiers. Why? It’s very counter-intuitive; perhaps I’m missing something. Also, since I’m not knowledgeable about mental health problems, how would one go about labeling examples in the dataset? Compared to physical diseases/problems, e.g. bone fractures, is there a universally agreed-upon classification, or would different practitioners label differently? The labeling may completely change the models developed.
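To make the contrast concrete, here is a small sketch of the two setups in scikit-learn; the features and labels are placeholders, and the one-vs-rest wrapper is only a rough stand-in for the paper’s binary classifiers, so this illustrates the multi-class alternative being asked about rather than the actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))    # placeholder feature vectors
y = rng.integers(0, 5, size=200)  # 5 subgroup labels (0..4)

# Option A: one multi-class model over all five subgroups at once
# (LogisticRegression handles multi-class targets directly).
multiclass = LogisticRegression(max_iter=1000).fit(X, y)

# Option B: several binary models, one per subgroup, here via one-vs-rest.
binary_per_class = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(multiclass.predict(X[:5]), binary_per_class.predict(X[:5]))
```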

Another weakness is that the paper implicitly assumes that depressed people post to forums. Do they post, or do they disappear? I guess that could be a research topic on its own. Overall, the paper’s idea was pretty good but poorly explained and executed. I feel that the paper had much more potential.

 


Reflection #12 – 04/05 – Pratik Anand

1) Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm

The paper aims to use emojis to detect a diverse set of emotions in text. The authors acknowledge previous work in this field that used positive and negative emojis to find the emotion of a text. They extend the previous work by having a more diverse set of classifications. A good example is the emotional distinction between “this is shit” and “this is the shit”. The former has a negative connotation whereas the latter has a strong positive connotation. Their model, DeepMoji, is a variation of an LSTM model. The authors used a Twitter dataset with emojis for training their model. For tweets with multiple emojis, the authors use the same tweet separately for each emoji in their dataset. Sometimes, a group of emojis collectively conveys a message. Won’t such messaging be lost if the emojis are analyzed separately rather than as a group? There is admittedly the problem of distinguishing such cases from cases where the different emojis have no relation to each other. There is an attempt later in the paper to cluster emojis together into different groups. I believe that work can be extended to address the point mentioned above. Can a similar approach be applied to memes?

Overall, it is a unique technical paper, which even has a live demo available at deepmoji.mit.edu that one can play around with. It is good to see work on emojis, as they are the future of human communication.

2) Using linguistic and topic analysis to classify sub-groups of online depression communities

The paper uses linguistic features to identify sub-communities in the online depression communities. The paper, despite being fairly recent, uses a Live Journal dataset for its studies. They could have gotten much better results with datasets from more popular communities like Reddit or Facebook groups. Live Journal as a blogging website has been in decline since the 2010s. Nevertheless, if the dataset is representative, it is good enough. The authors identified various communities for depressed people, which they grouped into categories like Depression, Self-Harm, Grief/Bereavement, Bipolar Disorder and Suicide. What is the reasoning behind such a categorisation?
The authors use linguistic features as well as topic modelling to extract feature sets. The word clouds from the topics provided some interesting keywords. An interesting fact is that no unique identifiers were found for the Depression community except for filler words. What could be the reason for such behavior? Can it be changed by a different classification methodology? What should be the ground truth in cases related to mental health?
The authors posit that the latent features are representative of the sub-groups and can be used for identification of such sub-groups. Another point is that they could be used as a starting point and correlated with data from other social networks to create a mental health profile of an individual. Different linguistic profiles can help in understanding such communities better.
