Reflection #12 – [04/05] – [Jamal A. Khan]

  1. Felbo, Bjarke, et al. “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm.”
  2. Nguyen, Thin, et al. “Using linguistic and topic analysis to classify sub-groups of online depression communities.”

The first paper, on emojis, is an intriguing one. There are things I really like about it and things I don't. Starting with what I like: the dataset is massive, which is rare to see these days. The way the authors pre-process the text is somewhat questionable and might introduce artifacts, but given the sheer size of the dataset, those artifacts should mostly wash out. I would dare say that the actual novelty of the paper is the processed dataset, and it is probably also the reason the model performs so well.
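As a side note on that pre-processing point, here is a minimal sketch of the kind of tweet cleaning I have in mind; the special tokens and regexes are my own guesses, not the authors' exact pipeline.

```python
import re

def preprocess_tweet(text):
    # Replace URLs and @-mentions with placeholder tokens; the token
    # names here are hypothetical, not necessarily what the paper used.
    text = re.sub(r"https?://\S+", "<url>", text)
    text = re.sub(r"@\w+", "<user>", text)
    # Normalize case and whitespace.
    return " ".join(text.lower().split())

print(preprocess_tweet("@bob check this out https://t.co/xyz 😂😂"))
# -> "<user> check this out <url> 😂😂"
```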

Coming to the next logical step, i.e. the model architecture, I feel there is nothing novel here; bi-directional LSTMs with an attention layer or two on top aren't new. Furthermore, the explanation of the model isn't clear. Are the LSTMs stacked, or are they in sequence? If they are in sequence, then why (this isn't a seq2seq, i.e. encoder-decoder, model)? It also doesn't make sense to have two LSTMs in sequence, because the same effect could be achieved by replacing them with a single LSTM whose recurrence combines both. I realize this might be an involved question, but I would like to know if someone else in the class understood this part, because I most certainly didn't.
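For reference, here is my best guess at what a "stacked" configuration would look like in Keras. The layer sizes are placeholders, and the self-attention plus pooling at the end is only a stand-in for the paper's attention layer, not its exact formulation.

```python
from tensorflow.keras import layers, models

inp = layers.Input(shape=(None,), dtype="int32")
x = layers.Embedding(input_dim=50000, output_dim=256)(inp)
# "Stacked": the second bi-LSTM reads the full hidden-state sequence
# produced by the first, rather than running after it on the raw input.
h1 = layers.Bidirectional(layers.LSTM(512, return_sequences=True))(x)
h2 = layers.Bidirectional(layers.LSTM(512, return_sequences=True))(h1)
# Stand-in attention: self-attention, then pooling into a single vector.
attn = layers.Attention()([h2, h2])
pooled = layers.GlobalAveragePooling1D()(attn)
out = layers.Dense(64, activation="softmax")(pooled)  # 64 emoji classes
model = models.Model(inp, out)
```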

Now, the authors claim that this "chain-thaw" transfer learning is novel, and my opinion of this claim may be polarizing; the ethics of the claim are something I would like to discuss in class as well. To me, chain-thaw is not anything new or innovative; it's something I've already done in the ML course I took at VT. The reason I say it isn't novel isn't that I was able to come up with it on my own, but that it is so commonplace that people consider it trivial. The ability to freeze layers and re-train a layer of choice has been present in Keras (a deep learning API) since its inception, which dates back to mid-2015. Has anyone else claimed this "chain-thawing" as their own? Probably not. Does that make the authors the first to claim it? Probably yes. Is it a contribution to the scientific community in any way? Probably not 🙁. Hence, this long rant brings me to my actual question: is claiming something as novel, when the technique or information is common sense or trivial, academically misleading or a false claim? It seems to me the claim was made to give the paper more selling power, which it didn't need, because it was a good paper to begin with.
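To make the rant concrete, this is roughly what I mean by freezing and re-training layers of choice: a minimal Keras sketch of my reading of chain-thaw (fine-tune each layer alone, then unfreeze everything), where `train_ds` and `val_ds` are placeholder datasets, not the paper's setup.

```python
def chain_thaw(model, train_ds, val_ds, epochs_per_step=1):
    # Fine-tune one layer at a time with every other layer frozen,
    # then unfreeze everything for a final pass -- my reading of the
    # procedure, not the authors' reference implementation.
    for target in model.layers:
        for layer in model.layers:
            layer.trainable = (layer is target)
        model.compile(optimizer="adam", loss="categorical_crossentropy")
        model.fit(train_ds, validation_data=val_ds, epochs=epochs_per_step)
    for layer in model.layers:
        layer.trainable = True
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    model.fit(train_ds, validation_data=val_ds, epochs=epochs_per_step)
```

The point being: every ingredient here is a one-liner that Keras has supported for years.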

So, while reading this paper I ran into this random Reddit-like snippet. It might be a tad bit …….. NSFW.

Since the words themselves are so close to each other (talking about the vector-space embedding from the first layer of the network), would the network be able to extract these "connotations"? These connotations might exist in emojis as well; from personal usage and experience, I believe they do.
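One quick way to check would be nearest neighbors under cosine similarity in that first-layer embedding space. A sketch, assuming an embedding matrix and a token-to-index vocab have already been pulled out of the trained model:

```python
import numpy as np

def nearest_tokens(embeddings, vocab, token, k=5):
    # vocab maps tokens (words or emoji) to row indices of `embeddings`.
    v = embeddings[vocab[token]]
    sims = embeddings @ v / (np.linalg.norm(embeddings, axis=1)
                             * np.linalg.norm(v) + 1e-8)
    best = np.argsort(-sims)[1:k + 1]  # skip the token itself
    inv = {i: t for t, i in vocab.items()}
    return [(inv[i], float(sims[i])) for i in best]
```

If the "connotations" are real, the emoji and the slang terms should show up in each other's neighbor lists.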

A direct follow-up question this paper raises: can sentences written purely in emoji (no text) be translated to normal text, or can meaning, rather than just emotion, be inferred from them? I think an encoder can be readily built using the pretrained model, but the decoder may be a whole different animal, primarily because of the lack of translated sentences, i.e. a curated translation dataset. For folks who are new to seq2seq/encoder-decoder models, I would recommend reading up on NMT; this repo is a good primer and has practical examples + code.
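To make the encoder-decoder idea concrete, here is a bare-bones Keras sketch of an emoji-to-text translator. All vocabulary sizes are hypothetical, and the real blocker remains the missing parallel corpus to train it on.

```python
from tensorflow.keras import layers, models

# Encoder over emoji tokens; in practice this is where the pretrained
# DeepMoji-style weights could be plugged in.
enc_in = layers.Input(shape=(None,))
enc_emb = layers.Embedding(2000, 128)(enc_in)      # emoji vocabulary
_, h, c = layers.LSTM(256, return_state=True)(enc_emb)

# Decoder over words, seeded with the encoder's final state.
dec_in = layers.Input(shape=(None,))
dec_emb = layers.Embedding(20000, 128)(dec_in)     # word vocabulary
dec_seq = layers.LSTM(256, return_sequences=True)(dec_emb,
                                                  initial_state=[h, c])
out = layers.Dense(20000, activation="softmax")(dec_seq)
model = models.Model([enc_in, dec_in], out)
```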

 

The second paper focuses on a different topic from the first. Right away, I would like to ask: why does the paper use such an old dataset? Reddit should have depression-related subreddits; in fact, a quick Google search turns up several already.

A crawler (using the Google results as seeds) and a scraper to pull the posts could have been a very effective way to build a much more recent, and perhaps more representative, dataset (see the sketch after this paragraph). Another confusing aspect was the choice of the five categories: depression, self-harm, grief, bipolar disorder, and suicide. Why these five? Wasn't the original goal to study depression? So why didn't they focus on types of depression instead of the categories listed above?
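On the scraping point, here is a minimal sketch using PRAW, the Python Reddit API wrapper. The credentials are placeholders, and the seed subreddits are just what a quick search turns up; a real crawl would expand the seed list from cross-links.

```python
import praw  # Python Reddit API wrapper

reddit = praw.Reddit(client_id="CLIENT_ID",          # placeholder
                     client_secret="CLIENT_SECRET",  # placeholder
                     user_agent="depression-corpus-crawler")

# Seed communities from a quick Google search; extend as needed.
seeds = ["depression", "SuicideWatch", "selfharm", "GriefSupport", "bipolar"]
posts = []
for name in seeds:
    for submission in reddit.subreddit(name).new(limit=500):
        posts.append({"subreddit": name,
                      "title": submission.title,
                      "body": submission.selftext})
```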

The classification methodology chosen for the paper is questionable. Instead of a multi-class classifier that could classify posts into depression, self-harm, grief, bipolar disorder, and suicide, the authors chose to build four binary classifiers. Why? It's very counter-intuitive; perhaps I'm missing something. Also, since I'm not knowledgeable about mental health problems, how would one go about labeling examples in the dataset? Compared to physical diseases or problems, e.g. bone fractures, is there a universally agreed-upon classification, or would different practitioners label differently? The labeling may completely change the models developed.
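For contrast, here is what the two designs look like side by side in scikit-learn, on dummy data standing in for the five categories. Note that one-vs-rest is itself just a bundle of binary classifiers, which is why the binary route buys so little.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Dummy 5-class data standing in for the five categories.
X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_classes=5, random_state=0)

# What I would have expected: one multi-class model over all five labels.
multi = LogisticRegression(max_iter=1000).fit(X, y)

# What the paper does, as I read it: separate binary classifiers,
# which is essentially what one-vs-rest reduces to anyway.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
```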

Another weakness is that the paper implicitly assumes that depressed people post to forums. Do they post, or do they disappear? I guess that could be a research topic on its own. Overall, the paper's idea was pretty good but poorly explained and executed; I feel the paper had much more potential.

 
