{"id":520,"date":"2018-04-05T03:48:01","date_gmt":"2018-04-05T03:48:01","guid":{"rendered":"http:\/\/wordpress.cs.vt.edu\/cs6724spring18\/?p=520"},"modified":"2018-04-05T03:48:01","modified_gmt":"2018-04-05T03:48:01","slug":"reflection-12-04-05-jamal-a-khan","status":"publish","type":"post","link":"https:\/\/wordpress.cs.vt.edu\/cs6724spring18\/2018\/04\/05\/reflection-12-04-05-jamal-a-khan\/","title":{"rendered":"Reflection #12 \u2013 [04\/05] \u2013 [Jamal A. Khan]"},"content":{"rendered":"<ol>\n<li class=\"gs_citr\">Felbo, Bjarke, et al. <em><strong>\u201cUsing millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm.\u201d<\/strong><\/em>\n<div class=\"gs_citr\"><\/div>\n<\/li>\n<li class=\"gs_citr\">\n<div class=\"gs_citr\">Nguyen, Thin, et al. <em><strong>\u201cUsing linguistic and topic analysis to classify sub-groups of online depression communities.\u201d<\/strong><\/em><\/div>\n<\/li>\n<\/ol>\n<p>The first paper, on emoji, is an intriguing one. There are things I really like about the paper and things that I don&#8217;t. Starting with the things I like, the dataset size is <em><strong>massive<\/strong><\/em>, which is rare to see these days. The way the authors pre-process the text is somewhat questionable and might introduce artifacts, but given the sheer size of the dataset that shouldn&#8217;t be a serious problem. I would dare say that the actual novelty of the paper is the processed dataset, and it is probably also the reason why the model performs so well.<\/p>\n<p>Moving on to the next logical step, i.e. the model architecture, I feel that there is nothing novel here: bi-directional LSTMs with an attention layer or two on top aren&#8217;t new. Furthermore, the explanation of the model isn&#8217;t clear. <strong>Are the LSTMs stacked, or are they in sequence? If they are in sequence, then why (this isn&#8217;t a seq-seq, i.e. encoder-decoder, model)? 
<\/strong>Also, it doesn&#8217;t make sense to have 2 LSTMs in sequence, because the same could be achieved by replacing them with a single new one whose recurrence combines that of both previous ones. I realize this might be an involved question, but I would like to know if someone else in the class understood this part, because I most certainly didn&#8217;t.<\/p>\n<p>Now, the authors claim that this <strong>&#8220;chain-thaw&#8221;<\/strong> transfer learning is novel, and my opinion of this claim may be polarizing. The ethics of the claim are something I would like to discuss in class as well. To me chain-thaw is not anything new or innovative; it&#8217;s something I&#8217;ve already done in the ML course I took at VT. The reason I say it&#8217;s not novel isn&#8217;t that I was able to come up with it on my own, but that it is so commonplace that people consider it trivial. The ability to freeze layers and re-train the layer of choice has been present in Keras (a deep learning API) since its inception, which dates back to 2015. Has anyone else claimed this &#8220;chain-thawing&#8221; as their own? Probably not. Does that make the authors the first to claim it? Probably yes. Is it a contribution to the scientific community in any way? Probably not \ud83d\ude41. Hence, this long rant brings me to my actual question: <strong>is claiming something as novel, when the technique\/information is common sense or trivial, academically misleading or even a false claim? <\/strong>To me it seems that the claim was made to give the paper more selling power, which it didn&#8217;t need because it was a good paper to begin with.<\/p>\n<p>So, while reading this paper I ran into this random reddit-like snippet. Might be a tad bit &#8230; 
NSFW<\/p>\n<p style=\"text-align: center\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-521\" src=\"http:\/\/wordpress.cs.vt.edu\/cs6724spring18\/wp-content\/uploads\/sites\/51\/2018\/04\/connotations-218x300.png\" alt=\"\" width=\"218\" height=\"300\" srcset=\"https:\/\/wordpress.cs.vt.edu\/cs6724spring18\/wp-content\/uploads\/sites\/51\/2018\/04\/connotations-218x300.png 218w, https:\/\/wordpress.cs.vt.edu\/cs6724spring18\/wp-content\/uploads\/sites\/51\/2018\/04\/connotations.png 500w\" sizes=\"auto, (max-width: 218px) 100vw, 218px\" \/><\/p>\n<p><strong>Since the words themselves are so close to each other (in the vector-space embedding from the first layer of the network), would the network be able to extract these &#8220;<em>connotations<\/em>&#8221;?<\/strong> These connotations might exist in emojis as well; from personal usage and experience, I believe they do.<\/p>\n<p>A direct follow-up question that this paper raises is: <strong>can sentences written purely in emoji (no text) be translated to normal text, or can meaning, rather than just emotion, be inferred from them? <\/strong>I think an encoder can readily be built using the pretrained model, but the decoder may be a whole different animal, primarily because of the lack of translated sentences, i.e. a curated translation dataset. For folks who are new to seq-seq\/encoder-decoder models, I would recommend reading up on NMT; <strong><a href=\"https:\/\/github.com\/tensorflow\/nmt\">this repo<\/a><\/strong> is a good primer and has practical examples and code.<\/p>\n<p>&nbsp;<\/p>\n<p>The second paper focuses on a different topic than the first. 
To begin with, I would like to ask: <strong>why does the paper use such an old dataset?<\/strong> Reddit should have depression-related subreddits; in fact, a quick Google search already shows some.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-523 aligncenter\" src=\"http:\/\/wordpress.cs.vt.edu\/cs6724spring18\/wp-content\/uploads\/sites\/51\/2018\/04\/depress-300x271.png\" alt=\"\" width=\"300\" height=\"271\" srcset=\"https:\/\/wordpress.cs.vt.edu\/cs6724spring18\/wp-content\/uploads\/sites\/51\/2018\/04\/depress-300x271.png 300w, https:\/\/wordpress.cs.vt.edu\/cs6724spring18\/wp-content\/uploads\/sites\/51\/2018\/04\/depress-768x694.png 768w, https:\/\/wordpress.cs.vt.edu\/cs6724spring18\/wp-content\/uploads\/sites\/51\/2018\/04\/depress.png 982w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p>A crawler (seeded with the Google results) and a scraper to pull the posts could have been a very effective way to build a much more recent and perhaps more representative dataset. Another confusing aspect was the choice of the 5 categories: depression, self-harm, grief, bipolar disorder, and suicide. <strong>Why these 5? Wasn&#8217;t the original goal to study depression? So why didn&#8217;t they focus on types of depression instead of the categories listed above? <\/strong><\/p>\n<p>The classification methodology chosen for the paper is questionable. <strong>Instead of a multi-class classifier that could classify posts into depression, self-harm, grief, bipolar disorder, and suicide, the authors have chosen to build 4 binary classifiers. Why?<\/strong> It&#8217;s very counterintuitive; perhaps I&#8217;m missing something. Also, since I&#8217;m not knowledgeable about mental health problems, <strong>how would one go about labeling examples in the dataset? Compared to physical diseases\/problems, e.g. 
fractures in bones, is there a universally agreed-upon classification, or would different practitioners label differently?<\/strong> The labeling may completely change the models developed.<\/p>\n<p>Another weakness is that the paper implicitly assumes that depressed people post to forums. <strong>Do they post, or do they disappear?<\/strong> I guess that could be a research topic on its own. Overall, the paper&#8217;s idea was pretty good but poorly explained and executed. I feel that the paper had much more potential.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Felbo, Bjarke, et al. \u201cUsing millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm.\u201d Nguyen, Thin, et al. \u201cUsing linguistic and topic analysis to classify sub-groups of online depression communities.\u201d The first paper regarding emoji is an intriguing one. There are things I really like about the paper and things 
[&hellip;]<\/p>\n","protected":false},"author":125,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-520","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/wordpress.cs.vt.edu\/cs6724spring18\/wp-json\/wp\/v2\/posts\/520","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wordpress.cs.vt.edu\/cs6724spring18\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wordpress.cs.vt.edu\/cs6724spring18\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724spring18\/wp-json\/wp\/v2\/users\/125"}],"replies":[{"embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724spring18\/wp-json\/wp\/v2\/comments?post=520"}],"version-history":[{"count":4,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724spring18\/wp-json\/wp\/v2\/posts\/520\/revisions"}],"predecessor-version":[{"id":526,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724spring18\/wp-json\/wp\/v2\/posts\/520\/revisions\/526"}],"wp:attachment":[{"href":"https:\/\/wordpress.cs.vt.edu\/cs6724spring18\/wp-json\/wp\/v2\/media?parent=520"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724spring18\/wp-json\/wp\/v2\/categories?post=520"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wordpress.cs.vt.edu\/cs6724spring18\/wp-json\/wp\/v2\/tags?post=520"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}