Reflection #12 – 04/05 – [Jiameng Pu]

Using linguistic and topic analysis to classify sub-groups of online depression communities

Mental health problems of many kinds are increasingly common worldwide. Depression is complex: it presents differently from one depressed person to another, which makes treatment and prevention difficult. Because people increasingly exchange information, support, and advice online, the paper turns to online communities to study depression. Machine learning can help identify the topics or issues relevant to people with depression and characterize the linguistic styles of depressed individuals. The paper applies machine learning techniques to linguistic features to identify sub-communities within online depression communities.


They mentioned “In this work we have used Live Journal data as a single source of online networks. In fact, Live Journal users could also join other networking services, such as Facebook or Twitter”, which corresponds exactly to my concern: the dataset could be better and more up to date, for example data from popular social media such as Reddit, Twitter, and Facebook. In fact, platforms like Facebook have conducted this kind of research to detect accounts that may belong to depressed users, so that the necessary psychological support can be offered. Live Journal data, however, is clearly dated for this task. Another point that confuses me is the five subgroups of online communities that were identified: Depression, Bipolar Disorder, Self-Harm, Grief/Bereavement, and Suicide. I don't see the inner logic of how these five subgroups can fully represent the psycholinguistic features of low mood; for instance, what is the difference between self-harm and suicide? What impresses me about this paper is the comparison of different classifiers, from SVM to Lasso. In practice, I feel that researchers often have to try several machine learning models, because we cannot always guess correctly which one will work better, even with some prior experience. I haven't used LIWC so far, but it is apparently one of the most widely used tools across the papers we've read. I look forward to trying it out in my future research.
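To make the point about trying several classifiers concrete, here is a minimal sketch of how such a comparison could be set up with scikit-learn. It is not the paper's pipeline: the data is synthetic (a stand-in for per-post feature vectors such as LIWC scores), the task is reduced to two hypothetical communities, "Lasso" is approximated by L1-penalized logistic regression, and the names `models` and `compare_models` are my own.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for per-post features (assumption: NOT real
# Live Journal or LIWC data). Two classes play the role of two communities.
X, y = make_classification(
    n_samples=400,
    n_features=50,
    n_informative=10,
    n_classes=2,
    random_state=0,
)

# Candidate models. The L1 penalty gives Lasso-style sparse weights.
models = {
    "linear_svm": make_pipeline(StandardScaler(), LinearSVC(dual=False)),
    "l1_logistic": make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="l1", solver="liblinear"),
    ),
}


def compare_models(models, X, y, folds=5):
    """Return the mean cross-validated accuracy for each model."""
    return {
        name: cross_val_score(model, X, y, cv=folds).mean()
        for name, model in models.items()
    }


scores = compare_models(models, X, y)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Evaluating every candidate on the same cross-validation folds is what makes the comparison fair, and it is cheap enough that guessing the best model in advance is rarely necessary.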
