One of the research topics I'm most passionate about is developing machine learning algorithms that detect cyberbullying on social media. Cyberbullying is a serious public health threat that is harming the online experience. And while Internet technology is rapidly amplifying our ability to communicate, it's important to develop complementary technology that helps mitigate the harm of abusive communication.
Computer programs that detect online harassment could enable automatic interventions, such as providing advice to those involved. But we don't yet have machine learning algorithms that can handle the scale, the structure, and the rapidly changing nature of cyberbullying. Standard classification approaches are hindered by the cost of labeling bullying examples and by the need for social context to distinguish bullying from other, less harmful behavior.
My group is developing machine learning algorithms that use weak supervision, where the input to the algorithm isn't a label for each interaction, but general indicators of bullying, such as offensive language. The algorithms extrapolate from these indicators using social media data, considering who sends and receives messages containing them and the overall structure of the relationships in the data. They perform collective, data-driven discovery of who is bullying, who is being bullied, and what additional vocabulary is indicative of bullying, as in the sketch below.
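To make this concrete, here is a minimal Python sketch of one way such an alternating, weakly supervised scheme could work on a handful of toy messages. The seed lexicon, the score names (bully, victim, word_score), and the damped-average update rules are illustrative assumptions for this sketch, not our actual model, which operates at far larger scale with a principled objective.

```python
import re
from collections import defaultdict

# Toy messages: (sender, receiver, text). Entirely made up for illustration.
messages = [
    ("alice", "bob", "you are a loser and an idiot"),
    ("alice", "bob", "nobody likes you loser"),
    ("carol", "dave", "great game today, nice shot"),
    ("erin", "bob", "ignore them, you played well"),
]

# Weak supervision: a small seed lexicon of offensive words stands in
# for per-message labels, which are too costly to collect at scale.
seed_words = {"loser", "idiot"}

bully = defaultdict(float)       # per-user "sends bullying" score
victim = defaultdict(float)      # per-user "receives bullying" score
word_score = defaultdict(float)  # per-word "indicates bullying" score
for w in seed_words:
    word_score[w] = 1.0          # seed indicators stay fixed at full weight

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

reg = 0.1  # damping that pulls scores toward zero

for _ in range(20):  # alternate until scores stabilize
    # Step 1: re-score users from the words in the messages they exchange.
    sent, received = defaultdict(list), defaultdict(list)
    for s, r, text in messages:
        m = max((word_score[t] for t in tokens(text)), default=0.0)
        sent[s].append(m)
        received[r].append(m)
    for u, vals in sent.items():
        bully[u] = sum(vals) / (len(vals) + reg)
    for u, vals in received.items():
        victim[u] = sum(vals) / (len(vals) + reg)

    # Step 2: re-score non-seed words from the users who exchange them,
    # so new vocabulary appearing in bully-to-victim messages is discovered.
    contexts = defaultdict(list)
    for s, r, text in messages:
        pair = 0.5 * (bully[s] + victim[r])
        for t in tokens(text):
            contexts[t].append(pair)
    for t, vals in contexts.items():
        if t not in seed_words:
            word_score[t] = sum(vals) / (len(vals) + reg)

print(sorted(bully.items(), key=lambda kv: -kv[1])[:2])
print(sorted(word_score.items(), key=lambda kv: -kv[1])[:5])
```

Even on this toy data, the alternation pushes the sender of the seeded messages toward a high bully score and boosts words that co-occur in those exchanges, which is the flavor of collective discovery described above.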
This Great Innovative Idea is from Bert Huang, Assistant Professor of Computer Science at Virginia Tech. Huang presented his poster, Weakly Supervised Cyberbullying Detection in Social Media, at the CCC Symposium on Computing Research, May 9-10, 2016.