Reflection #2 – [1/23] – [Ashish Baghudana]

Mitra, Tanushree, and Eric Gilbert. “The language that gets people to give: Phrases that predict success on Kickstarter.” Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing. ACM, 2014.

Summary

The language that gets people to give is a fascinating study that attempts to answer two questions: can we predict which crowdfunding campaigns get funded, and which features of a campaign determine its success? Analyzing over 45K Kickstarter campaigns, Mitra and Gilbert build a penalized regression model with 59 control features such as project goal, duration, and number of pledge levels. Using this as a baseline, they build a second model that adds textual features extracted from the project descriptions. To ensure generalizability, they keep only words and phrases that appear in all 13 categories of Kickstarter campaigns. The control-only model has an error rate of roughly 17%; adding the language features (~20K phrases) reduces the error rate to 2.4%, an improvement far too large to be due to chance. The paper then relates the top predictive phrases for both the funded and not-funded cases to social psychology and the theories of persuasion. Campaigns that display reciprocity (the tendency to return a favor), scarcity (limited availability of the product), social proof (others like the product too), authority (an expert designing or praising the product), or positive sentiment (a positively worded description) tend to be funded more often.
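To make that setup concrete, here is a minimal sketch of this kind of model, assuming a logistic formulation for the binary funded/not-funded outcome and a scikit-learn pipeline; the toy data and every column choice below are my own illustrative assumptions, not the authors' actual code or dataset.

```python
# A hedged sketch of penalized regression over control features plus
# phrase counts. The toy data, columns, and scikit-learn pipeline are
# assumptions for illustration, not the paper's pipeline.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy campaigns (illustrative, not from the paper's dataset).
descriptions = [
    "backers will receive the product and we also have plans to expand",
    "the project will be used in a small run if we reach our goal",
]
funded = np.array([1, 0])      # 1 = funded, 0 = not funded
controls = np.array([          # e.g., goal ($), duration (days), #pledge levels
    [5000, 30, 8],
    [20000, 45, 3],
])

# Unigram-to-trigram phrase counts. The paper additionally keeps only
# phrases that appear in all 13 Kickstarter categories, omitted here.
vectorizer = CountVectorizer(ngram_range=(1, 3))
phrases = vectorizer.fit_transform(descriptions)

# Stack control and phrase features; an L1 penalty shrinks most phrase
# coefficients to zero, keeping only the predictive ones.
X = hstack([csr_matrix(controls.astype(float)), phrases])
model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
model.fit(X, funded)
```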

Reflection

An exciting aspect of this paper is its marriage of social psychology, statistical modeling, and natural language processing. The authors address a challenging question: which features, linguistic or otherwise, encourage users to invest in a campaign? The paper borrows heavily from theories of persuasion to explain the effects of certain linguistic features. While project features like the number of pledge levels are positively correlated with increased chances of funding, I was surprised to see phrases such as “used in a” or “project will be” influencing successful funding. I am equally interested in how these phrases relate to specific aspects of persuasion, in this case reciprocity and liking/authority. The same phrase can be used in different contexts to imply different meanings. I am curious whether the subjectivity index [1] of a project description makes any contribution to a fund or no-fund decision.
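As a rough illustration, a document-level subjectivity score could be added as one more feature; the sketch below uses TextBlob's lexicon-based measure as a simple stand-in of my own choosing for the phrase-level approach of Wilson et al. [1].

```python
# A hedged sketch: score a project description's subjectivity on a
# 0 (objective) to 1 (subjective) scale. TextBlob's lexicon-based score
# is a stand-in for the MPQA phrase-level method in [1], not the same thing.
from textblob import TextBlob

def subjectivity(description: str) -> float:
    """Return TextBlob's 0-1 subjectivity score for the text."""
    return TextBlob(description).sentiment.subjectivity

print(subjectivity("We believe this is the most beautiful camera ever made."))
print(subjectivity("The enclosure is machined from 6061 aluminum."))
```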

I would expect another important aspect of successful campaigns to be the usefulness of the product to the average user. While this is hard to measure objectively, I was surprised to find no reference to it among any of the top predictors. Substantial research in sales and marketing indicates a growing emphasis on product design in successful marketing campaigns [2].

A final aspect I find intriguing is the deliberate choice to treat all products on Kickstarter equally. How valid is this assumption when one compares funding a documentary with funding earphones? The former is likely to lean on vivid, content-rich description, while the latter might emphasize technical features and benchmarks.

The paper throws open the entire field of social psychology and offers a great starting point for me to explore the interplay of psychology and linguistics.

Questions

  • Do different categories of campaigns experience different funding patterns? (A quick descriptive check is sketched below.)
    • Are certain types of projects more likely to be funded than others?
  • While social psychology is an important aspect of successful campaigns, perhaps it matters only in conjunction with what the product really is?
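For the first question, a simple descriptive check is possible given a per-campaign table; the file name and column schema below are hypothetical, since the paper's data is not distributed in this form.

```python
# Hedged sketch: per-category funding rates from a hypothetical CSV with
# one row per campaign and columns "category" and "funded" (0/1).
import pandas as pd

campaigns = pd.read_csv("kickstarter_campaigns.csv")  # hypothetical file
rates = campaigns.groupby("category")["funded"].mean().sort_values()
print(rates)  # a wide spread would suggest category-specific funding patterns
```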

[1] Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proceedings of HLT/EMNLP 2005.

[2] Why the Product is the Most Important Part of the Marketing Mix. http://bxtvisuals.com/product-important-part-marketing-mix/


Reflection #1 – [1/18] – [Ashish Baghudana]

Boyd, Danah, and Kate Crawford. “Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon.” Information, Communication & Society 15.5 (2012): 662-679.
Lazer, David, and Jason Radford. “Data ex Machina: Introduction to Big Data.” Annual Review of Sociology 43 (2017): 19-39.

Summary

Computational social science is still in its nascent stages. Until recently, research in this domain was conducted either by computationally minded social scientists or by socially inclined computer scientists. The latter group has been drawn in by the emergence of the Internet and of social media platforms like Facebook, Twitter, and Reddit. The digitization of our social lives presents itself as Big Data. The two papers present a view of Big Data, its collection and its analysis, from a social sciences perspective. Social scientists have traditionally relied on surveys to collect data, but had to continually question the biases in people’s answers. Both papers value Big Data’s contribution to studying the behavior of a populace without the need for surveys. The challenge in Big Data lies more in organizing and categorizing the data than in its sheer size. The main focus of the two papers, however, is less Big Data itself than the vulnerabilities of such an approach.

Boyd and Crawford’s paper raises six provocations about the use of Big Data in the social sciences. They argue that not all questions in social science can be answered with numbers; the whys demand a deeper understanding of social behavior. More importantly, the mere scale of Big Data does not automatically make it more reliable, since inherent biases may still be present. Finally, they raise concerns about data handling and access: who owns digital data, and could limited access create divides in society similar to those created by education?

Lazer’s paper raises almost identical vulnerabilities of Big Data. As he very aptly puts it:

“The scale of big data sets creates the illusion that they contain all relevant information on all relevant people.”

Lazer repeatedly stresses that generalizability in social science research requires not scale but more representative data. He also echoes Boyd and Crawford’s concern about research ethics on social platforms. Lazer ends with future trends for computational social science, including more generalized models, qualitative analysis of Big Data, and multimodal sources of data.

Reflections

As a student and researcher in computer science, it is important to constantly remember that computational social science is not an extension of data science or machine learning. We often fall into the trap of thinking about methods before questions. It is best to think of computational social science as a derivative of social science rather than of computer science.

I enjoyed reading Boyd and Crawford’s provocations #3 and #4. Big Data is often heralded as a messiah that can answer important behavioral questions about society. Even where this is true, it will not be because of the scale of the data but because of our ability to process and analyze it. As researchers, it feels increasingly important to consider multiple sources and ask broader questions in social science than ones like “will a user make a purchase?”. For this, one cannot merely look for patterns in data, but must study why the patterns exist. Are these patterns an artifact of the dataset in question, or do they reflect true societal behavior? While the two papers mention a trend toward generalization, especially in machine learning, I also see a trend toward increased specialization: ever more specialized computational methods apply to ever narrower slices of data.

Finally, a major concern about Big Data is privacy and ethics. Unlike the challenges of data analysis, this concern has no single correct answer. Universities, labs, and industry will have to work closely with IRBs to develop a sound framework for the governance of Big Data.

Questions

  • To reiterate, what are the big questions in social science, and where do we strike a balance between quantitative and qualitative analysis?
  • While computational social science helps debunk incorrect beliefs about group behavior, can it truly help us understand the causes of such behavior?
  • Should we change societal behavior, if it were possible to do so?
    • While predictive analysis is non-intrusive, what constitutes ethical behavior when social science research has more intrusive effects?
  • Finally, does research in computational social science encourage large-scale surveillance?
