Mitra, Tanushree, and Eric Gilbert. “The language that gets people to give: Phrases that predict success on Kickstarter.”
Summary:
In this paper, Mitra and Gilbert present a study addressing the research question of how the language used in a pitch gets people to fund the project. The authors analyze the text of about 45K Kickstarter project pitches. They clean the text and keep only phrases that appear across all 13 Kickstarter project categories, ultimately using 20K phrases together with 59 control variables (project goal, duration, number of pledge levels, etc.) to train a penalized logistic regression model that predicts whether a project will be funded. Including the phrases in the model decreases the error rate from 17.03% to 2.4%, which shows that the text of a project pitch plays a vital role in getting funded. The paper compares the features of funded and non-funded projects and explains that campaigns that display reciprocity (giving something in return), scarcity (limited availability), and social proof have a higher tendency of being funded.
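To make the setup concrete for myself, below is a minimal sketch of the kind of model the paper describes, assuming pitch texts, funded/not-funded labels, and a few control variables are available as simple Python objects; the toy data, variable names, and scikit-learn pipeline are my own illustration, not the authors' code.

```python
# Hypothetical sketch of a LASSO-penalized logistic regression over pitch phrases,
# in the spirit of the paper's setup (phrase counts plus control variables).
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for the pitches, their funded/not-funded labels, and
# control variables such as goal, duration, and number of pledge levels.
pitches = ["back this project and also receive two signed prints",
           "we need your help to reach our goal"]
funded = np.array([1, 0])
controls = np.array([[5000, 30, 8],        # goal, duration, pledge levels
                     [100000, 60, 3]])

# Unigram-to-trigram phrase counts, analogous to the paper's phrase features.
vectorizer = CountVectorizer(ngram_range=(1, 3), min_df=1)
phrase_counts = vectorizer.fit_transform(pitches)

# Combine the sparse phrase features with the dense control variables.
X = hstack([phrase_counts, csr_matrix(controls)])

# An L1 (LASSO-style) penalty shrinks most phrase coefficients to zero,
# leaving a small set of phrases that predict funding success.
model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
model.fit(X, funded)

# Phrase coefficients (betas) indicate association with being funded.
print(dict(zip(vectorizer.get_feature_names_out(),
               model.coef_[0][:phrase_counts.shape[1]])))
```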
Reflection:
The authors address the question of which features, and which language, help a project attract more funding. The insights the paper provides are quite realistic: people generally tend to give if they see a benefit for themselves, perhaps because they get something in return. The paper offers useful guidance to startups looking for funding: they should focus on their pitch and signal reciprocity, scarcity, and social proof. Still, the results are somewhat astonishing to me, because the top 100 predictors come from the language of the pitch, which makes me question whether language alone is sufficient to predict whether a project will be funded.
There are also a few phrases that do not make sense when taken out of context; for example, 'trash' has a very high beta score, but does that make sense? Unless we look at the entire sentence, we cannot say.
The authors show that using the phrases in the model significantly decreases the error rate, but the choice of model is not well justified. Why did they use penalized logistic regression? Even though penalized logistic regression (LASSO) makes sense for this high-dimensional feature space, a comparison with other models should have been provided. Ensemble methods such as a Random Forest classifier should also work well on this type of data, so reporting the performance of several models would have given more insight into the choice of model.
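As a rough illustration of the comparison I have in mind, one could cross-validate the LASSO model against a Random Forest on the same kind of high-dimensional feature matrix and report error rates side by side. The snippet below is my own hypothetical sketch on synthetic data, not an experiment reported in the paper.

```python
# Hypothetical model comparison: LASSO-penalized logistic regression vs. a
# Random Forest, scored by cross-validated error rate.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the phrase + control feature matrix
# (many features, most of them uninformative, like sparse n-gram counts).
X, y = make_classification(n_samples=2000, n_features=500, n_informative=50,
                           random_state=0)

models = {
    "LASSO logistic regression": LogisticRegression(penalty="l1",
                                                    solver="liblinear", C=1.0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

for name, model in models.items():
    accuracy = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    # Error rate = 1 - accuracy, comparable in spirit to the paper's metric.
    print(f"{name}: mean error rate = {1 - accuracy.mean():.3f}")
```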
Furthermore, treating every campaign equally is another questionable assumption in this paper: how can a project asking for $1M and meeting its goal be equivalent to one with a $1,000 goal, and is every category of campaign equivalent?
Finally, this paper was about the language used in pitches, but it also raises new research questions, such as: is there a difference between the types of people who fund different projects? Do most backers come from wealthy societies? Another interesting question would be whether we can process the text within video pitches to perform a similar analysis. Do infographics help? And can we measure the usefulness of a product and use it to predict funding success?
Questions:
Is language sufficient to predict whether a project will be funded?
Why use penalized logistic regression over other models?
Is every category of campaign equivalent?
Is there a difference between types of people funding different projects?
Can we process text within video pitches to perform similar analysis?
Can we measure the usefulness of a product and use it to predict funding success?