This paper examined a dataset of more than 45,000 Kickstarter projects to determine what properties make a successful Kickstarter campaign (in this case, defining success as driving sufficient crowd financial interest to meet the project’s funding goal). More specifically, the authors used quantitative control variables (e.g., project duration, presence of a video, number of updates) as predictors, and also scraped the language on each project’s home page for common phrases. By combining both components, the authors created a penalized logistic regression model that could predict whether or not a project would be successfully funded with only a 2.24% error rate. The authors then connected the phrases to common persuasive techniques from the literature, such as reciprocity and scarcity, to better explain the persuasive power of some phrases.
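For anyone else who hasn’t worked with penalized regression before, here is a minimal sketch of how I imagine the modeling step, assuming a scikit-learn stand-in for the authors’ cv.glmnet workflow; the file name, column names, and feature settings are placeholders of my own, not the authors’ actual pipeline.

```python
# Minimal sketch (not the authors' code) of an L1-penalized logistic regression
# that combines control variables with phrase counts from project descriptions.
import pandas as pd
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegressionCV

# Hypothetical input file and column names; the real dataset and controls differ.
projects = pd.read_csv("kickstarter_projects.csv")

# Phrase features: unigrams through trigrams pulled from project descriptions.
vectorizer = CountVectorizer(ngram_range=(1, 3), min_df=50)
phrase_features = vectorizer.fit_transform(projects["description"])

# Quantitative control variables sit alongside the phrase counts.
controls = csr_matrix(
    projects[["duration", "has_video", "num_updates"]].astype(float).values
)
X = hstack([controls, phrase_features])
y = projects["funded"]

# Cross-validated L1 (LASSO) penalty, roughly analogous to cv.glmnet in R.
model = LogisticRegressionCV(penalty="l1", solver="liblinear", cv=5)
model.fit(X, y)
print("Training accuracy:", model.score(X, y))
```

My understanding is that the L1 penalty is what shrinks most phrase coefficients to zero, which is presumably how the authors ended up with a manageable list of predictive phrases out of millions of candidates.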
I thought that one of the most useful parts of this paper relative to the upcoming course project was the collection of descriptions and uses of the tools used by the authors. Should my group’s course project attempt something similar, it is nice to know about the existence of tools such as Beautiful Soup, cv.glmnet, LIWC, and Many Eyes for data collection, preprocessing, analysis, and presentation. Other techniques such as the Bonferroni correction and data repositories like the Google 1T corpus could also come in handy. Has anyone else in the class ever used any of these tools? Are they straightforward and user-friendly, or a nightmare to work with?
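Since I haven’t used Beautiful Soup myself, I put together a toy example of what the scraping step might look like; the URL and the CSS class here are placeholders I made up, not Kickstarter’s real page structure or the authors’ actual scraper.

```python
# Toy Beautiful Soup example (placeholder URL and selector, not the authors'
# scraper or Kickstarter's real markup).
import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com/some-project-page")
soup = BeautifulSoup(response.text, "html.parser")

# Pull the visible text out of a hypothetical description container.
description_div = soup.find("div", class_="description")
if description_div is not None:
    text = description_div.get_text(separator=" ", strip=True)
    print(text[:200])
```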
The authors aimed to find phrases that were common across all Kickstarter projects, and so they eliminated phrases that did not appear in all 13 project categories. As a result, phrases such as “game credits” and “our menu” were removed from the Games and Food categories, respectively. I can certainly understand this approach for an initial study into Kickstarter funding phraseology, but I would be curious to see whether any of these category-specific phrases (or their absence) were strong predictors of funding within each category. I would speculate that a lack of phrases related to menus would be harmful to a funding goal in the Food category. There might even be some common predictors shared across a subset of the 13 project categories; it would be interesting to see whether phrases in the Film & Video and Photography categories overlapped, or, as another example, Music and Dance. How do you think some of the results from this study might have changed if the filtering steps were more or less restrictive?
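To make that filtering step concrete for myself, here is a rough sketch of how I imagine it working (my own reconstruction, not the authors’ code): keep only the phrases that show up in at least one project in every category.

```python
# Sketch of the category-coverage filter as I understand it. `projects` is a
# hypothetical list of (category, set_of_phrases) pairs.
from collections import defaultdict

def phrases_in_all_categories(projects, num_categories=13):
    # Map each phrase to the set of categories it appears in.
    coverage = defaultdict(set)
    for category, phrases in projects:
        for phrase in phrases:
            coverage[phrase].add(category)
    # Keep only phrases seen in every category, dropping category-specific
    # ones like "game credits" (Games) or "our menu" (Food).
    return {p for p, cats in coverage.items() if len(cats) == num_categories}

# Toy usage:
toy = [("Games", {"game credits", "thank you"}),
       ("Food", {"our menu", "thank you"})]
print(phrases_in_all_categories(toy, num_categories=2))  # {'thank you'}
```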
Even after taking machine learning and data analytics classes, I still treat the outputs of many machine learning models as computational magic. As I glanced through Tables 3 and 4, a number of phrases surprised me in each group. For example, the phrase “trash” was an indicator that a project was more likely to be funded, while “hand made by” was an indicator that a project would not be funded. I would have expected each of these phrases to fall into the other group. Further, I noted that very similar phrases appeared on opposite sides: “funding will help” indicated funding, whereas “campaign will help” indicated non-funding. Did anyone else notice unexpected phrases that intuitively felt like they were placed in the wrong group? Does the principle of keeping data in context that we discussed last week come into play here? Similarly, I thought that the Authority persuasive argument ran counter to my own intuition. I would tend to view phrases like “project will be” as cocky and therefore would have a negative reaction to them, rather than treating them as expert opinions. Of course, that’s just my own view, and I’d have to read the referenced works to better understand the argument in the other direction.
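One thing that makes the “magic” feel a little less magical for me is remembering that, in a linear model like this, each phrase just gets a signed weight. Here is a toy illustration of what “indicator of funding” versus “indicator of non-funding” means in those terms; the weights are made up for illustration and are not the coefficients reported in the paper.

```python
# Toy illustration: in a (penalized) logistic regression, a phrase "indicates"
# funding or non-funding based on the sign of its learned weight.
# These weights are invented for illustration, not taken from the paper.
toy_coefficients = {
    "trash": 0.8,          # positive weight -> pushes toward funded
    "hand made by": -0.6,  # negative weight -> pushes toward not funded
    "funding will help": 0.5,
    "campaign will help": -0.4,
}

for phrase, weight in sorted(toy_coefficients.items(), key=lambda kv: kv[1]):
    direction = "funded" if weight > 0 else "not funded"
    print(f"{phrase!r}: weight {weight:+.1f} -> leans {direction}")
```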
I suspect that this paper didn’t get as much attention as Google Flu Trends (no offense, professor), but I’m curious to know whether the phrasing in Kickstarter projects changed after this work was published. Perhaps this could be an interesting follow-up study: have Kickstarter creators become more likely to use phrases that indicated funding, and less likely to use phrases that indicated non-funding, since the paper and datasets were released? Another interesting follow-up study was hinted at in the Future Work section. Since Kickstarter projects can be tied to Facebook, and because “Facebook Connected” was a positive predictor of a project being funded, a researcher could explore the methods by which these Kickstarter projects are disseminated via social media. Are projects more likely to be funded based on the number of posts? The quality (or phrasing) of posts? The number of Facebook profiles that see a post related to the project, or that interact with one?