Reflection #2 – [1/23] – Md Momen Bhuiyan

Paper: The Language that Gets People to Give: Phrases that Predict Success on Kickstarter

Summary:
This paper looks into language usage and its power to predict whether a project gets funded on the crowdfunding site Kickstarter. Previous studies have already found several factors that affect funding probability on such sites, such as a higher funding goal, longer project duration, a video in the project pitch, the creator's social network, and key attributes of the project. This study builds on that work by adding linguistic analysis of the project pitch. The authors feed the unigram, bigram, and trigram phrases common to all 13 project categories as linguistic predictive variables, along with 59 other Kickstarter variables, into a penalized logistic regression classifier. Finally, the authors perform both quantitative and qualitative analysis of the output and provide the top 100 phrases that contributed to their model. From the qualitative analysis, several intuitive phenomena appear: for example, reciprocity and scarcity have a positive correlation with being funded, and factors like social identity, social proof, and authority also seem to contribute. LIWC analysis suggests that funded project pitches include higher rates of cognitive process, social process, and perception words. Sentiment analysis shows that funded projects have both higher positive and higher negative sentiment, although the difference is not statistically significant. One interesting finding was that a completely new project is likely to have less success than one that builds on a previous one.
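To make the pipeline described above concrete, here is a minimal sketch (with invented toy pitches, control values, and parameter settings, not the authors' code or data) of feeding unigram/bigram/trigram phrase features plus control variables into an L1-penalized logistic regression using scikit-learn:

```python
# Sketch only: n-gram phrase features + control variables in an L1-penalized
# logistic regression. All data below is made up for illustration.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

pitches = [
    "back this project and receive good karma and a signed poster",
    "we need your help to finish our first documentary film",
    "limited edition reward available to the first fifty backers",
    "please donate because even a dollar helps us reach the goal",
]
controls = np.array([[5000, 30, 1],          # hypothetical goal, duration, has_video
                     [20000, 60, 0],
                     [3000, 25, 1],
                     [15000, 45, 0]])
funded = np.array([1, 0, 1, 0])              # 1 = funded, 0 = not funded

# Unigrams, bigrams, and trigrams; on the real corpus a high min_df would mimic
# the paper's "frequent phrases only" filter (min_df=1 here since the toy corpus is tiny).
vectorizer = CountVectorizer(ngram_range=(1, 3), min_df=1, binary=True)
X = hstack([vectorizer.fit_transform(pitches), csr_matrix(controls)])

# The L1 (lasso-style) penalty shrinks most phrase weights to exactly zero,
# which is what keeps the fitted model interpretable.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, funded)
print(np.sum(clf.coef_[0] != 0), "non-zero coefficients out of", X.shape[1])
```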

Reflection:
The paper provides a good motivation for analyzing linguistic features in crowdfunding projects. The authors' use of penalized logistic regression seemed interesting to me; I would probably have thought of applying PCA first and then running a logistic regression, but penalized logistic regression produced results that were important for interpretation. At the same time, looking at the top 100 positively and negatively correlated terms is a reminder of a classic fault of big-data interpretation: "seeing correlation where none exists" (for example, "Christina", "cats", etc.). For the sake of generalizability, the authors lose many terms that could have correlated better with specific projects, but the beta scores of the 29 control variables say otherwise. The authors' use of Google's 1T corpus to reduce the number of phrases, and the tree visualizations of some common terms, were nice additions to the paper. Although positively correlated terms don't guarantee success in a crowdfunding project, the negatively correlated terms provide a very useful list of things to avoid in a project pitch. The social proof attribute of the results raises the question: can we manipulate the system by faking backers?


Reflection #2 – [01-23] – [Patrick Sullivan]

The Language that Gets People to Give: Phrases that Predict Success on Kickstarter
Mitra and Gilbert are investigating how and why certain crowd funding projects succeed or fail.

One of the first interesting points I found was which metrics of a project were targeted for analysis. Mitra and Gilbert look specifically at accurately predicting project success (funded or not funded). While this is a very important metric to predict, there are others that could be useful in making a social change. If a project generates more attention outside of Kickstarter, the sheer number of project views may give a less-than-stellar project more funding than an incredible project that is seen by few people. Mitra and Gilbert do find counterexamples to this (Ninja Baseball), but I believe it is intuitive that projects with more visibility are seen by more potential funders, and thus receive more funding. Maybe a metric such as ‘average funding per project viewer‘ would give better insight into the general qualities of a successful Kickstarter project: it would show a stark contrast between a project that converts 1% of viewers into backers and another where 30% of viewers become backers. Outside media influence and advertising may significantly alter the outcomes of these projects, so page views are one way of researching another factor of crowdfunding success. However, measurements like this might not be collectible if Kickstarter refuses to release project viewership and specific funding statistics.

There is a large incentive for malicious users to create fake projects that become funded, since this can be used as a source of revenue. While Kickstarter’s quality control works against these projects, they may still be affecting the data collected in this research. It can be quite difficult to judge the realism of digital content, which makes up the majority of the communication and information shown on a Kickstarter page. Verifying dubious claims in crowdfunding projects can be difficult for those without high levels of technical knowledge, leading to the growth of content that ‘debunks’ these claims (e.g. the ‘Captain Disillusion’ and ‘ElectroBOOM’ YouTube channels). It would be extremely difficult for a machine to differentiate real Kickstarter projects with novel concepts from malevolent projects created to fool wary human viewers, and it is not clear whether Mitra and Gilbert foresaw this possible issue and took steps to avoid it. There are some natural social protections against fake projects, since the more lucrative projects have a larger audience and thus more scrutiny from the public. Kickstarter’s all-or-nothing funding policy is another natural defense, but not all crowdfunding platforms share it, so I expect that similar research on other platforms could show some radically different results.

Another route for expanding this research could be investigating how culture affects crowdfunding. Capitalism is closely tied to how crowdfunding is currently structured in the USA, so cultures and societies with alternative outlooks may show interest in very different-looking projects. Nearly all of the factors Mitra and Gilbert discussed (reciprocity, scarcity, authority, etc.) are connected to specific human values or motivations. Exploring how crowdfunding success can be predicted among audiences with varying levels of materialism, empathy, and collectivism could show how to raise funding for projects that benefit other cultures as well.


Reflection #2 – [1/23] – Aparna Gupta

Paper: The Language that Gets People to Give: Phrases that Predict Success on Kickstarter.

Summary:

The paper studies crowdfunding websites like Kickstarter, where entrepreneurs and artists look to the internet for funding. It explores the factors which lead to successfully funding a crowdfunding project and tries to answer the question: “What makes a project succeed in getting funded?” The presented work focuses on the predictive power of content, and more precisely on the words and phrases project creators use to pitch their projects. The authors analyzed 45K Kickstarter projects. To ensure generalization, they used only phrases which occurred more than 50 times across the projects under consideration. The paper concludes that projects which show reciprocity, scarcity, social proof, authority, social identity, and liking are more likely to get funded.

Reflection:

The paper gives a sense of some of the important phrases which can determine the probability of getting successfully funded on Kickstarter. However, the question which struck my mind is: “Can these phrases be generalized across various genres?” The paper states that by analyzing the project content and the most commonly occurring phrases, one can understand the social reaction of an individual. However, I feel that this can be biased, in the sense that the reasoning behind specific reactions cannot be known. It might happen that a project does not get funding because the person listening to the pitch has no interest in the field. How should such (possibly biased) reactions be interpreted or taken into consideration?

The paper lists factors like project goal, duration, category, and the presence of a video, which play a significant role in predicting whether a project will get funded. I agree with these, since presenting a video can explain a concept better; visualizations expedite the viewers’ understanding. However, I am curious whether what the product is about, and how useful it will be in the future, could also serve as features in determining whether a project gets funded.

The statistical analyses explained in the paper depict an amalgamation of modeling and sociology. The authors used LASSO to determine feature importance; could other statistical models be used as well in this scenario? The modeling results, however, highlight phrases like ‘good karma’ and ‘used in a’ among the funded projects, which looked misplaced. The authors raise a similar question about a phrase like ‘cat’ being present in many projects which got funded. What intrigues me is that, although a lot of research has already been conducted to understand sociology using statistical modeling, there are still aspects of social behavior which remain unexplored and difficult to understand.

Overall, this paper explores the challenging question of which features, language, and English phrases compel people to invest in a project.


Reflection #2 – [01/23] – [Vartan Kesiz Abnousi]

Tanushree Mitra and Eric Gilbert. 2014. The language that gets people to give: phrases that predict success on kickstarter. In Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing (CSCW ’14). ACM, New York, NY, USA, 49-61. DOI=http://dx.doi.org/10.1145/2531602.2531656

 

Summary

The authors of this paper aim to find the type of language, used in crowdfunding projects, which leads to successful funding. The raw dataset is from the crowdfunding website Kickstarter and contains 45K crowdfunded projects collected between June and August 2012, from which the authors analyze 9M phrases and 59 other variables commonly present on crowdfunding sites. The response variable is whether the project was funded or not. The predictive variables are partitioned into two broad categories: first, control variables such as project goal and project duration; second, the predictive variables of interest, which are phrases scraped from the textual content of each project's Kickstarter homepage. The statistical model the authors use is a penalized logistic regression that aims to predict whether the project was funded; the preferred model is the LASSO, on the grounds that it is parsimonious. The resulting model has about 2.41% cross-validation error and 2.24% prediction error, and the addition of the phrases decreases the predictive error by about 15 percentage points. Subsequently, the authors find that the phrases have significant predictive power and proceed to rank the coefficients ("weights") from highest to lowest. Furthermore, they group the phrases under categories by using the Linguistic Inquiry and Word Count program (LIWC). Then they compare the non-zero β coefficient phrases to the Google 1T corpus data, and through a series of statistical tests they find a subset of 494 positive and 453 negative predictors. Finally, the authors discuss the theoretical implications of the results. They argue that projects whose pitches contain phrases indicating reciprocity, scarcity, social identity, liking, and authority are more likely to be funded.
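The paper's model selection was done with R's cv.glmnet; purely as an illustration of the same idea, a scikit-learn sketch on synthetic data might look like the following, choosing the LASSO penalty by cross-validation and then ranking the surviving non-zero coefficients the way the paper ranks its predictive phrases:

```python
# Illustrative only: cross-validated L1-penalized logistic regression on synthetic
# data, followed by ranking the non-zero coefficients ("weights") high to low.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=500, n_features=200, n_informative=20, random_state=0)

lasso = LogisticRegressionCV(Cs=20, cv=10, penalty="l1", solver="liblinear",
                             scoring="accuracy").fit(X, y)

# Mean cross-validated accuracy at the best penalty (analogous to cv.glmnet's CV error).
cv_scores = list(lasso.scores_.values())[0]            # shape: (n_folds, n_Cs)
print("cross-validation error:", round(1 - cv_scores.mean(axis=0).max(), 4))

# Rank surviving coefficients from most positive to most negative.
coefs = lasso.coef_[0]
nonzero = np.flatnonzero(coefs)
ranked = nonzero[np.argsort(coefs[nonzero])[::-1]]
print("strongest predictors (feature indices):", ranked[:10])
```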

 

Reflection

This paper demonstrates the power of big data in dealing with research questions that researchers were not able to explore until a few years ago. Moreover, not only does it analyze a large amount of data, 9 million phrases, but it selects a subset and then groups the phrases into meaningful categories. Finally, theories from social psychology are used to draw conclusions that could generalize the results. In addition, businesses that opt for crowdfunding could use more of these phrases to receive funding. Interestingly enough, most of the limitations that the authors mention are inherent to big data problems, as discussed in the previous lecture.

I find the “all or nothing” funding principle interesting. I think this should be highlighted, because it means that businesses should choose their project goal and duration carefully to ensure funding. As the literature review suggests, projects with longer durations and higher goals are less likely to be funded. Both project goal and project duration are controlled for in the model.

In addition, it should be noted that the projects belong to 13 distinct categories. It would be interesting to know the demographics of the people who fund the projects. This could answer a number of questions, such as whether every project is funded by a specific demographic category, or whether some phrases are more appealing to a specific demographic. Perhaps the businesses would prefer to have their funding from the same demographic category that they target as their future clients or customers.

Another piece of information that would be interesting is how “concentrated” the funds are among a specific number of people. Was 90% of the funding for a given project from one person and the rest from hundreds of people? Furthermore, there is heterogeneity in the sources of funding that affects the dependent variable (whether a project is funded or not) but is not captured by the model.

The authors chose the LASSO because it is parsimonious. An additional advantage of using the LASSO is that it gives us a narrower subset of non-zero coefficients for further analysis, since it works as a model selection technique. For example, if ridge regression were used, the authors would have to analyze many more phrases, most of which would probably not be important. However, penalized regression approaches do pose problems of interpretation: the coefficient of a classical logistic regression indicates the change in the odds that a project is funded when a specific phrase is used, ceteris paribus, whereas penalized coefficients are shrunken and harder to read this way. Still, LASSO is preferable to artificial neural networks, because the authors are not only interested in the predictive power of the model but ultimately in interpreting the results. Perhaps a decision tree approach would also be useful, because it also selects a subset of variables and allows for interpretation.
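The parsimony argument can be seen directly on toy data (a sketch under invented settings, not the paper's numbers): an L1 penalty zeroes out most coefficients, while ridge (L2) keeps essentially all of them non-zero, which is why ridge would have left far more phrases to interpret.

```python
# Toy comparison of L1 (lasso) vs L2 (ridge) penalties in logistic regression.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, n_features=300, n_informative=15, random_state=1)

lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
ridge = LogisticRegression(penalty="l2", solver="liblinear", C=0.5).fit(X, y)

print("lasso non-zero coefficients:", int(np.sum(lasso.coef_ != 0)))  # small subset
print("ridge non-zero coefficients:", int(np.sum(ridge.coef_ != 0)))  # essentially all 300
```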

 

Questions

  • Would using other statistical models improve the predictive performance?
  • Can we find information about the demographics of the people who fund the projects? Is there a way to find the demographics of the donors? We could then link the phrases to demographics. For instance, are some phrases more effective depending on gender?


Reflection #2 – [1/23] – [Deepika Kishore Mulchandani]

[1]. Mitra, T. and Gilbert, E. 2014. The language that gets people to give. Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing – CSCW ’14.

Summary :

In this paper, the authors aim to answer the research question ‘What are the factors that drive people to fund projects on crowdfunding sites?’. To this end, Tanushree Mitra et al. studied a corpus of 45K projects from Kickstarter, a popular crowdfunding site. They carried out filtering techniques to eliminate bias and finally analyzed 9M phrases and 59 control variables to identify their predictive power in a project getting funded by the “crowd”. The error rate of their cross-validated penalized logistic regression model is only 2.4%. The authors found that the chances of funding increase if: the pitch offers incentives and rewards (reciprocity), the pitch has opportunities which are rare or limited in supply (scarcity), the pitch’s wording indicates that it has already been pledged to by others (social proof), the pitch is from a project creator that people like (liking), the pitch has positive and confident language (LIWC and sentiment), and the pitch is endorsed by experts (authority). By performing this research they have made available a ‘phrase and control variables’ dataset, which contains phrases and control variables that can be put to further use by crowdfunding sites and other researchers.

Reflection:

‘The language that gets people to give’ is an engaging research paper. I admire the effort the authors put into analyzing a corpus of 45K Kickstarter projects. The flowchart of the steps taken to extract the variables used in the model was helpful for understanding how the phrases and control variables were finally obtained. The fact that the control variables are not specific to the Kickstarter platform helps make this research more useful for all crowdfunding platforms. I like the Word Tree visualizations the authors provided. The role that persuasion phrases and concepts like reciprocity, scarcity, authority, and sentiment play in getting a project funded was fascinating to read about. Features like ‘video present’, ‘number of comments’, and ‘Facebook connected’ emphasize the social aspects of this analysis. A few of the top 100 phrases listed in the paper surprised me; however, I could definitely spot the patterns that the authors identified. It is indeed impressive that a quantitative analysis using machine learning techniques can validate reciprocity, liking, scarcity, and so on. I was amazed by the ‘good karma’ phrase. This phrase and its mention with respect to reciprocity made me realize that it would be exciting to study crowdfunding projects to answer the questions: ‘Do religious and spiritual beliefs impact the decision a person makes in funding a project? Do these beliefs hold more importance than the incentive rewards in the reciprocity phenomenon?’ On observing the tables listing the control variables with non-zero coefficients, I found that many of the variables in the not-funded table were related to the ‘music’ and ‘film’ categories. This raised the questions: ‘Do some beliefs (e.g. that projects in these categories may not be successful) influence the decision to fund a project? Do these beliefs weigh more than factors like reciprocity, authority, and liking?’ I appreciate the ideas for future work that the authors have provided. I believe that implementing a feature that gives recommendations to the project creator while the pitch is being typed, using the phrase and control variable dataset that the authors have released, could be extremely interesting.

 


Reflection #2 – [1/23] – [Meghendra Singh]

Mitra, Tanushree, and Eric Gilbert. “The language that gets people to give: Phrases that predict success on kickstarter.” Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing. ACM, 2014.

In this paper, Mitra and Gilbert present an exciting analysis of text from 45K Kickstarter project pitches. The authors describe a method to clean the text from the project pitches and subsequently use 20K phrases from these pitches along with 59 control variables to train a penalized logistic regression model and predict the funding outcomes for these projects. The results presented in the paper suggest that the text of the project pitches plays a big role in determining whether or not a project will meet its crowdfunding goal. Later on, the authors discuss how the top phrases (best predictors) can be understood using principles like reciprocity, scarcity, social proof, and social identity from the domain of social psychology.

The approach is interesting and suggests that the “persuasiveness” of the language used to describe an artefact significantly impacts the prospects of it being bought. Since there have been close to 257K projects on Kickstarter, I feel now is a good time to validate the results presented in the paper. By validation, I mean assessing whether the top phrases that were found to predict a successfully funded project have appeared in the pitches of projects that met their funding goals since August 2012, and whether the same holds for projects that didn’t meet their funding goals. Additionally, repeating the study might be a good idea, as there are considerably more projects (i.e., more data to fit) and “hopefully” better modeling techniques (deep neural nets?). Repeating the study might also give us insights into how the predictor phrases have changed in the last 5 years.

A fundamental idea that might be explored is coming up with a robust quantitative measure of “persuasiveness” for any general block of text, perhaps using linguistic features and the common English phrases present in it. We could then explore whether this “persuasiveness score” for a project’s pitch is a significant predictor of crowdfunding success. Additionally, I feel that information about crowdfunded projects spreads like news, memes, or contagions in a network. Aspects like homophily, word of mouth, celebrities, and influencer networks may play a big role in bringing backers to a crowdfunding project, and these phenomena belong to the realm of complex systems, with properties like nonlinearity, emergence, and feedback. I feel this makes the spread of information a stochastic process, and unless a “potential” backer learns about the existence of a project of interest to them, it is unlikely they would search through the thousands of active projects on all the crowdfunding websites. Also, it may be the case for certain projects that most of the “potential” backers belong to a certain social community, group, or clique, and the key to successful funding might be to propagate news about the project to these target communities (say, on social media). Another interesting research direction might be to mine backer networks from a social network: for example, how many friends, friends of friends, and so on, of a project backer also pledged to the project? It might also be useful to look at the project’s comments page and examine how the sentiment of these comments evolves over time. Is there a pattern to the evolution of these sentiments that correlates with project success or failure?

Another trend that I have noticed (e.g. in one of the Kickstarter projects I had backed) is that the majority of a project’s pitch is presented in the form of images and video. In such cases, how would a text-only technique for predicting the success of a project fare against one that also uses images and videos from the project as features? Can we use these obscure yet pervasive data to improve classifier accuracy? The authors discuss in the “Design Implications” section the potential applications of this work to help both backers and project creators. I feel that there is only so much money available to the everyday Joe, and even if all the crowdfunded projects have highly persuasive pitches, serendipity might determine which projects succeed, wouldn’t it?

Although the paper does a good job of explaining most of the domain-specific terms, there were a couple of places which were difficult for me to grasp. For example, is there a logic behind throwing away all phrases that occur fewer than 50 times in the 9M-phrase corpus? I speculate that the phrase frequencies would follow a power-law distribution, in which case it might be interesting to experiment with the threshold used to filter the most frequent phrases in the corpus. Moreover, certain phrases like nv (beta = 1.88), il (beta = 1.99), and nm (beta = -3.08) present in the top 100 phrases listed in tables 3 and 4 of the paper don’t really make sense (but the cats definitely do!). It might be interesting to trace the origins of these phrases and examine why they are such important predictors. Also, it may be good to briefly discuss the Bonferroni correction. Other than these issues, I enjoyed reading the paper and I especially liked the word tree visualizations.
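For reference, the two steps questioned here are mechanically simple; a tiny illustration with invented counts and p-values (not values from the paper) is below. The Bonferroni correction just tightens the per-test significance level to alpha divided by the number of tests performed.

```python
# Invented example of (1) the minimum-frequency phrase filter and (2) Bonferroni correction.
from collections import Counter

phrase_counts = Counter({"good karma": 310, "our menu": 42, "pledge now": 75, "nv": 51})
kept = {p: c for p, c in phrase_counts.items() if c >= 50}    # drop phrases seen < 50 times
print("kept phrases:", sorted(kept))

alpha = 0.05
m = len(kept)                                                 # number of hypotheses tested
p_values = {"good karma": 0.0004, "pledge now": 0.03, "nv": 0.012}   # hypothetical p-values
significant = [p for p, pv in p_values.items() if pv < alpha / m]    # Bonferroni threshold
print("significant after Bonferroni:", significant)
```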


Reflection #2 – [1/23] – [Jamal A. Khan]

Well, let me start out by saying that this was a fun read, and the very first takeaway is that I now know how to “take away money” from people!

Anyhow, moving on to a more serious note: since the prime motive of the paper is to analyze the language used to pitch products/ideas, and since videos (or their content) are a good indicator of funded vs. not funded, what effect does the implicit racial bias of the crowd have? More concretely:

  • What effect does the race of both the person pitching and the crowd have?
  • Do people tend to fund people of the same racial group more?

Another aspect that I would like to investigate is the crowd itself: its statistics per funded project, and how they vary across projects. Can we find some trend there?

The paper more or less gives evidence for intuitive insights, both from the literature and from common sense. For example, people/contributors don’t stand to make a profit or reap monetary benefit from the project, but given some form of “reciprocation”, there’s added incentive for them to contribute beyond simply liking the project. Sometimes this takes the form of something tangible like a free t-shirt, and at other times it’s merely a mention in the credits, but the important point is that people are getting something in return for their funding. Another prominent one is “scarcity”, i.e. the desire to have something that is unique and limited to only a few people. Tapping into that emotion of exclusivity and adding in personalization is a good way to secure some funding.

However, not all is well! As some others also noticed, there are some spurious phrases in tables 3 and 4 that seem as though they should have belonged to the other category, e.g.:

  • “trash” was in funded with beta = 2.75
  • “reusable” was in not funded with beta = -2.53

There were also some phrases which made no sense in either category, e.g. “girl and” was in funded with beta = 2.0? I suspect that this highlights a flaw/poor choice of classifier. What would be a better classifier? Something like word embeddings, where the embeddings can be ranked?

Moving on to the model summaries provided:

It’s quite evident that the phrases provide a big boost in terms of capturing the distribution of the dataset, so this makes me wonder how a phrases-only model would perform. My guess is that its performance should be closer to the phrases + controls model than to the controls-only model. Though I’m going off on a tangent, let’s say we don’t use logistic regression and opt for something a bit more advanced, e.g. sequence models or LSTMs, to predict the outcome; would the model turn out to be better than the phrases + controls model? Also, will this model stand the test of time? That is, as language or marketing trends evolve, will it hold true, say, 6-10 years from now? Since the paper is from 2014 and the data from 2012-2014, does the model hold true right now?
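A rough way to probe the phrases-only question is a simple ablation on synthetic data; the split of columns into “controls” and “phrases” below is arbitrary and purely illustrative, not the Kickstarter feature set.

```python
# Ablation sketch: controls only vs. phrases only vs. phrases + controls,
# compared by 10-fold cross-validated error on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=559, n_informative=40, random_state=0)
controls, phrases = X[:, :59], X[:, 59:]   # pretend the first 59 columns are controls

model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
for name, features in [("controls only", controls),
                       ("phrases only", phrases),
                       ("phrases + controls", X)]:
    error = 1 - cross_val_score(model, features, y, cv=10).mean()
    print(f"{name}: {error:.3f} CV error")
```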

Another thing that the authors mentioned and that caught my attention is the use of social media platforms, which raised quite a few questions:

  • How does linking to Facebook affect funding? Does it build more trust among backers because it provides a vague sense of legitimacy?
  • Does choice of social media platform matter i.e. Facebook vs Instagram?
  • Does the language of the posts have similar semantics, or is it more click-bait-ish?
  • What effect does posting frequency have?
  • Does the messaging service of Facebook pages help convince wary people to contribute?

This might make for a good term project.

I would also like to raise a few technical questions regarding the techniques used in the paper:

  • Why penalized logistic regression? Why not more modern deep learning techniques, or even other statistical models, e.g. kernel-based Naïve Bayes or SVMs?
  • What is penalized in penalized logistic regression: does it refer to a regularizer added to the RSS or to the likelihood? (A sketch of the objective follows below.)
  • I understand that LASSO results in automatic feature selection, but a comparison with other shrinkage/regularization techniques is missing. Hence, the choice of regularization method seems more forced than justified.
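On the second question: in the LASSO variant the paper uses, the penalty is attached to the (negative log-)likelihood of the logistic model rather than to an RSS term. In a standard textbook formulation (written here for clarity, not quoted from the paper), the fitted coefficients solve

$$
\hat{\beta} = \arg\min_{\beta}\left[-\sum_{i=1}^{n}\Big(y_i \log p_i + (1-y_i)\log(1-p_i)\Big) + \lambda \sum_{j}|\beta_j|\right],
\qquad p_i = \frac{1}{1+e^{-x_i^{\top}\beta}},
$$

where y_i indicates whether project i was funded, x_i stacks its phrase and control features, and λ (chosen by cross-validation in cv.glmnet) controls how many β_j are driven exactly to zero; ridge would replace |β_j| with β_j², which is why it does not perform feature selection.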

Finally, and certainly most importantly, I’m glad that this paper recognizes that:

“Cats are the Overlords of the vast Internets, loved and praised by all and now boosters of Kick Starter Fundings”

 

 

 


Reflection #2 – [1/23] – [Ashish Baghudana]

Mitra, Tanushree, and Eric Gilbert. “The language that gets people to give: Phrases that predict success on Kickstarter.” Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing. ACM, 2014.

Summary

The language that gets people to give is a fascinating study that attempts to answer two questions: can we predict which crowdfunding campaigns get funded, and what features of a campaign determine its success? Analyzing over 45K Kickstarter campaigns, Mitra et al. build a penalized regression model with 59 control features such as project goal, duration, number of pledge levels, etc. Using this as a baseline, they build another model with textual features extracted from the project description. To ensure generalizability, they only chose words and phrases that appear in all 13 categories of Kickstarter campaigns. The control-only model has an error rate of roughly 17%; the use of language features (~20K phrases) reduces the error rate to 2.4%, indicating a non-random increase in accuracy. The paper then relates the top features for both the funded and not-funded cases to social psychology and the theories of persuasion. Of these, campaigns that display reciprocity (the tendency to return a favor), scarcity (limited availability of the product), social proof (others like the product too), authority (an expert designing or praising the product), or positive sentiment (how positive the description is) tend to be funded more.

Reflection

An exciting aspect of this paper is the marriage of social psychology, statistical modeling, and natural language processing. The authors address a challenging question about what features, language or otherwise, encourage users to invest in a campaign. The paper borrows heavily from theories of persuasion to describe the effects of certain linguistic features. While project features like the number of pledge levels are positively correlated with increased chances of funding, I was surprised to see phrases such as “used in a” or “project will be” influencing successful funding. I am equally interested in how these phrases relate to specific aspects of persuasion – in this case, reciprocity and liking/authority. The same phrases can be used in different contexts to imply different meanings. I am curious to know whether the subjectivity index [1] of project descriptions makes any contribution to a fund or no-fund decision.

I would expect that another important aspect of successful campaigns would be the usefulness of a product to the average user. While this is hard to measure objectively, I was surprised to find no reference to this in any of the top predictors. Substantial research in sales and marketing seems to indicate a growing emphasis on product design for successful marketing campaigns [2].

A final aspect that I find intriguing is the deliberate choice of treating all products on Kickstarter equally. How valid is this assumption when one considers funding a documentary vs. earphones? It is likely that one would focus much more on content and vivid descriptions, while the other would focus more on technical features and benchmarks.

The paper throws open the entire field of social psychology and offers a great starting point for me to read and understand the interplay of psychology and linguistics.

Questions

  • Do different categories of campaigns experience different funding patterns?
    • Are certain types of projects more likely to be funded as compared to others?
  • While social psychology is an important aspect of successful campaigns, perhaps it would make sense only in conjunction with what the product really is?

[1] Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.

[2] Why the Product is the Most Important Part of the Marketing Mix. http://bxtvisuals.com/product-important-part-marketing-mix/


Reflection #2 – [01/23] – [John Wenskovitch]

This paper examined a dataset of more than 45,000 Kickstarter projects to determine what properties make a successful Kickstarter campaign (in this case, defining success as driving sufficient crowd financial interest to meet the project’s funding goal).  More specifically, the authors used both quantitative control variables (e.g., project duration, video present, number of updates) as predictors, as well as scraping the language in each project’s home page for common phrases.  By combining both components, the authors created a penalized logistic regression model that could predict whether or not a project would be successfully funded with only a 2.24% error rate.  The authors extended their discussion of the phrases to common persuasive techniques from literature such as reciprocity and scarcity to better explain the persuasive power of some phrases.

I thought that one of the most useful parts of this paper, relative to the upcoming course project, was the collection of descriptions and uses of the tools employed by the authors. Should my group’s course project attempt something similar, it is nice to know about the existence of tools such as Beautiful Soup, cv.glmnet, LIWC, and Many Eyes for data collection, preprocessing, analysis, and presentation. Other techniques such as the Bonferroni correction and data repositories like the Google 1T corpus could also come in handy, and it is nice to know that they exist. Has anyone else in the class ever used any of these tools? Are they straightforward and user-friendly, or a nightmare to work with?

The authors aimed to find phrases that were common across all Kickstarter projects, and so they eliminated phrases that did not appear in all 13 project categories.  As a result, phrases such as “game credits” and “our menu” were removed from the Games and Food categories respectively.  I can certainly understand this approach for an initial study into Kickstarter funding phraseology, but I would be curious to see if any of these specific phrases (or lack of them) were strong predictors of funding within each category.  I would speculate that a lack of phrases related to menus would be harmful to a funding goal in the Food category.  There might even be some common predictors that are shared across a subset of the 13 project categories; it would be interesting to see if phrases in the Film & Video and Photography categories were shared, or Music and Dance for another example.  How do you think some of the results from this study might have changed if the filtering steps were more or less restrictive?
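As a toy sketch of how that filtering step, and a relaxed “at least k categories” variant of it, could be implemented (the observation list below is made up, not drawn from the actual dataset):

```python
# Keep a phrase only if it appears in every project category (the paper's rule),
# or in at least k categories (a relaxed variant). Toy data for illustration.
from collections import defaultdict

observations = [                      # (category, phrase) pairs from hypothetical pitches
    ("Games", "game credits"), ("Games", "will help"), ("Games", "our soundtrack"),
    ("Food", "our menu"), ("Food", "will help"),
    ("Music", "will help"), ("Music", "our soundtrack"),
    ("Film", "will help"),
]

categories_per_phrase = defaultdict(set)
for category, phrase in observations:
    categories_per_phrase[phrase].add(category)

n_categories = len({c for c, _ in observations})   # 13 in the real dataset, 4 here
strict = [p for p, cats in categories_per_phrase.items() if len(cats) == n_categories]
relaxed = [p for p, cats in categories_per_phrase.items() if len(cats) >= 2]
print("kept under the all-categories rule:", strict)
print("kept under a >=2-categories rule:", relaxed)
```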

Even after taking machine learning and data analytics classes, I still treat the outputs of many machine learning models as computational magic.  As I glanced through Tables 3 and 4, a number of phrases surprised me in each group.  For example, the phrase “trash” was an indicator that a project was more likely to be funded, while “hand made by” was an indicator that a project would not be funded.  I would have expected each of these phrases to fall into the other group.  Further, I noted that very similar phrases also existed across categories:  “funding will help” indicated funding, whereas “campaign will help” indicated non-funding.  Did anyone else notice unexpected phrases that intuitively felt like they were placed in the wrong group?  Does the principle of keeping data in context that we discussed last week come into play here?  Similarly, I thought that the Authority persuasive argument went counter to my own feelings.  I would tend to view phrases like “project will be” as cocky and therefore would have a negative reaction to them, rather than treating them as expert opinions.  Of course, that’s just my own view, and I’d have to read the referenced works to better understand the argument in the other direction.

I suspect that this paper didn’t get as much attention as Google Flu Trends (no offense, professor), but I’m curious to know if the phrasing in Kickstarter projects changed after this work was published.  Perhaps this could be an interesting follow-up study; have Kickstarter creators become more likely to use phrases that indicated funding and less likely to use phrases that indicated non-funding after the paper and datasets were released?  Another interesting follow-up study was hinted at in the Future Work section.  Since Kickstarter projects can be tied to Facebook, and because “Facebook Connected” was a positive predictor of a project being funded, a researcher could explore the methods by which these Kickstarter projects are disseminated via social media.  Are projects more likely to be funded based on number of posts?  Quality of posts (or phrasing in posts)?  The number of Facebook profiles that see a post related to the project?  That interact with a post related to the project?


Reflection #2 – [1/23] – [Pratik Anand]

The paper poses an interesting research question: how much does the success of a Kickstarter campaign depend on the campaign’s presentation, pitch, and other factors which have no relation to the product itself? It is interesting because, unlike other kinds of media such as advertisements, the direct impact of such influences can be measured in terms of donations to the Kickstarter projects.

Tanushree Mitra et al. list a number of factors which influence viewers, positively or negatively. These factors, or control variables, are: project goal, duration, video or animation used for the pitch, category of the product, Facebook connectivity, etc. The impact of a video or an animation is well understood, as they provide information in a short amount of time and keep viewers engaged compared to a large block of text. Project duration also plays a key role. I can understand why a longer project duration is seen negatively and why such projects are less likely to reach their funding goal: viewers have little interest in, or trust toward, paying for a product whose result they may only see after a long time. Products which take longer to develop are tell-tale signs of complexity and can lead to disastrous failures. Such a trust deficit can only be offset by strong brands, which Kickstarter creators usually don’t have.

Tanushree Mitra et al. built a logistic regression model for predicting the success of Kickstarter campaigns with these control variables. It resulted in a 17.03% error rate under 10-fold cross-validation.
The authors then factor in the phrases used in the Kickstarter campaigns, and the error rate drops to 2.24%, which shows a strong relationship between the language of the pitch and the success of the product. They try to explain the phrases as triggers for one of these phenomena: reciprocity, scarcity, social proof, social identity, liking, and authority.
Many of these phenomena, like scarcity, social proof and identity, as well as authority, are well-studied psychological phenomena, especially in the retail and entertainment industries, which employ all kinds of techniques – from loyalty bonuses and exclusive cards to ad campaigns that instill FOMO (Fear of Missing Out) among users [1]. Every other advertisement has an “expert” who claims that the given product/service is the best. Tanushree Mitra et al. reference these as part of the theory of persuasion. Since these are old tricks from the classic marketing and advertising books, it is debatable how effective they are in Kickstarter campaigns; correlation does not imply causation.
Reciprocity, on the other hand, stands out as an effective technique. Kickstarter campaigns, by their nature, do not give anything in return to backers except the promise that the product will come out for consumers. If a Kickstarter campaign gives back something tangible to its backers, it is a very visible add-on for them.
The paper shows that by adding phrases and control variables to their model, the authors achieve a high degree of accuracy in predicting the success of a campaign. If platforms emerge that let Kickstarter creators tune their pitches based on these suggestions, will the effect subside from overuse?
This study was performed in 2014, more than 3 years ago. Kickstarter is now a very different and diverse platform with newer options and a long list of high-profile successes and failures (Pebble was acquired after a string of losses, the Ubuntu phone was a failed campaign, Oculus is a major player in VR, etc.). Product discovery portals like Product Hunt are also influencing the popularity of campaigns. Do these conclusions hold up for the Kickstarter of 2017?

Reference:
1) https://www.salesforce.com/blog/2016/10/customer-loyalty-program-examples-tips.html
