Pryzant, Reid, Young-joo Chung and Dan Jurafsky. “Predicting Sales from the Language of Product Descriptions.” (2017).
 Hu, N., Liu, L. & Zhang, J.J. “Do online reviews affect product sales? The role of reviewer characteristics and temporal effects”. Inf Technol Manage (2008) 9: 201. https://doi.org/10.1007/s10799-008-0041-2
As the title suggest, the authors’ main goal is to predict sales by examining which linguistic features are more effective. The hypothesis is that product descriptions have a significant impact on consumer behavior. To test this hypothesis, they mine 93,591 product descriptions and sales records from a Japanese e-commerce website. Subsequently, they build models that can explain how the textual content of product descriptions impacts sales. In the next step they use these models to conduct an explanatory analysis, identifying what linguistic aspects of product descriptions are the most important determinants of success. An important aspect of this paper is that they are trying to identify the linguistic features by controlling for the effects of pricing strategies, brand loyalty and product identity. To this end, they propose a new feature selection algorithm, the RNN+GF. Their results suggest that lexicons produced by the neural model are both less correlated with confounding factors and the most powerful predictors of sales.
This has been a very interesting paper. Trying to control for confounding factors in an RNN setting is something that I haven’t seen before. The subsequent analysis by utilizing a mixed model was also a very good choice. Technically speaking, this paper is not that easy to follow because it requires a wide range of knowledge from different disciplines: deep learning, excellent knowledge of statistical modeling (i.e. mixed models and all the tests and assumptions) and knowledge from consumer theory.
It should be stressed that when textual features are regarded as fixed effects, this implies that they are invariant for the log of sales, which is the dependent variable. This makes sense for the authors because they believe that by adding the brand and the product as random effects they control for every other effect. Perhaps they could have controlled for seasonal effects, albeit since the data is a snapshot for only a month, it is understandable why the authors didn’t do it.
I am not certain how the authors ensure that the random effects, brand and product, are not correlated with the rest of variables.
- Comparison of the results to a baseline RNN model? I wonder how would that work.
- Chocolate and Health products have different transaction and information costs compared to the clothing industry. Would the linguistic results for E-commerce hold for other industries?
In this study, we investigate how consumers utilize online reviews to reduce the uncertainties associated with online purchases. The authors adopt a portfolio approach to our investigation of whether customers of Amazon.com understand the difference between favorable news and unfavorable news and respond accordingly. The portfolio comprises products and events (favorable and unfavorable) that share similar characteristics. The authors find that changes in online reviews are associated with changes in sales. They also find that, besides the quantitative measurement of online reviews consumers pay attention to other qualitative aspects of online reviews such as reviewer quality and reviewer exposure. In addition, they find that the review signal moderates the impact of reviewer exposure and product coverage on product sales
Another interesting paper. The authors use the concept of “transaction costs”. In addition to the transaction costs that are mentioned in the theory, there is another one that is known as “search cost”, although they touch the subject indirectly when they introduce the uncertainty theory. The amount of time consumers will dedicate to search more information about a product is bounded, and it is related to their “opportunity cost”. In general, this cost is lower in E-commerce than in other markets (for instance, going from one grocery store to another induces large search costs). In general, an E-commerce market is considered to be closer to the “ideal” perfect market, than the traditional markets, because all these transaction costs that distort the market and are mentioned in the paper are lower. Overall, as far as I am aware, Amazon.com tags user reviews with a “verified purchase”. I wonder if the authors took this into account. In addition, I have the same criticism as in the previous paper. In this case, the authors used book, DVDs and videos. These are entertainment products with very similar transaction costs attached to them. It is unclear whether these results can hold for other products.
Extension: See if the results hold for products that belong into a different category other than entertainment.