Pryzant, Reid, Young-joo Chung and Dan Jurafsky. “Predicting Sales from the Language of Product Descriptions.” (2017).
Summary
Pryzant et al. focus on predicting product sales from product descriptions. The problem is not straightforward because sales depend heavily on brand loyalty and pricing strategy, not merely on the language of the description. The authors address this with a neural network trained with an adversarial objective: it should be a good predictor of sales but a bad predictor of brand and price. They use this network to mine textual features, which they then feed into a mixed-effects model. In that model the textual features are the fixed effects, and the product and brand are random effects.
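To make the adversarial objective more concrete, here is a minimal sketch of gradient reversal. It rests on strong simplifying assumptions: a linear encoder over bag-of-words features stands in for the authors’ neural network, the brand is a single binary label, and there is no price adversary (a price head would be a second reversed branch handled the same way). The function `train_adversarial`, the weight `lam` on the reversed gradient, and the toy data are all hypothetical. Reading per-word scores off `W @ a` is only a stand-in for the feature mining the paper describes.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_adversarial(X, y, z, dim=4, lam=0.5, lr=0.02, steps=300, seed=0):
    """Hypothetical linear gradient-reversal sketch (not the paper's model).

    X: (n, V) bag-of-words matrix, y: (n,) log sales, z: (n,) binary brand label.
    Returns per-word sales scores and the sales-loss history.
    """
    rng = np.random.default_rng(seed)
    n, V = X.shape
    W = rng.normal(scale=0.1, size=(V, dim))  # shared text encoder
    a = rng.normal(scale=0.1, size=dim)       # sales head
    b = rng.normal(scale=0.1, size=dim)       # adversarial brand head
    history = []
    for _ in range(steps):
        h = X @ W                              # (n, dim) text representation
        err = h @ a - y
        history.append(float(np.mean(err ** 2)))
        g_s = 2.0 * err / n                    # d(MSE sales loss) / d(prediction)
        g_b = (sigmoid(h @ b) - z) / n         # d(mean BCE brand loss) / d(logit)
        grad_a = h.T @ g_s
        grad_b = h.T @ g_b
        grad_W_sales = X.T @ np.outer(g_s, a)
        grad_W_brand = X.T @ np.outer(g_b, b)
        a -= lr * grad_a                       # sales head minimizes sales loss
        b -= lr * grad_b                       # brand head minimizes brand loss
        # Gradient reversal: the encoder descends the sales loss but ascends
        # the brand loss, pushing its features to carry less brand information.
        W -= lr * (grad_W_sales - lam * grad_W_brand)
    scores = W @ a                             # per-word effect on predicted sales
    return scores, history


# Synthetic toy data (hypothetical): word 0 marks a brand, word 1 drives sales.
rng = np.random.default_rng(1)
X = (rng.random((200, 20)) < 0.3).astype(float)
z = X[:, 0]
y = 1.5 * X[:, 1] + 0.8 * X[:, 0] + rng.normal(scale=0.1, size=200)
scores, history = train_adversarial(X, y, z)
print("top words by |score|:", np.argsort(-np.abs(scores))[:3])
```

Setting `lam=0.0` switches the adversary off, so the encoder simply fits sales. Raising it asks the encoder to give up brand-predictive signal, typically at some cost in raw sales accuracy.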
The authors use the Japanese e-commerce website Rakuten as their data source and focus on two categories, chocolate and health. Their motivation for choosing these two:
- chocolate products vary widely
- health products include pharmaceutical goods that are often sold wholesale
- the two categories sit at opposite ends of a spectrum
R² values were consistently higher for models with textual features and random effects than for models without the textual features. On further analysis, the authors find that influential keywords had the following properties: informativeness, authority, seasonal relevance, and politeness.
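The with/without-text R² comparison can be illustrated with a small sketch on synthetic data. It is an assumption-laden stand-in, not the paper’s setup. The brand random effect is replaced by plain per-brand intercepts fitted by least squares, the product effect is omitted, and the helpers `r_squared` and `heldout_r2` are hypothetical. It shows only how such a comparison is set up, not the paper’s data or results.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def heldout_r2(design, y, n_train):
    """Fit least squares on the first n_train rows, report R^2 on the rest."""
    coef, *_ = np.linalg.lstsq(design[:n_train], y[:n_train], rcond=None)
    return r_squared(y[n_train:], design[n_train:] @ coef)


# Synthetic toy data (hypothetical): 5 brands plus 10 binary text features.
rng = np.random.default_rng(0)
n, n_brands, n_words = 400, 5, 10
brand = rng.integers(0, n_brands, size=n)
X_text = (rng.random((n, n_words)) < 0.3).astype(float)
brand_effect = rng.normal(scale=1.0, size=n_brands)
y = brand_effect[brand] + 2.0 * X_text[:, 0] + rng.normal(scale=0.1, size=n)

# Per-brand intercepts stand in for the brand random effect.
brand_dummies = np.eye(n_brands)[brand]
base = heldout_r2(brand_dummies, y, n_train=300)
full = heldout_r2(np.hstack([brand_dummies, X_text]), y, n_train=300)
print(f"held-out R^2 without text: {base:.3f}, with text: {full:.3f}")
```

The comparison uses held-out rows because, for nested least-squares models, in-sample R² can never decrease when columns are added, so an in-sample gain would say little on its own.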
Critique
I enjoyed reading the paper for several reasons. First, although the approach is neural-network-based, the paper focuses not on the model itself but on the final goal of finding influential keywords and explaining why they are influential. Second, the neural network is used not just to make predictions (though I’m sure it would be good at that) but to select features. As some of the other critiques already mentioned, this is counter-intuitive but seems to work really well. Third, adversarial objectives have not worked very well for text in the past, but the authors found a good use for the technique in textual feature selection.
A few points I had comments on:
- The authors chose chocolate and health products as the two categories. In my opinion, health products do not have any brand loyalty associated with them, especially when there is only one choice or the drug is generic. In such a category, why does one need to control for brand?
- On a related note, could the authors have done this analysis on a product category like shoes or electronics?
- Third, the results of the influential keyword analysis reflect Japanese culture. The keywords are likely to differ in a country like the US (where product descriptions might play a significantly more important role) or India (where price might play a much larger role).
- Finally, images are becoming key to selling products online. I have no study to back this up, but users increasingly seem to have shorter attention spans and rely on images and reviews to decide whether they want a product. It would be interesting to incorporate those features in future work.