This paper sought to find analogies in large, messy, real-world natural-language data by applying a purpose-mechanism structure. The authors created binary purpose and mechanism vectors for each document, setting a word's entry to 1 if that word expressed the document's purpose (or mechanism, respectively) and 0 otherwise. They could then compute distances between these vectors to surface documents that share a similar purpose but use a different mechanism, helping users generate creative ideas. The authors used MTurk both to generate training sets and for evaluation, and they measured creativity in terms of novelty, quality, and feasibility. They report significantly better performance than baselines of plain TF-IDF and random selection.
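To make the idea concrete for myself, here is a minimal sketch (not the authors' actual implementation) of how I imagine the near-purpose, far-mechanism ranking could work, assuming each document already comes with binary purpose and mechanism word vectors; the additive scoring function and all names here are my own assumptions, not the paper's exact formulation.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two vectors; returns 0 if either is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def near_purpose_far_mechanism(query_purpose, query_mechanism,
                               corpus_purpose, corpus_mechanism, k=12):
    """Rank corpus documents that are similar in purpose but different in mechanism.

    query_purpose / query_mechanism: binary (0/1) vocabulary vectors for the query
    product description.
    corpus_purpose / corpus_mechanism: arrays of shape (n_docs, vocab_size).
    Returns indices of the top-k candidate documents.
    """
    scores = []
    for p_vec, m_vec in zip(corpus_purpose, corpus_mechanism):
        purpose_sim = cosine_sim(query_purpose, p_vec)
        mechanism_sim = cosine_sim(query_mechanism, m_vec)
        # High purpose similarity and low mechanism similarity score highest.
        scores.append(purpose_sim - mechanism_sim)
    return np.argsort(scores)[::-1][:k]
```

In the paper itself the purpose and mechanism representations come from crowd annotations rather than from a scheme I can reproduce exactly, but the ranking intuition (reward purpose similarity, penalize mechanism similarity) is what I took away from the summary above.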
This paper appeared to be similar to the SOLVENT paper from last week, except that this one worked with product descriptions, only used the purpose-mechanism structure, and evaluated based on creativity. This was actually a more interesting read for me because it was more relevant to my project. I was especially inspired by the authors’ method of evaluating creativity. I think I may be able to do something similar for my project.
I took special note of the amount of compensation the authors paid to MTurk workers and tried to reverse-calculate the time they allotted for each worker. The authors paid $1.50 for a task that required redesigning an existing product using 12 near-purpose, far-mechanism solutions found by the authors' approach. This must be a lot of reading (assuming 150 words per solution, that is 1,800 words of reading, not counting the instructions!) and creative thinking. Based on the amount paid, and assuming a typical hourly rate of around $9, the authors appear to have expected participants to finish in about 10 minutes. I am unsure whether this amount was appropriate, but judging from the authors' results, it seems to have worked. It has been difficult for me to gauge how much I should pay for my project's tasks, so this study gave me a useful anchor point. My biggest dilemma is balancing the number of creative references my workers provide against the quality of each reference (higher quality takes more time to generate and is therefore more expensive).
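For reference, this is the back-of-the-envelope calculation I did; the $9/hour target rate is my own assumption and is not stated in the paper.

```python
# Rough reverse calculation of the time budget implied by the payment.
payment = 1.50            # dollars paid per redesign task (from the paper)
target_hourly_rate = 9.0  # dollars per hour -- my assumption, not the paper's
implied_minutes = payment / target_hourly_rate * 60
print(f"Implied time budget: {implied_minutes:.0f} minutes")  # -> 10 minutes
```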
These are the questions that I had while reading the paper:
1. One reason the SOLVENT paper expanded its analogy structure to purpose-background-mechanism-findings was that not all papers have a "mechanism" or a "solution" (i.e., some papers simply report findings about a problem or domain). Do you think the same limitation applies to this study?
2. Do you think the amount of compensation the authors paid was appropriate? If not, how much would have been appropriate? I would really like to read some answers to this question so I can apply them to my project.
3. What other ways could be used to measure "creativity"? The authors did a good job of breaking creativity down into smaller measurable components (though still qualitative ones) like novelty, quality, and feasibility. Is there a different method? Are there additional measurable components? Do you think the authors' method captures the entirety of creativity?
I would like to answer your first question. This paper by Hope et al. was published a year before the SOLVENT paper, and some of its authors were also part of the SOLVENT research team. Hence, I agree that the same limitation applies to this study. I also think limiting the structure to purpose and mechanism is not ideal; in more complex datasets this would be difficult to apply. Regarding your last question, I also wonder about the best way to measure creativity. I liked the authors' approach, but the concept itself is extremely subjective in nature.