Moderate googling did not reveal any mention of Soylent data being used to train AIs, which surprises me… The fact that I hadn’t heard of Soylent before this paper also surprised me—why isn’t it so popular that “Soylent” is a household name? (Well, it kind of is, but it’s in reference to the soy-based beverage in the minimalistic bottles.) Why aren’t there headlines about machine learning algorithms being trained on Soylent-sourced edits? Perhaps it is another case of the Dvorak keyboard: the problem that Soylent solves is not so big that people want to use it, even if the savings are guaranteed. We’d rather ask an office-mate to proofread our memo than the anonymous crowd, even if the crowd costs less than five minutes of Jorge-down-the-hall’s time.
It’s a testament to the effectiveness of Soylent that the authors had their own paper crowd-proofed (p. 317) and shortn’d (p. 322). The paper reads like an advertisement, and this reader is sold…sold on its effectiveness, that is. Not so sold that I can see myself ever trying it out. I would rather ask Jorge, and anyway, Microsoft Office is so passé.
The Find-Fix-Verify method impressed me. On average, about 30% of a single Mechanical Turker’s raw edits on an open-ended task are poor (too many Eager Beavers and Lazy Turkers), but when an editing task is divided into stages (e.g., in Crowdproof, one set of Turkers finds the passages with problems; another proposes fixes for those passages; and a third verifies the fixes by voting), the end result becomes far less noisy. The job takes longer, but the gains in accuracy are well worth it.
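For my own benefit, here is roughly how I picture the pattern wired together. This is a minimal Python sketch, not the authors’ implementation: the `post_hit` helper is a hypothetical stand-in for whatever Mechanical Turk calls Soylent actually makes, and the stage sizes and agreement threshold are numbers I made up for illustration.

```python
from collections import Counter

def post_hit(prompt, n_workers):
    """Hypothetical stand-in: post a task to the crowd and collect
    n_workers independent responses. Not a real Mechanical Turk API."""
    raise NotImplementedError("wire this up to a real crowd platform")

def find_fix_verify(paragraph, n_find=10, n_fix=5, n_verify=5, agreement=0.2):
    # FIND: independent workers flag problem spans; keep only spans that
    # enough workers agree on, so one Eager Beaver can't flood the
    # pipeline with spurious "problems".
    flagged = post_hit(f"Mark problem phrases in: {paragraph}", n_find)
    votes = Counter(span for worker_spans in flagged for span in worker_spans)
    patches = [span for span, count in votes.items() if count / n_find >= agreement]

    revised = paragraph
    for span in patches:
        # FIX: a separate set of workers rewrites just the flagged span,
        # so a Lazy Turker can't "fix" the paragraph by leaving it alone.
        candidates = post_hit(f"Rewrite this phrase: {span}", n_fix)
        # VERIFY: a third set votes on the candidate rewrites; the winner
        # replaces the original span.
        ballots = post_hit(f"Pick the best rewrite of '{span}': {candidates}", n_verify)
        best, _ = Counter(ballots).most_common(1)[0]
        revised = revised.replace(span, best)
    return revised
```

Splitting the work this way is the whole trick: no single worker’s laziness or overzealousness survives the other two stages.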
C’mon! I was planning to do that! XD