Boyd and Crawford examine the benefits and pitfalls of Big Data. This is a worthy discussion because the public holds high expectations and makes wild assumptions about Big Data. While Big Data can lead to previously unattainable answers, that very novelty is a reason to be more wary of its results: reaching those answers by other methods may be impossible, leaving Big Data’s results both unverified and unverifiable, a decidedly unscientific characteristic.
Issues that plague a discipline or research area are nothing new. Statistics has sample bias and Simpson’s paradox. Logic has the slippery slope and false cause fallacies. Psychology has suggestion and confirmation bias as well as hindsight, primacy, and recency effects. However, Big Data has methodological and misinterpretation errors that can be compounded by nearly all of the issues listed above. This becomes a serious problem whenever the purely quantitative appearance of Big Data is championed and its results are accepted without closer investigation.
There might be ways of combating this if Big Data can adopt the same defenses other disciplines use. Logic relies on extensive training for its practitioners, such as lawyers and judges, to challenge and counteract fallacies. Statistics has developed data reporting standards that either avoid or reveal issues, and it explicitly reports the uncertainty and precision of measurements. Psychology integrates potential pitfalls directly into experimental design, for example by using placebos or hiring actors to stage social situations, and then reports how results change relative to a control group. Big Data researchers should adopt these defenses or invent new ones to lend more authority to their assertions.
Lazer and Radford echo many of these same concerns, but also point out a more recent change: intelligent hostile actors. This is one of the largest threats to Big Data research, since it is a counteracting force that naturally evolves both to survive and to do more damage. As bots and manipulation sow more destruction and chaos, any Big Data research built on that data becomes less trustworthy. Interestingly, positive outcomes can come from simply revealing the presence of hostile actors within Big Data sources: doing so calls into question the validity of findings that previously went unquestioned because quantitative results tend to be viewed as objective and factual.
Questions:
- Should Big Data be more publicly scrutinized for hostile actors’ data manipulation in order to keep expectations more realistic?
- Should Big Data research findings be automatically assigned more doubt since misleading or misunderstood results can be so well hidden behind a veil of objectivity?
- Would more skepticism towards Big Data research slow or damage research efforts to the point of causing a net negative impact on society? Could we identify that tipping point?