More Data Is Not Always Better Data

“As organizations everywhere increasingly embrace analytics, it is tempting to think that additional data will provide the crucial insight, reveal the overlooked explanation, or crisply discern key solutions within a morass of muddled information. But ‘more data’ is not the answer to every problem,” according to the MIT Sloan Management Review.

“Organizations that add data indiscriminately run the risk of becoming data hoarders instead of data collectors. An analyst working in a large financial services institution offered this useful distinction: ‘Hoarders store everything and don’t know how to determine what is important. Collectors know exactly what is valuable and prioritize what to keep.’”

“As data storage costs continue to plummet, why not just save everything? Why not be a hoarder? The answer is: hoarding wastes resources and, paradoxically, reduces the usefulness of existing data.”

The article summarizes three key lessons:

  1. “More data” should not obscure desirable information or, through distraction, allow ongoing analyses to come to harm.
  2. “More data” should be added only if other data will not suffice and its addition does not conflict with the First Law.
  3. “More data” should be added only if its addition does not exacerbate existing biases in the data, and its addition does not conflict with the First or Second Law.

Harvard Business Review: “Any data scientist worth their salary will tell you that you should start with a question, NOT the data. Unfortunately, data hackathons often lack clear problem definitions. Most companies think that if you can just get hackers, pizza, and data together in a room, magic will happen.”