Big Data, Big Myths


WPF Blog Post


Forbes has published a thoughtful article about Big Data that reels the hype attached to the catchy term back to reality. The article, written by Forbes contributors Woodrow Hartzog and Evan Selinger, outlines why the term Big Data isn’t used by people who actually work in Big Data. The article meanders through a couple of expected signposts, but then it takes an intriguing and fresh turn when the authors discuss how the terms Big Data and Privacy are used.

The authors state:

“Ultimately, “big data” and “privacy” should be understood as heuristic terms. Heuristics are mental shortcuts that help individuals quickly make sense of complex and often conflicting pieces of information. Heuristic terms are useful for things like starting conversations, framing issues, and quickly grasping general trends that run throughout complex debates.

Once the conversation develops sufficiently, it generally becomes time to switch to a different way of speaking and adopt specific terms.

Specific terms involve greater precision than heuristic terms. They need to be used when implementing solutions to the problems that heuristic terms approximate. Examples of more specific privacy-related terms are “confidentiality”, “obscurity”, “transparency”, and “due process”. Big data-related specific terms might refer to acting on inferences gleaned from data sets, the dramatic increase in data inputs and the specific design of algorithms to uncover surprising correlations.”

This construction resonates with me. Privacy is best understood as a complex bundle of interrelated rights, laws, concepts, and norms, and is best discussed within a broad framework of Fair Information Principles. Likewise, Big Data is best understood as a complex bundle of interrelated issues around deriving value from large data sets through analysis, interpretation, and visualization. Broadly, it includes the issues of data size, data velocity, data variety, data source, accuracy, data analytics and predictive analysis, and scoring, among many others.

The terms “privacy” and “big data” have suffered from rhetorical extremism and overly flippant use in some quarters. The two terms have also suffered from being broadly linked without proper factual basis or specificity. Big Data does not always raise privacy issues, and this is important to acknowledge so that when large datasets do carry privacy risks, those real risks get attention and mitigation. The Forbes article takes a step toward finding a more informed and moderate middle ground where a real discussion of both concepts, and how they do and do not interrelate, can take place.

–Pam Dixon