There Is No Such Thing As Too Much Data

Sitting down this morning to peruse my RSS reader, I came across a blog post from someone I respect in our community, decrying too much data and too much content. What is data? When people say there is too much of it, I ask, "How did they measure that?" Think about your first video camera: what was the resolution? Mine was a 1987 PXL-2000, at 0.1 megapixels. This thing recorded video on an audio cassette tape. Even better, it was black and white. Today, I have an HD video camera, 2.1 megapixels, in my phone. What's the real difference between these two pictures? Data.

When you look at a picture, you're not interested in every pixel. You view the data as a small collection of easily identifiable 'things.' You don't analyze it as data but as a single image. The added data between the PXL-2000 picture and the one from a phone camera is very meaningful. If we are to think about data as informing the 'big picture,' we don't know what we're missing if we aren't collecting it. Large-scale observational data reveals otherwise invisible interconnections.

Content is just a term for a collection of data that tells a story. It's like the difference between a picture's 1s and 0s and this image of a carousel overlooking the Manhattan Bridge. That's how data tells a story.

People have a visual organ designed to sense this type of data: our eyes. You do not think about it as data; you sense it as an image. In the same way, data is one of a business's 'senses': it is how the business reacts to what is happening around it, the way vision is for human beings. So when someone says 'less data,' it is like saying that there is too much to see in the world, so we should make our vision blurry.


The meaning of data is often hidden until you have enough of it to make sense of it. I believe that our journey in the collection of data has just started. It took Mother Nature 3 billion years to get from the origin of life to the evolution of the first proto-eye, 540 million years ago. So yes, we can evolve our technology much faster than Mother Nature can, but we are still in the very early days. 90% of the world's data was generated over the last two years.

Content generation is 'storytelling' based on data. The content is an explanation of what is happening in the data. Saying that there are too many pipes and that increased access will reduce overall quality is quite a statement. Did the increased access of the Gutenberg press reduce overall quality? Did the increased access of the internet reduce overall quality? Sure it did, but so what? A ton of great stuff was also created. Do we consume all the content? No, we find tools to curate it and filter out the crap, and we are all better off for it. So bring on more data and more content; I think the good far outweighs the bad.

In short, we can't know what's there unless we look: collect the data.