Why is Everyone so Fired Up About Big Data?

Advanced analytical capabilities?  Enhanced business intelligence?  Superior decision support abilities? The ability to turn data into money?


While all those are good things and very worthwhile pursuits, they are not the reasons our industry is so enamored with the construct of big data.  The reason we love big data is far less complex.

There are really two reasons:  first, Oracle decided to get into the game, and second, once we figured out why they wanted in, we realized there was a ton of money to be made or lost.

A quick history lesson:

Big data means nothing. It’s a well-meaning term for (literally) big piles of data, sitting in various massive balls of infrastructure, randomly scattered around our enterprise.  More common terms include data warehouses or decision support systems, etc.

The IT industry has been built on the back of transactional systems: the big iron, big money, big visibility systems that run our companies.  Those are the most expensive, most important systems in our worlds, and as such they have the best people on them, the most expensive components (both software and hardware), and the most risk (perceived or real) to our livelihoods, which is why they are the most reliable.

Transactions occur once.  We make sure our systems scale to meet transactional demands once.  We pay a lot to over-provision in every aspect because transactional scale is not a nice-to-have; it’s mandatory.

Big data is created by copying transactional data and sticking it on another system. We copy ALL our transactional data and stick it on these systems. Over time, those systems become supersets of our transactional systems. We make lots of copies and put them in lots of big data systems.  Talk about the need for dedupe (which, in big data parlance, is MDM: master data management, but I digress).

Since we used big iron/big databases/big money stuff on our transactional systems, we had the tendency to duplicate those investments on our big data systems, even though the two kinds of systems behave completely differently.  So, lo and behold, we found ourselves with giant systems and giant expenses all over the enterprise, built for transaction processing, that never do any transaction processing.  They sit idle 90% of the time, waiting for an analyst to come up with some query to run against the data set.

I figure for every transactional Symmetrix EMC sells, they sell two more for big data apps.  Same with IBM, etc. Oracle sells TONS of RDB licenses for big data infrastructure, even though it seems stupid for that type of data to sit inside an RDB.  People are creatures of habit.  The industry counts on it.

So, along comes the benevolent Mr. Ellison. He sees that the masses are paying $3M+ in infrastructure costs for each of their big data piles. He doesn’t think he is getting enough of that pie. So he brilliantly decides that he can radically reduce the spend on hardware by packaging up commodity stuff purpose-built to handle the big data issues. He then bundles all of his magic software, which now consumes 80% of the spend, and puts a bow on it for customers.  Then he gets downright evil (and even more brilliant): he runs into all the big sites, performs a quick software audit, and finds out those sites are all out of licensing compliance.  Suddenly, customer A gets handed a bill for $4M, but not to fret, because the good people at Oracle have a simple solution: instead of just paying up, why not just buy a new Exadata system, which will greatly simplify your life, and we’ll make that little compliance issue disappear?  Genius, really.

I figure there is at least $6B in big iron at risk in this area alone.

Thus, the reason this space is electric right now has nothing to do with the marketing you hear.  It’s not about the promise of driving value out of your data; it’s about vendors figuring out that they either attack that space or get slaughtered watching. EMC didn’t buy Greenplum because they are nice people. They bought it because it helps them go on the offensive in the big data space and, most importantly, it does so by attacking Oracle where its (black) heart is: at the RDB.  You don’t need a $2M Oracle DB license if you use Greenplum, or Vertica, or Aster, or Informatica, etc.  IBM knows it.  HP knows it (which is why they bought Vertica yesterday).  It’s just a matter of time before Dell makes a play.

Now, not to be overly pessimistic, there is a silver lining.  Once all the plays have been made for their ulterior motives, then everyone will get down to the real value at hand, which is making those random piles of data start generating customer value.  That will happen.

First things first.