Some ten years ago, I started writing about what we then called “big data” for ZDNet; in fact, I was the first person at ZDNet focused on it exclusively. Coming from a consulting background in enterprise business intelligence and application development, I thought it would be fun to cover this burgeoning new area of the analytics game that I had been part of since the late 1990s. The editors wanted to name this blog simply “Big Data.” My thought was that that term wouldn’t age so well…that whatever was “big” then would seem “regular” in ten years’ time. I suggested a slight variation: “Big on Data” (because I was, and I am). And that’s how the blog and its name came about.
Also read: Big Data: Defining its definition
I was a bit amused that so many people saw big data as shiny and new. It wasn’t…instead, it was a logical progression of the enterprise BI technology that had existed for about 20 years prior. There were some important differences, though. Instead of being based on expensive commercial software, the tech of the day, Apache Hadoop, was open source. Instead of relying on proprietary data warehouse hardware appliances with (limited) enterprise storage, Hadoop used commodity servers and their inexpensive direct-attached storage (DAS) disk drives. And rather than struggling at terabyte scale, Hadoop bragged it could work at petabyte scale, handling data volumes three orders of magnitude bigger.
Lots of warts
There were downsides, too. And lots of them. Hadoop didn’t work with SQL, but rather required engineers working with it to write imperative MapReduce code, in Java, to get their work done. It worked in batch mode, and not interactively, so it was…slow. And beyond analytical queries, every workload required its own engine. Data transformation, streaming data processing, machine learning and job flow required other open source components, with names like Pig, Storm, Avro, Mahout, and ZooKeeper, each of which featured its own arcane command line interface.
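To make that contrast concrete, here is a minimal sketch of the map, shuffle and reduce steps a developer had to spell out for something as simple as counting words. It is written in Python rather than Hadoop’s Java, and it runs in a single process with no Hadoop involved. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, not Hadoop’s API, and the SQL in the opening comment assumes a hypothetical `words` table.

```python
from collections import defaultdict

# Declarative equivalent, assuming a hypothetical table words(word):
#   SELECT word, COUNT(*) FROM words GROUP BY word;


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big on Data"]))
```

In Hadoop itself, the mapper and reducer were Java classes and the framework handled the shuffle across the cluster; the sketch only mirrors the shape of the logic that a single line of SQL expresses.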
Also read: The MapReduce 101 story, in 102 stories
Everything was based on simple data files; security was file-based, too. In fact, the granularity of security was so unwieldy that many orgs using Hadoop simply gave all their users full access to everything, but limited that group of users to a small cohort. Corporate standards be damned…they only served to stymie innovation. Beyond individual technologies, there were so many vendors that it prompted me to deliver a talk in 2016, at the now-defunct Hadoop Summit, called “The Ecosystem is Too Damn Big.”
Also read: The Odd Couple: Hadoop and Data Security
It struck me at the time that all this new technology, meant to democratize data and analytics, was doing just the opposite. Worse, so many of the new startups in the space were founded and led by practitioners, who, though brilliant, ignored the technological gains, sensible standards and broad appeal of the BI and data warehousing tech that preceded them and their companies’ platforms. In the name of casting off the old technologies’ hegemony, the new technology in many ways represented a regression, rather than an advance.
Today, Hadoop and, more important, its complexity have largely been rejected, and the data warehouse is back with a cloud vengeance, featuring Snowflake as its poster child. NoSQL databases now speak SQL. Open source platforms sport real security, and detailed data governance. File-based analytics technology has taken on the “data lake” moniker and Apache Spark has superseded Hadoop as the tech standard. Even in that world, Databricks, founded by Spark’s creators, has embraced a hybrid data warehouse/lake concept it calls the data “lakehouse.”
When it comes to established technology and concepts that work well, what’s old often becomes new again. Startups get new CEOs who are more business-focused, and less technically or academically oriented. Snowflake and Databricks are cases in point. Areas with too many vendors see a wave of consolidation, often resulting in rivals joining forces, which is exactly what happened with two pioneering Hadoop companies, Cloudera and Hortonworks, now unified under the former’s name.
Other vendors simply get swallowed up, sometimes in asset purchases, with the third Hadoop company, MapR, now part of HPE, a perfect exemplar. Hero pure-play companies get acquired by larger, entrenched players, as was the case with Salesforce acquiring Tableau, and Google Cloud grabbing Looker. Enterprise old dogs learn new tricks, as with Microsoft and Power BI.
Exuberance gets rational
All of this is a sign of a new, innovative sector stabilizing, maturing and moving from cutting-edge curiosity to mainstream mission-critical technology. The tech gets easier to use, its market gets more sustainable, and it gains adoption even from conservative customers. The category gets more elbow grease, though it may lose some of its sheen.
That doesn’t mean it gets less important. Data isn’t going anywhere. As I often say in talks I give at conferences, data is simply a set of point-in-time recordings of events that have taken place, and of the actions and facets of the people, devices, organizations and processes that participated. Data can’t go away any more than business itself can, and likewise for analytics. Business runs on data. Even if data isn’t “a thing,” it’s the thing that enables and powers all the other things, including AI and data science — in name, and in substance. Just as I was amused over 10 years ago that people found big data to be shiny and new, I’m amused now that they may find it old and crusty.
I must be going
For over a decade, it’s been a thrill to write for ZDNet, the enterprise tech news site that draws heritage, and the first two letters in its name, from the company that still publishes PC Magazine, which I once eagerly read, physical cover to physical cover, when I was literally still a kid. It’s hard to imagine a more venerable site.
And yet, I’m moving on, to not just one but two other excellent outlets, where mature but innovative technology retains its spotlight. Join me at either or both if technology that’s cool both behind the scenes and at center stage still fascinates you.