Where’s the Sauce? Making Sure Your Predictive Data Science Capability Stacks Up

Adam Stotz, PhD

Chief Technology Officer

John Herrman’s New York Times article on “the stack” has us buzzing here at TROVE.

In the article, he tracks the term’s ascendancy, like many tech phrases before it, from the world of IT into the wider popular vernacular.

“The stack” as a technology organizing principle and indicator of an end-to-end capability has particular resonance for us now, because it helps frame what predictive data science is capable of doing, while also bringing into focus the structural gaps the enterprise must overcome to benefit from it.

So, what is “the stack” and, particularly, what is it in the context of predictive data science?

I think of the stack as the distinct but interdependent layers of software that need to work together as part of a system to produce a desired result. And, in the world of predictive data science, the stack is designed to put enterprise clients in the driver’s seat – masking the complex code and interactions that make it work – so they can use prediction to accelerate better business outcomes.

Here’s a truth about the stack: in modern software development, we are never building systems or platforms from scratch. Instead, they are built off pre-existing components or packages. Put another way, the stack is those components and packages – along with custom layers developers may be adding – that we build on top of.

This act of appropriation not only frees us to focus on the layers of the stack that drive business value, it is also turning the lion of Big Data into a lamb – despite what you might be hearing to the contrary. Thanks to the Open Source community, it’s easier than ever to pull in an Open Source component to manage data volume – that is no longer a problem we or the enterprise must solve with custom engineering.

So, unless you are Facebook or Google, you can start worrying less about the “Big” in Big Data and focus more on the “Data” – that’s where the real value is. To do so, companies need to get “data smart”: building out a data science capability, bringing in third-party data to better understand customers, and embracing discovery-driven planning, an iterative, learn-by-doing approach to predictive data science that puts wins on the board quickly.

Understanding predictive data science in terms of the stack helps expedite this shift.

You might have heard of stacks like LAMP (Linux/Apache/MySQL/PHP), MEAN (MongoDB/Express.js/AngularJS/Node.js), SMACK (Spark/Mesos/Akka/Cassandra/Kafka), and variants of the Hadoop stack. There isn’t one platform to rule the world, and there isn’t one stack either – but there are stacks that become popular for a reason. Predictive data science imposes certain requirements that led TROVE to favor a modified version of the SMACK stack over the others.

Following is my personal inventory of data-science stack considerations:

  • Infrastructure – Predictive data science starts with a solid and scalable foundation on which to deploy predictive analytics in a distributed computing environment. There’s a hot topic now called “microservice architectures,” where the base layer in the stack is composed of “containers” of functionality, keeping software vendors and enterprise IT teams from having to build monolithic platforms. When you think data-science infrastructure, think of a “containerized microservice architecture.”
  • Data Storage – Modern enterprises are a “melting pot” of data that varies widely across the 5 Vs of Big Data: volume, velocity, variety, veracity, and value. Consider a hybrid data-storage tier that uses different storage techniques to handle relational, time-series, and unstructured data, each of which falls at a different point on the 5V spectrum.
  • Datasynthesis™ – With a hybrid data-storage tier comes a risk of data fragmentation. At TROVE, we use a metadata repository to help business services – i.e., your “analytics” – make sense of federated data held across different storage environments by viewing it through a common lens.
  • Processing – Apache Spark has been widely popularized (and for good reason) as a way to run distributed, parallelized analytics. It’s this combination of distributed analytics frameworks and the storage technologies referenced above that is really killing the whole concept of Big Data, because we’re no longer writing a bunch of proprietary, complex code to deal with data scale. Now we’re just choosing appropriate technologies. Solving the problem of data scale is helping the enterprise shift its attention to making data useful, i.e., solving business problems with it.
  • Business Value – Which brings me to the value-added layer of the predictive data science stack, the one where enterprise pay dirt is struck – the Solver™ layer. This layer is where predictive data models are built, machine learning is leveraged, and human intelligence (be it the creativity of the data scientist, the intuition of the marketing team, or the business process insight of operations) is tapped to solve business problems. Ironically, this is the “last-mile” layer missing from most predictive data science offerings, the layer where directed-purpose analytics gets done.
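To make the storage and metadata layers above a little more concrete, here is a minimal Python sketch of the general idea: a small catalog that records where each dataset lives, so an analytic can read federated data without knowing which store holds it. This is an illustration only, not TROVE's implementation. The names `DatasetEntry`, `MetadataRepository`, and `usage_per_account`, the store labels, and the sample records are all hypothetical, and in-memory lists stand in for real relational and time-series stores.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class DatasetEntry:
    """Metadata describing one dataset and how to read it (hypothetical)."""
    name: str
    store: str  # illustrative label, e.g. "relational" or "time-series"
    reader: Callable[[], Iterable[dict]]  # yields records as plain dicts


class MetadataRepository:
    """Toy catalog that hides which storage backend holds each dataset."""

    def __init__(self) -> None:
        self._entries: Dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        self._entries[entry.name] = entry

    def store_of(self, name: str) -> str:
        return self._entries[name].store

    def read(self, name: str) -> List[dict]:
        # Callers get plain records regardless of the backing store.
        return list(self._entries[name].reader())


def usage_per_account(repo: MetadataRepository) -> Dict[str, float]:
    """Toy analytic: total kWh per known account, joined across two stores."""
    totals = {row["account_id"]: 0.0 for row in repo.read("accounts")}
    for reading in repo.read("meter_readings"):
        if reading["account_id"] in totals:
            totals[reading["account_id"]] += reading["kwh"]
    return totals


# In-memory stand-ins for a relational table and a time-series store.
ACCOUNTS = [{"account_id": "A-1"}, {"account_id": "A-2"}]
READINGS = [
    {"account_id": "A-1", "kwh": 10.0},
    {"account_id": "A-1", "kwh": 20.0},
    {"account_id": "A-2", "kwh": 5.0},
    {"account_id": "A-9", "kwh": 99.0},  # unknown account, ignored
]

repo = MetadataRepository()
repo.register(DatasetEntry("accounts", "relational", lambda: ACCOUNTS))
repo.register(DatasetEntry("meter_readings", "time-series", lambda: READINGS))

totals = usage_per_account(repo)
print(totals)
```

The design point is that the analytic only asks the catalog for a dataset by name. Swapping the in-memory lists for real database or stream readers would not change `usage_per_account`, which is the sense in which a metadata layer offers a common lens over fragmented storage.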

The million-dollar question, then, is this: once you have a technology stack that’s able to store unique data in appropriate and efficient ways, and you have frameworks that allow analytics to run at scale, what business problems is that stack actually solving? Again, that’s the part of being data smart that is missing from the offerings of most of the big platform players.

So how can this be? How has the enterprise bought the equivalent of millions of Big Macs (the original burger stack?) without getting any of the special sauce?

The short answer comes down to semantics. Software platform vendors have played up “Big Data” and the need to “manage it.” And they aren’t wrong, exactly, but managing data is just a step, albeit an important one, towards doing predictive data science. Too many stacks stop short of solving business problems, and that’s like building a rocket without a way to launch it.

The more businesses we talk to, the more we are finding a universal need for Solvers. We recently worked with a utility client who made a seven-figure investment in data “infrastructure” software, including an “analytics server,” an impressive technology stack big on “managing data” but small on business impact. They had essentially bought an expensive data science rocket that was stuck on the launch pad.

We quickly realized they had a lot of valuable data being managed in their stack, but not being put to use. They needed some sauce! So, we layered our platform into their stack and deployed a predictive Solver to flag suspect accounts that were costing them millions of dollars. The approach worked. Without the Solver layer, the dollar-saving potential of data – i.e., making it useful – had lain dormant, a condition we are finding in abundance across the enterprise.

This dormancy is often a product of endless tinkering with the data management stack. It’s o.k. to try to make your stack better; it may even be o.k. to try to perfect it; but it’s not o.k. to do so at the expense of using it. Predictive data science is an active verb in the lexicon of the enterprise – it gets results, and it gets better the more it is employed. So put it to work!

Put another way, if your company is spending millions of dollars on what is essentially a data management stack, and you’re not feeding it with the right data and not applying the right models for the right business cases on top of it, you are never going to monetize your investment.

That takes Solvers™. Add them to your stack today and prepare for liftoff.

