A few months ago, I was talking with the CTO of a major bank about machine learning. At one point he shook his head ruefully and said, “Dinesh, it only took me 3 weeks to develop a model. It’s been 11 months, and we still haven’t deployed it.”
This is just one example of the hazards you meet when machine learning encounters the real world. One thing is becoming clear: Machine learning data and models aren’t static. They never will be.
We need to embrace the fact that machine learning will only work over the long term if it’s fluid. In this case, being fluid means building your machine learning system on five important pillars as shown in figure #1:
For machine learning to do real and lasting work for an organization, you need thoughtful, durable, transparent infrastructure. That starts with identifying the data pipelines and correcting any issues around poor or missing data that can hamstring the accuracy of the models. It also means integrated governance and version control for models. Be sure that the version of each model (and there may be thousands of models in use concurrently) clearly indicates its inputs; regulators will want to know.
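To make "the version clearly indicates its inputs" concrete, here is a minimal sketch of a version record in Python. The class name `ModelVersion`, its fields, and the fingerprint scheme are hypothetical choices for illustration, not the API of any particular governance product.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical record tying a model version to the inputs it was built on."""

    name: str
    version: str
    input_features: Tuple[str, ...]  # the columns the model consumes
    training_data_ref: str           # an identifier for the training snapshot

    def fingerprint(self) -> str:
        """Return a short, deterministic hash of the whole record."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


v1 = ModelVersion("fraud-scorer", "1.0", ("amount", "merchant_id"), "snapshot-a")
v2 = ModelVersion("fraud-scorer", "1.1", ("amount", "merchant_id", "country"), "snapshot-b")

# Any change to the inputs yields a different fingerprint, so an auditor
# can tell at a glance that two versions were not built on the same inputs.
print(v1.fingerprint() != v2.fingerprint())
```

Storing a record like this next to every deployed model is one simple way to answer the regulator's question of exactly which inputs a given version used.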
Being fluid means accepting from the outset that your models will fall out of sync. That “drift” can happen quickly or slowly depending on what’s changing in the real world. You need a way to do the data science equivalent of regression testing, and you need to do that testing frequently without burning up your time.
That means configuring a system that lets you set accuracy thresholds and automatic alerts that tell you when your models need attention. Will you need to retrain the model on old data, acquire new data, or re-engineer your features from scratch? The answer depends on the data and the model, but the first step is knowing there’s a problem.
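As a rough illustration of an accuracy threshold with an alert, the sketch below compares recent predictions against known outcomes and returns a message when accuracy drops too low. The function names, the alert text, and the use of plain accuracy as the metric are assumptions made for the example; a production system would route the alert to whatever monitoring tool you already use.

```python
from typing import Optional, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the known outcomes."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def check_model(name: str, y_true: Sequence[int], y_pred: Sequence[int],
                threshold: float) -> Optional[str]:
    """Return an alert message if accuracy is below the threshold, else None."""
    acc = accuracy(y_true, y_pred)
    if acc < threshold:
        return f"ALERT: {name} accuracy {acc:.2f} is below threshold {threshold:.2f}"
    return None


# Hypothetical recent outcomes versus what the model predicted.
alert = check_model("fraud-scorer", [1, 0, 1, 1], [1, 0, 0, 0], threshold=0.9)
if alert is not None:
    print(alert)
```

Run on a schedule against freshly labeled data, a check like this is the regression test: it does not say how to fix the model, only that it is time to look.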
WATCH: I introduce the concept of Fluid ML in my keynote at O’Reilly Strata Data Conference in February.
Most machine learning is computationally intense, both during training and particularly when models have been deployed. Most enterprises need models to score transactions in milliseconds, not minutes, to identify and prevent fraud or seize a fleeting opportunity. You need excellent performance in both realms. Ideally, you can train models on GPUs and then deploy them on high-performance CPUs with enough memory to do real-time scoring.
And of course you want everything to run fast and error-free regardless of where you deploy: on-premises, in the cloud, or across multiple clouds. Here, Fluid ML means flexibility in the runtime environment, without compromise.
These days, organizations across sectors are budgeting generously for machine learning projects, but those budgets will dry up if data science teams can’t deliver concrete results. You need to be able to quantify and visualize changes over time: improvements in data access and data volume, improvements in model accuracy, and ultimately improvements to the bottom line.
Begin with the end in mind. Think not only about what you need to measure now, but also about what you’ll want to measure in the future as your data science work matures. Is the system fluid enough to track those long-term goals?
I started by pointing out that machine learning data and models aren’t static and never will be. The fifth and final pillar of Fluid ML is about continuous learning as the world changes. Ensure that your system lets you use tools like Jupyter and Zeppelin notebooks that can plug into processes for scheduling evaluations and retraining models.
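The kind of process a notebook might plug into can be sketched as a loop that evaluates the model on each new batch of labeled data and retrains when accuracy slips. Everything here is a toy under stated assumptions: `MajorityClassModel` is a hypothetical stand-in for a real model, each batch is assumed to be non-empty, and a real scheduler would replace the simple `for` loop.

```python
from collections import Counter
from typing import List, Sequence, Tuple


class MajorityClassModel:
    """Toy stand-in for a real model: predicts the most common training label."""

    def __init__(self) -> None:
        self.label = None

    def fit(self, labels: Sequence[int]) -> "MajorityClassModel":
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n: int) -> List[int]:
        return [self.label] * n


def evaluate_and_retrain(model: MajorityClassModel,
                         batches: Sequence[Sequence[int]],
                         threshold: float) -> List[Tuple[float, bool]]:
    """Score each batch in order; refit on a batch when accuracy falls below threshold."""
    log = []
    for labels in batches:
        preds = model.predict(len(labels))
        acc = sum(p == t for p, t in zip(preds, labels)) / len(labels)
        retrained = acc < threshold
        if retrained:
            model.fit(labels)  # retrain on the newest data
        log.append((acc, retrained))
    return log


model = MajorityClassModel().fit([0, 0, 1])
# The second batch simulates the world changing: label 1 is now dominant.
history = evaluate_and_retrain(model, [[0, 0, 0, 1], [1, 1, 1, 0]], threshold=0.6)
print(history)
```

The returned log records, for each evaluation, the accuracy observed and whether a retrain was triggered, which is also the raw material for measuring improvement over time.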
At the same time, expect your own learning to grow and evolve as you absorb the advantages and limitations of various algorithms, languages, data sets, and tools. Fluid machine learning requires not only continuous improvement from the data and the system, but also continuous improvement from you and your teams.
The first three pillars are about being “always-on,” and the last two are about continuous learning. Wherever you are in your data science journey, the pillars of Fluid ML can bring focus to each moment and clarity for the future. It’s a bright future, and thinking carefully about machine learning can get us there. Try it today at datascience.ibm.com.
Dinesh Nirmal – Vice President, IBM Analytics Development
Follow me on Twitter: @dineshknirmal