How CIOs Should Think about Machine Learning and AI

For purposes of this post, we’re going to treat AI, machine learning and deep learning as interchangeable terms, and we’ll just refer to them as “AI” for the sake of brevity. (If you’re interested in the differences between them, we compare the definitions of AI, machine learning and deep learning here.)

It’s important for the companies and leaders trying to adopt AI to share a common language with their teams of software engineers and data scientists. Too often, advanced technology is shrouded in jargon and obfuscations that are obstacles to understanding and fast implementation. The goal of this post is to introduce a set of terms and ways of thinking that can serve as a common language for teams that are starting to build AI solutions.

What Does AI Do?

The most important thing to know about AI is what it can do for you. AI makes decisions about data. Just imagine the AI as a box: you send data into the box, and decisions about the data come out.

What kinds of decisions can you make? There are four types of decisions that people rely on AI for:

  • Classification
  • Clustering
  • Regression
  • Goal-oriented actions

Let’s look at a couple of examples of each:

  • Classification: Classification is identifying something: categorizing or tagging or sorting it. Putting a name to a face is a classification problem. Filtering emails into a spam folder uses an algorithmic filter to classify them as spam or not_spam. Detecting objects in images can be a classification problem. So can recognizing voices. More broadly, this is an example of machine perception, of an algorithm that maps raw sensory data to symbols that have meaning for humans. In this way, AI can interpret or give meaning to data, much as the human brain does in a flash of cognition. To train a classifier, you need a labeled dataset; i.e. you need to tag data with the names that apply to it, like pairing a JPEG with the name of the person in the photo. The way people tend to do this is by sorting data into folders with the label that applies. The algorithm then learns the correlations between the data (e.g. pixels in the shape of my face) and the label (e.g. my name). So the first question to ask, after you have decided on the outcomes or labels you want to predict, is “Do we have data labeled with those outcomes? And if not, how do we get it?” (At that point, you start looking at data strategies: how to gather and process the data to train your algorithms on.)

  • Clustering: Clustering algorithms group data by similarity. They recognize pieces of data that resemble each other, and they can gather them in one place visually to enable easy data exploration. For example, we could feed 1,000 faces of men, women and children into an algorithm, and without knowing what men, women or children are, the algorithm would group women in one corner, men in another and children in a third. That’s clustering. A clustering algorithm can recognize similarities between a search query and a set of documents, to surface relevant websites. In reverse photo search, it can look at pixel patterns and serve up similar images, without knowing how to name the objects in the images. The inverse of similarity is anomaly, and algorithms can recognize anomalies, too. Anomaly detection has applications in use cases such as fraud detection and predictive maintenance, since fraud and equipment breakdowns usually exhibit symptoms of unusual activity, whether that is strange financial transactions or weird data generated by the sensors attached to heavy equipment.

  • Regression: Regression just means predicting continuous numerical values. (Classification, in contrast, predicts discrete values. The animal in the photo might be a dog or an elephant or a peacock, but it is not a mix of those categories. With discrete values, there are no gray areas or gradual shifts in values.) For example, houses might be sold for any amount from $100,000 to $3,000,000. You want to predict the price of the house based on its square footage. The larger the house is, the more it is likely to cost, but the relationship between square feet and dollars is not one to one. This is a regression problem that might be expressed as house_price = slope * square_feet + intercept. Algorithms can learn the correlations between numerically continuous values as well.

  • Goal-oriented actions: Goal-oriented learning is best described in terms of a video game. In a video game, certain actions win you more points, while others will cost you. That is, there’s a correlation between what you do and the number of points you gain or lose. Algorithms can learn the relationship between your actions and the rewards a game returns to you. Some of these algorithms are called “reinforcement learning”. Google’s DeepMind created an algorithm called AlphaGo that learns those correlations for the board game of Go, and it learned them so well that the algorithm beat the world champion of Go. If you can define problems to be solved as games to be won, and strictly define how points are awarded, then you can apply goal-oriented learning to real-life situations as well. For example, you might reward a robot for successfully pushing a button, or you might reward a drone for every second it remains aloft. In finance, you might teach an algorithm to correlate the buying and selling of securities with the profit or loss it incurs.
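
To make the regression example above concrete, here is a minimal sketch of fitting house_price = slope * square_feet + intercept with ordinary least squares in plain Python. The function name fit_line and the five house records are invented for illustration; a real project would more likely rely on a statistics or machine-learning library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented training data: square footage and sale price in dollars.
square_feet = [1000, 1500, 2000, 2500, 3000]
house_price = [210000, 290000, 405000, 480000, 615000]

slope, intercept = fit_line(square_feet, house_price)

# Use the learned line to price a house the algorithm has not seen.
predicted = slope * 1800 + intercept
print(f"Predicted price for 1,800 sq ft: ${predicted:,.0f}")
```

The prices do not fall exactly on a line, which is the point: the algorithm finds the line that comes closest to all of them.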

If you can’t formulate the problem you want to solve as one of these four types of predictions, then you need to start over. Until you achieve strategic clarity over the outcomes you want to predict, it’s not much use fiddling with AI. You’ll probably come up with a list of a few things you’d like to predict. Some of those predictions may be pie in the sky: e.g. build me a chatbot like Scarlett Johansson in “Her”. We always recommend starting your AI journey with a relatively simple task, the low-hanging fruit. You’ll learn a lot on the way to that first win, which will make it easier to rally support for and implement your next AI solution.

What Does AI Need to Work?

To build an AI solution, you need four things:

  • Data
  • Team
  • Tooling
  • Infrastructure

We’ll explore each in turn. There are technological solutions, and vendors to sell them, for AI tooling and infrastructure. Gathering the right data and hiring the right team are harder, but not impossible.


Data

Once you’ve defined the problem you want to solve, the next question to ask is: “Do I have the data that’s pertinent to this problem?” That data is the fuel your algorithms need. Without it, they cannot be trained, and will not be able to make accurate predictions.

AI begins with gathering the right data for the problem that’s important to you. If you don’t have that data, you need to take a step back and devise a data strategy. That strategy might detail the steps necessary to identify the proper data, gather it, move it to a data lake and store it there. To know which data you want to gather, you might need a domain expert who has a theory about what information could be used to predict the outcomes you care about. That is, they already think they know that A causes B, and may even have ways to measure A and B.

For example, let’s say you want to predict when inventory for something like beer on tap will go out of stock. You might have a theory that by knowing when the last beer delivery was and the volume of the kegs, and by measuring the real-time flow of the beers on tap as well as their historical seasonal consumption, you would have a good way of knowing when your Miller Lite will run out, so you can schedule the next delivery. That’s a theory of cause and effect for beer. Each predictive problem will require someone with knowledge to point your data gathering in the right direction, according to a theory that fits the domain.
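
Before any learning is involved, the beer theory can be written down as simple arithmetic. The sketch below is only an illustration: the function name, its parameters and the numbers are hypothetical, and it assumes a constant pour rate, which is exactly the simplification a trained model would replace with a forecast based on historical seasonal consumption.

```python
def hours_until_empty(keg_liters, kegs_delivered, liters_poured, liters_per_hour):
    """Estimate hours of stock left, assuming a constant pour rate."""
    if liters_per_hour <= 0:
        raise ValueError("pour rate must be positive")
    # Stock remaining = what was delivered minus what the taps have measured.
    remaining = keg_liters * kegs_delivered - liters_poured
    if remaining <= 0:
        return 0.0
    return remaining / liters_per_hour


# Hypothetical figures: four 50-liter kegs delivered, 140 liters poured
# since the delivery, and taps currently flowing at 5 liters per hour.
hours_left = hours_until_empty(
    keg_liters=50, kegs_delivered=4, liters_poured=140, liters_per_hour=5
)
print(f"Schedule the next delivery within {hours_left:.0f} hours")
```

Each input corresponds to a piece of data the domain expert said to gather, which is why the theory comes before the data strategy.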

In the previous section, we listed the types of problems that AI can solve. Below, we’ll list many of the major data types that AI can work with:

  • Text
  • Time Series
  • Tabular data
  • Images
  • Video
  • Sound
  • Voice

Many of these data types require no explanation. Everyone knows what an image is. Everyone knows what sound is.

What’s a Time Series?

Other data types are slightly less familiar: not everyone knows the term “time series”. A time series is data that is recorded over time, where the sequence of the data is important. For example, temperature readings thrown off by a thermometer come in a series over time. You could take a reading each second, or each minute, and that series of temperatures would contain patterns.

A thermometer measuring the outdoor temperature would peak in the early afternoon and reach its low point sometime after midnight in a daily cycle. That’s a regular pattern, and on the basis of that pattern, an algorithm can both make predictions about future temperatures and recognize anomalous behavior when the temperature doesn’t behave as expected.

Attaching a thermometer as a sensor to a piece of heavy equipment can reveal patterns about the machine’s internal state: extremely high temperatures at the wrong time might signal that something is going wrong. That’s the basis of predictive maintenance that uses AI.
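
As a rough illustration of spotting anomalies in a time series, the sketch below compares each hourly temperature reading with the other readings taken at the same hour of the day and flags the ones that stray too far. The function name, the threshold values and the temperature data are all invented for the example; production systems typically use more sophisticated models.

```python
from statistics import mean, stdev


def find_anomalies(readings, period=24, threshold=3.0, min_spread=0.5):
    """Return indices of readings far from the typical value at the same
    point in the cycle (for hourly data, the same hour of the day)."""
    anomalies = []
    for i, value in enumerate(readings):
        # Compare against every other reading taken at the same hour.
        peers = [r for j, r in enumerate(readings)
                 if j % period == i % period and j != i]
        if len(peers) < 2:
            continue  # not enough history to judge this reading
        # Floor the spread so tiny natural variation is not over-penalized.
        spread = max(stdev(peers), min_spread)
        if abs(value - mean(peers)) > threshold * spread:
            anomalies.append(i)
    return anomalies


# Invented hourly temperatures (degrees C) for one typical day.
typical_day = [10, 9, 8, 8, 7, 7, 8, 10, 12, 14, 16, 18,
               19, 20, 20, 19, 18, 16, 15, 14, 13, 12, 11, 10]

# Five days of readings with small day-to-day variation.
readings = [temp + 0.5 * (day % 3) for day in range(5) for temp in typical_day]
readings[2 * 24 + 3] = 35.0  # inject a spike at 3 a.m. on the third day

print(find_anomalies(readings))
```

Here the only reading flagged is the injected spike, because 35 degrees at 3 a.m. breaks the daily pattern the other days establish.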

Other forms of time series data include server logs that monitor the behavior of machines in your data center; web logs that track the behavior of customers on your website; data packets flowing in and out of your servers (which has clear cybersecurity applications); monthly consumption figures used for market forecasting and inventory planning; and the many other sensors that monitor the health of patients (think vital statistics measured in a hospital) and machines.

Tabular Data

Tabular data, also referred to as columnar data, is the kind of data you find in a relational database. If you’ve ever worked with a spreadsheet, where each row is a different thing and each column records a different aspect of the thing, then you know what columnar data is. For example, each row might be a person with a unique ID, and each column would record a piece of metadata about that person, like height, weight, gender, address, etc.
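
In code, the row-and-column idea might look like the following sketch, where each row is a dictionary and each key is a column. The column names and values are made up for illustration.

```python
# Each row is one person; each key is a column describing that person.
people = [
    {"id": 1, "height_cm": 170, "weight_kg": 65, "city": "Austin"},
    {"id": 2, "height_cm": 182, "weight_kg": 80, "city": "Boston"},
    {"id": 3, "height_cm": 158, "weight_kg": 54, "city": "Denver"},
]


def column(rows, name):
    """Return every row's value for one column."""
    return [row[name] for row in rows]


heights = column(people, "height_cm")
average_height = sum(heights) / len(heights)
print(heights, average_height)
```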

While recent advances in AI have highlighted its superhuman accuracy in image recognition and voice detection, most business problems involve text, time series or tabular data. And deep learning applies equally well to those. Which is to say, deep learning opens the possibility that on old business problems, like fraud detection or market forecasting, companies can cut their error rate in half. Those leaps in accuracy have a monetary value, which only the business itself can know. And those leaps in accuracy are the reason why enterprises are adopting AI technology.

What the Hell’s a Tensor?

Algorithms such as deep artificial neural networks can work with any type of data, as long as it’s a tensor. ;) That is, to feed data into a neural net, you need to transform it with a data pipeline focused on machine learning. That data pipeline takes in data in any form, and it outputs a tensor. That’s why Google calls its AI tool TensorFlow.

What’s a tensor? It helps to start with something simpler and build up. A single number like 5 is called a scalar. When you stack those scalars one on top of another, like cells in a column of a spreadsheet, that’s a vector. When you take several columns and place them next to each other, as we do with spreadsheets, you have a two-dimensional array, also called a matrix. And if you place those two-dimensional arrays on top of each other like pages in a book, that’s a tensor. In everyday usage, a tensor is an array with three or more dimensions (strictly speaking, scalars, vectors and matrices are tensors too, just of lower rank). You can actually keep adding dimensions beyond three, but it’s more difficult to visualize. In any case, tensors are the basic data structure that deep learning algorithms work with.
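
The build-up from scalar to tensor can be mirrored with nested Python lists. The helper shape below is a hypothetical function written for this sketch; it assumes the lists are regularly nested, and numerical libraries provide the same information on their own array types.

```python
def shape(data):
    """Return the dimensions of a regularly nested list as a tuple."""
    dims = []
    while isinstance(data, list):
        dims.append(len(data))
        if not data:
            break  # an empty list has nothing further to inspect
        data = data[0]
    return tuple(dims)


scalar = 5                          # a single number
vector = [5, 3, 8]                  # scalars stacked in a column
matrix = [[1, 2, 3],
          [4, 5, 6]]                # columns side by side, like a spreadsheet
tensor = [matrix, matrix, matrix, matrix]  # four "pages" of the same size

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("tensor", tensor)]:
    print(name, shape(value))
```

The number of entries in each shape is the number of dimensions: none for the scalar, one for the vector, two for the matrix and three for the tensor.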

Just to be clear, when you start building AI solutions, your team will spend roughly 80% of its time on data pipelines, the workflows that pre-process and clean and standardize and normalize the data so that algorithms can learn something from it.
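
To give a feel for what a pipeline step does, here is a small sketch that cleans a column of raw readings and then standardizes it to zero mean and unit variance. The function names and the sample values are invented for illustration; real pipelines chain many such steps and run them at much larger scale.

```python
from statistics import mean, pstdev


def clean(raw_values):
    """Parse strings into floats, dropping entries that are not numbers."""
    cleaned = []
    for value in raw_values:
        try:
            cleaned.append(float(value))
        except ValueError:
            continue  # skip blanks and placeholders such as "n/a"
    return cleaned


def standardize(values):
    """Rescale values to have mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # all values identical
    return [(v - mu) / sigma for v in values]


# Invented raw sensor column with a blank and a placeholder in it.
raw = ["70", "", "80", "n/a", "90"]

cleaned = clean(raw)
standardized = standardize(cleaned)
print(cleaned)
print(standardized)
```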


Team

Which brings us to the team you need. Broadly speaking, there are six groups of people you should think about when you’re building AI solutions:

  • Data Scientists
  • Data Engineers
  • DevOps/Sysadmins
  • Software Engineers
  • Domain Experts
  • Users

Data Scientists

There are only two problems with data science. The data, and the scientists. The problem with the data is it’s messy. The problem with the scientists is they’re scarce. There are not enough data scientists in the world, which is a real constraint on the adoption and implementation of AI.

What does a data scientist do?

A wag would say that a data scientist is someone who knows statistics better than a programmer, and programming better than a statistician. Alternatively, “data scientist” is how we say “statistician” in Silicon Valley. Both jokes have some truth in them.

Data science is inherently statistical, and statistics is about counting how often things happen, in order to make predictions about them. But data scientists also have to master the computational tools to manipulate the vast amounts of data they are called upon to work with. It often falls to the data scientist to locate the proper data, explore it, and conduct a series of experiments to see if the data will be useful for making certain kinds of predictions. To conduct those experiments, the data scientist will choose a variety of algorithms, and then tune and train those algorithms on the data to gauge their performance and accuracy. That’s the core of a data scientist’s activity.
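
The experimental loop described above can be sketched in a few lines: hold some data back, train each candidate algorithm on the rest, and compare their errors on the held-out portion. Everything here is illustrative; the two candidate models, the error measure and the data points are assumptions chosen to keep the example small.

```python
def mean_model(train):
    """Baseline: always predict the average outcome seen in training."""
    avg = sum(y for _, y in train) / len(train)
    return lambda x: avg


def linear_model(train):
    """Least-squares line through the training points."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
             / sum((x - mean_x) ** 2 for x, _ in train))
    return lambda x: mean_y + slope * (x - mean_x)


def mean_abs_error(model, data):
    """Average size of the model's mistakes on the given data."""
    return sum(abs(model(x) - y) for x, y in data) / len(data)


# Invented (input, outcome) pairs.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.1),
        (5, 9.8), (6, 12.2), (7, 13.9), (8, 16.1)]

# Hold out every fourth point so the models are judged on unseen data.
test = data[3::4]
train = [pair for i, pair in enumerate(data) if i % 4 != 3]

candidates = {"mean": mean_model, "linear": linear_model}
scores = {name: mean_abs_error(build(train), test)
          for name, build in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

Judging the candidates on data they never saw during training is what keeps the comparison honest.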

Data scientists typically work with machine-learning and statistical tools, and most of them work in programming languages such as Python and R.

Data Engineers

Data engineers are more involved with gathering, moving and storing the data. If there are data pipelines to be built in order to transform the data somehow, especially on a high-throughput production stack, then data engineers will likely be the people to build them, rather than data scientists. Often the data scientists prototype a series of data transforms that later has to be rewritten in a more performant manner.

Data engineers typically work with the “big data” stack, using languages such as Java, Scala, C# and C/C++.


DevOps/Sysadmins

If the role of the data scientist is to train an algorithmic model to make accurate predictions about data, the role of the DevOps team is to take that AI model and deploy it to the production stack, integrating it with the technologies your company already uses as part of your product. The chief task of DevOps is to maintain the stability and reliability of software products and IT stacks, and that includes any AI models to be incorporated into those products. This is the realm of machine-learning ops, or the operational considerations of deploying and maintaining a machine learning model. Without this stage, AI is just an experiment whose results might be discussed in meetings. It is only in deploying AI models and integrating them into your products that your company will get real value from AI.

Software Engineers

AI is just a component in a larger product. Aside from DevOps, there are questions to be asked about how AI fits into the rest of your software development process. On the most successful teams, data scientists frequently embed with larger software engineering teams in order to answer those questions more quickly.

Domain Experts

Sometimes data scientists happen to be domain experts with special knowledge about the problem they need to solve, and sometimes they do not have that expertise.

Domain experts, or people with expert knowledge of the area in which you want to make predictions, are rare, and their expertise can both steer the questions you ask and influence the speed at which your team can explore the problem. Making predictions about an automobile composed of 10,000 parts is a process that can benefit from some knowledge about those parts and how they are assembled.

Any company seeking to answer highly complex questions will need a few highly paid domain experts to advise on their predictive problems, or risk frequent and long delays in finding the answers.

Machine-Learning Tooling

The landscape of machine-learning tools is highly fragmented. Every public cloud company has produced its own machine-learning toolkit in an attempt to commoditize the complements of its true money-maker, renting chips by the hour: Google created TensorFlow, the most popular machine-learning framework written for the Python programming language, and backs Keras; AWS has adopted Apache MXNet, which originated in academic research at CMU and elsewhere; and Microsoft built CNTK.

Meanwhile, Berkeley produced Caffe, the popular machine-vision framework that is slowly dying for lack of support, and Caffe’s creator moved to Facebook, which has introduced PyTorch and Caffe2. As of 2018, PyTorch is the darling of machine-learning researchers due to its flexibility, while TensorFlow is dominant among data scientists.

The question for executives, product managers and leaders of business units is which one of these open-source frameworks do you include in a mission-critical application, and what do you do when it breaks? At other layers of the open-source stack, there are clear answers to that question. Red Hat supports Linux via RHEL; Cloudera and Hortonworks support Hadoop and Spark via CDH and HDP, respectively.

The public cloud vendors, whose strategy can be summed up as “Accenture for hardware”, are introducing machine-learning toolkits much like Microsoft created Internet Explorer, as a way to shore up their core business. But they rarely if ever sign SLAs for on-prem deployments, precisely because that does not serve their cloud strategy.

AI Infrastructure

Tooling is just the top layer of an array of technologies that you might call AI infrastructure. Why is AI infrastructure different from any other type of IT infrastructure that requires data storage and networking and processing? Well, the most advanced kinds of AI, such as deep artificial neural networks, are data-hungry. The algorithms need to train on a lot of data in order to make accurate predictions. We’re talking gigabytes, terabytes and sometimes petabytes of data. Processing that data is computationally intensive.

That’s why AI creates a giant sucking sound. It’s sucking in data. That data needs to be gathered, moved, stored, transformed and processed in order to produce predictions with AI. Doing that efficiently (so that data scientists can perform experiments in a reasonable amount of time) requires special hardware, fast storage and real-time streaming. This is the dynamic that Geoffrey Moore writes about in his book “Inside the Tornado.” Enterprise needs to run a new kind of workload to produce highly accurate, cheap predictions, and that changes everything about IT.

The hardware required is usually a GPU, and there are only two viable GPU makers: Nvidia and AMD (which has the backing of Intel). You can decide not to buy the more expensive chips, of course, but then you’re paying your scarce and expensive data scientists to twiddle their thumbs while the algorithms train for hours and days on slower chips.
