Machine Learning & Big Data: Don’t Fall for This Data Trap

Image: Data from Star Trek caught in a trap


Many organizations are impatient: they jump into machine learning without proper market research. It’s better to think things through first, so let’s start here. There are three broad components to machine learning:

  1. The Data

  2. The Algorithm

  3. The Machines (Computers and their GPUs)

The algorithms and the machines can be provided by any one of many partner companies, such as Amazon and Google. But the data used to train an algorithm has to come, at least in part, from your organization. That is why the data is the most important of the three, and why it is your organization’s competitive advantage.

Data is your competitive advantage

Naturally, you need enough data, and the right kind of data, to feed into your algorithms. Assuming you have that, it’s also important to think through what you’re doing with machine learning…

Scenario

Let’s say you run a 24-hour retail store selling groceries, and you have reliable data about when your customers visit your store and how much they spend.

Data analysis of a 24-hour retail store

You decide to have your impressive new algorithm figure out the best hours to open the store for business, so you feed the machine learning algorithm your data. It figures out that most of your business occurs in the mid-morning and the late afternoon. Based on these results, you decide to close the store during the non-busy hours. That’s what the machine learning algorithm told you, and for now that’s great.
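To make the scenario concrete, here is a minimal sketch of that kind of analysis. It is not a real machine learning model, only a simple aggregation standing in for one. The transaction records, the `busy_hours` helper, and the 10% revenue threshold are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical (hour_of_day, amount_spent) records; the numbers are invented.
transactions = [
    (2, 15.0), (9, 40.0), (10, 55.0), (10, 60.0), (11, 45.0),
    (13, 20.0), (16, 50.0), (17, 65.0), (17, 30.0), (23, 10.0),
]

def revenue_by_hour(records):
    """Sum the amount spent for each hour of the day."""
    totals = defaultdict(float)
    for hour, amount in records:
        totals[hour] += amount
    return dict(totals)

def busy_hours(records, min_share=0.10):
    """Return the hours that bring in at least min_share of total revenue."""
    totals = revenue_by_hour(records)
    grand_total = sum(totals.values())
    return sorted(h for h, t in totals.items() if t / grand_total >= min_share)

# The hours the "algorithm" recommends staying open.
open_hours = busy_hours(transactions)
print(open_hours)
```

With this invented data the recommended hours cluster in the mid-morning and late afternoon, and the quiet night hours are dropped.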

Now fast-forward about three years. The economy has changed, and many people no longer hold a standard 9-5 job. Instead they have much more flexible schedules, so most people now prefer to shop late at night, getting their shopping out of the way in order to do more important things during the day.

Missing data = missing insights

Well, now you don’t have any data covering the full 24-hour cycle, because the store has been closed during those hours. No machine learning algorithm can infer this new trend and give you an updated recommendation to keep your store open late at night. As you can see from this example, no machine can make up for the data that you lost.
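The gap can be sketched in a few lines. The demand figures, the opening hours, and the `record_visits` helper below are hypothetical; the point is only that visits outside opening hours never reach the dataset, so nothing trained on that dataset can see them.

```python
def record_visits(true_demand, open_hours):
    """Only visits that happen while the store is open are ever recorded."""
    return {hour: visits for hour, visits in true_demand.items() if hour in open_hours}

# Hypothetical opening hours chosen three years ago.
open_hours = {9, 10, 11, 16, 17}

# Hypothetical number of customers who would like to shop at each hour today.
true_demand = {9: 20, 10: 25, 11: 20, 16: 15, 17: 20, 22: 60, 23: 70}

observed = record_visits(true_demand, open_hours)

# The best hour according to the data you actually have...
best_observed_hour = max(observed, key=observed.get)
# ...versus the best hour if you could see all of the demand.
best_true_hour = max(true_demand, key=true_demand.get)

print(best_observed_hour, best_true_hour)
```

In this sketch the recorded data still points at the mid-morning, while the real peak sits late at night, in hours that produce no records at all.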

There are two lessons to learn from this example:

Lesson 1. Think about your use of machine learning strategically and for the long term. If you don’t, a decision you make today, such as no longer collecting data for certain hours, can leave you unable to improve later.

Lesson 2. Human validation. The human element is important: in this case, people can use what they know about their environment to make an educated guess and conclude that opening the store late at night will help.

It’s important to remember that, at this stage of science and innovation, machine learning is akin to what car navigation systems were like a few years back. Even if your satnav told you to make a right turn that would send you off a cliff, you would use common sense and critical thinking to recognize that the system’s recommendation was short-sighted. Understanding this limitation gave us the psychological permission to override a decision given to us by a machine. Implicitly, we understand that just because a process has been automated based on data, that doesn’t automatically make the process correct or neutral.

The grocery store is a simple example of how data can misguide our decisions, but I’m sure you can think of similar problems with many more dimensions of data, where you will need machine learning.

Hope this helps…