Data is no substitute for sense

The phrase ‘data-driven’ is becoming a pet peeve. Like ‘action-oriented’ or having a ‘growth mindset’, it’s a little too commonplace for its own good. And not to ruin the party, but I’d rather not be driven by data, thank you very much.


This is not a Luddite position. There’s no denying how powerful data is. It’s the fuel for devastatingly disruptive business models from Shenzhen to Shoreditch, and it’s natural that people would want to associate with this winning formula.

But are the mighty Google and the gang really driven by data, or do they just use it really, really well? There’s an important difference. To be data-driven implies that the data speaks for itself, and that you can only make good decisions if you’re armed with sufficient quantities of it. This view, encapsulated in W Edwards Deming’s famous quip that “without data, you’re just another person with an opinion”, is becoming widespread. (If you don’t believe me, just browse any respected MBA syllabus and see how many modules have the D-word in them.)

This is a problem for all sorts of reasons. For a start, it assumes that the alternative to using data to guide your decisions is to use guesswork, which isn’t true. Human beings take all sorts of information into account when making decisions, not just the information that can be quantified and entered into a spreadsheet.

Indeed, in complex, uncertain and ambiguous circumstances in particular, good decision making usually involves making sense of and integrating often disparate types of information – what our customers bought, yes, but also how they feel and what it feels like to be them.

The data-data-and-more-data philosophy of business also risks valuing quantity over quality (everyone wants their data big, when really you should want your data good) and, perhaps most perniciously, blinding us to our own biases.

In 2003, George W Bush had plenty of data supporting his belief that Iraq had weapons of mass destruction; that didn’t make the WMDs any more real. He saw what he wanted to see. Worse, he asked his intelligence operatives to find what he wanted to see. That data was then interpreted and presented as fact, something that can’t be argued with.

Data is not distilled truth

The whole data process – defining the question you’re trying to answer, choosing what data to examine, “cleaning” it to remove anomalies or errors, analysing it and interpreting what it means – involves human beings, in all their fallibility.

Imagine you’re trying to decide which of two product options to roll out. You might start by asking customers which they would prefer. That’s fine, except what people say and what they end up doing are often very different. And can you be sure that the people who responded to your request were a representative sample of your overall customer base? And are you sure you didn’t use a leading question? Or that your questions haven’t missed something important, like why people prefer option B?
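To see how far a self-selected sample can drift from the truth, here is a minimal Python sketch. Every number in it is a hypothetical assumption, not real survey data: 40% of an imagined customer base prefers option B, but fans of B are assumed to be five times as likely to answer the survey (a 50% response rate versus 10%). The function name and parameters are illustrative only.

```python
import random


def survey_share_preferring_b(n_customers=10_000, true_share_b=0.4,
                              p_respond_b=0.5, p_respond_a=0.1, seed=1):
    """Compare the true share of customers preferring B with the share
    seen among survey respondents, when response rates differ by preference.
    All defaults are hypothetical, chosen only to illustrate response bias."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_b = int(n_customers * true_share_b)
    prefers_b = [True] * n_b + [False] * (n_customers - n_b)

    # Each customer answers with a probability that depends on their preference.
    responses = [p for p in prefers_b
                 if rng.random() < (p_respond_b if p else p_respond_a)]

    true_share = sum(prefers_b) / n_customers
    observed_share = sum(responses) / len(responses)
    return true_share, observed_share


true_share, observed_share = survey_share_preferring_b()
print(f"True share preferring B:   {true_share:.0%}")
print(f"Survey share preferring B: {observed_share:.0%}")
```

Under these assumptions the survey should report that roughly three-quarters of respondents prefer B, even though only 40% of customers do. Nobody lied and no question was leading; the distortion comes entirely from who chose to answer.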

Alternatively, you might try A/B testing to see which product performs better in the real world. But what does better mean? Better volumes, better margins, better among existing customers or new ones? Better over a one-week period or over a year?
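A toy Python sketch of the “what does better mean?” problem. The two variants and all their figures are invented for illustration; the point is only that choosing a different success metric can crown a different winner from the very same test results. `VariantResult` and `winner` are hypothetical names, not any testing tool’s API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VariantResult:
    name: str
    units_sold: int
    revenue: float           # total revenue over the test period (hypothetical)
    cost: float              # total cost of goods sold (hypothetical)
    new_customer_units: int  # units bought by first-time customers (hypothetical)

    @property
    def margin(self) -> float:
        return self.revenue - self.cost


def winner(results: list[VariantResult],
           metric: Callable[[VariantResult], float]) -> str:
    """Return the name of the variant that scores highest on the chosen metric."""
    return max(results, key=metric).name


# Hypothetical results from a one-week test.
results = [
    VariantResult("A", units_sold=1200, revenue=24_000, cost=20_400, new_customer_units=500),
    VariantResult("B", units_sold=900, revenue=22_500, cost=15_750, new_customer_units=200),
]

metrics = {
    "volume": lambda r: r.units_sold,
    "margin": lambda r: r.margin,
    "new customers": lambda r: r.new_customer_units,
}

for label, metric in metrics.items():
    print(f"Best by {label}: {winner(results, metric)}")
```

With these made-up figures, A wins on volume and on new customers while B wins on margin. The data is identical in every case; only the human decision about what counts as “better” changes.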

Correlation ≠ Causation

Even if you get all that right and see a pattern, you can still be misled. Data is very good at showing correlations, but correlation doesn’t equal causation. In his book Dark Data, mathematician David Hand uses the example of children’s vocabulary test scores, which are correlated with their height. Older children are taller, and older children also do better on vocab tests, but, unsurprisingly, paying for English tutoring doesn’t make your kids grow.

None of this will be shocking to anyone with any sense, and particularly to any data scientist worth their salt. Top data officers – and digitally savvy CEOs – understand both the potential and the limitations of their data. They build a culture of asking the right questions. Most importantly, they use data science as a tool to help them make better decisions, to find useful insights and to optimise their processes, but they are not driven by it. They do the driving, and that’s how it ought to stay.