The shrinking of the big data promise

Markets do not have much patience for a commitment to techniques that don’t deliver. Unfortunately, spy agencies aren’t subject to this discipline.

It is not sorcery to predict that a woman who buys folic acid is pregnant. Photograph: Johner Images/Alamy

“Regression to the mean” is one of the subtlest concepts in statistical literacy – and yet it’s terribly simple. In plain English, “regression to the mean” is the idea that normally, things are pretty normal. That is, if you observe something abnormal – a high fever, a high share price, a long, unseasonable stretch of sunny or rainy days – then chances are that all will soon fall back within normal range. If you’re sick, you’ll probably get better (this is why so many quack cold remedies “work” – they take your mind off the passage of time while you wait for regression to the mean to assert itself).
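To make the idea concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than data from any real study: each imaginary subject has a stable underlying level plus random noise on every measurement, and the names (`simulate`, `true_level`, the 10% cutoff) are invented for the example. It picks the subjects who looked most extreme on a first measurement and compares that group's average with its average on a second, independent measurement.

```python
import random
import statistics


def simulate(n=10_000, seed=0):
    """Return (first, second) average readings for the most extreme 10%.

    Assumption: each subject has a stable true level, and every
    measurement adds independent random noise on top of it.
    """
    rng = random.Random(seed)
    true_level = [rng.gauss(0, 1) for _ in range(n)]
    first = [t + rng.gauss(0, 1) for t in true_level]
    second = [t + rng.gauss(0, 1) for t in true_level]

    # Select the subjects whose FIRST reading was in the top 10%.
    cutoff = sorted(first)[int(0.9 * n)]
    extreme = [i for i in range(n) if first[i] >= cutoff]

    mean_first = statistics.mean(first[i] for i in extreme)
    mean_second = statistics.mean(second[i] for i in extreme)
    return mean_first, mean_second


mean_first, mean_second = simulate()
print(f"extreme group, first reading:  {mean_first:.2f}")
print(f"extreme group, second reading: {mean_second:.2f}")
```

Under these assumptions the group that looked most abnormal the first time still sits above the overall average the second time, but noticeably closer to it. Nothing was done to them in between; part of what made them look extreme was noise, and the noise did not repeat.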

Regression to the mean applies to commerce. An advertising technique that works very well today will probably work less well over time, as repetition and competition eat away at its novelty. Look at those “ghost ads” on the sides of faded Victorian buildings: apparently, it was once profitable to advertise soap with a slogan like “It makes you clean.”

Remember when Zynga’s “social games” like FarmVille seemed to colonise the limbic systems of everyone you knew, stealing away their hours with a fiendishly addictive game-mechanic? In short order, most of Zynga’s players grew inured to the game’s temptations, leaving behind a rump of especially susceptible players who were not enough to sustain the game, nor its makers’ high-flying share price.

Likewise, the “surveillance business model” of building up detailed electronic dossiers on internet users in order to predict what they want to buy and how to sell it to them produced some genuinely impressive results in its early years. The serendipity of seeing an ad for something you had been thinking about proved very powerful in the early days of Facebook and the first generation of “retargeting” services.

But a look at Facebook’s ad rate card shows that the novelty of this technique wears off fast. Facebook was founded on the premise that it could use its mounting dossier on your behavior to figure out how to sell you things faster than your natural defenses would repel its pitches. If that ideology were borne out, you’d expect to see the company’s cost-per-thousand ad rates climbing into the stratosphere. Instead, they’re damned close to the rate you’ll pay for regular, minimally targeted display advertising elsewhere.

The quarterly Facebook investor calls tell the story: Facebook’s revenue growth isn’t coming from predictive, targeted ads. Instead, the company is making good money from venture-backed games firms that pay “per acquisition” – that is, for each user who actually installs their games. As Facebook doesn’t have to pay itself to advertise on its own service, it can simply wash its service in ads for games until the acquisitions take place. The cost per acquisition on Facebook is substantially in excess of what any game company could hope to earn from an average player, suggesting that this line of business is due for a crash.

The other profitable line for Facebook is sneakier, and possibly longer-lived. The company can easily see which of the commercial/brand/business pages on its service are growing fastest. These correspond to the businesses that are exerting the most energy to get their customers to follow them on Facebook and making Facebook most integral to their daily business.

When Facebook’s algorithms predict that a business is well and truly reliant upon Facebook to reach its customers, it simply switches off the business’s ability to reach those customers, so that new updates only go to a small fraction of the company’s followers. Thereafter, a Facebook salesperson gives the business a call and offers to turn the tap back on – for a price. That’s not the surveillance business-model. It’s a much older one: the drug-dealer business-model, where the first taste is free.

The Big Data success stories for predicting human behavior over long terms don’t bear scrutiny. It’s not a triumph of big data to predict that someone searching for “used cars” might respond to an ad for used cars. Neither is it sorcery to predict that a woman who buys folic acid is pregnant. It’s not big data to get paid when someone clicks on a loan application or installs a game.

Markets don’t solve all our problems, but they do not have much patience for an irrational, ideological commitment to techniques that don’t deliver. Facebook doubtless has internal fiefdoms that will be threatened by the company backing off its surveillance commitment, but it is also growing non-surveillance-oriented tendrils as fast as it can.

Unfortunately, spy agencies aren’t subject to this kind of discipline. The fact that the billions spent spying on everyone, always, has spectacularly failed to catch any terrorists is taken as proof that they’re not doing enough surveillance – not that untargeted, mass surveillance without particularised suspicion is a waste of money.

Of course, it helps that there are so many contractors and suppliers who lobby for spies to buy more of their gear and services.

Every technology is overhyped at its birth. The Gartner Hype Cycle has Big Data sliding into the long, deep “trough of disillusionment.” As the cycle astutely observes, overpromising doesn’t mean there’s no there there. As big data techniques stabilise into a few applications where they work well and long, more of the surveillance business model will blow away.