Made to measure
Google owes an awful lot to Herman Hollerith. The American engineering professor’s electric tabulating machine was employed by the U.S. Census Bureau in 1890 to navigate the nation’s biggest data mine at the time, the 63 million forms submitted for the national census. It was a massive success. The results were published in a single year, down from eight years in the previous census.
Now Big Data is everywhere. Technology companies like retailer Amazon and search engine Google have grown fat on the idea that more information means better and more timely services. The next frontier could be Hollerith’s domain, official statistics. Modern data processing seems an obvious aid in calculating key measures such as GDP, price indices and unemployment. Trillions of pieces of information, which today’s computers can process almost instantaneously, surely beat a few thousand questionnaires.
Well, yes and no. Carefully chosen small samples are actually much more accurate than poorly selected masses of data. That may sound strange, but statistical theory shows that a random selection of 100 members provides quite a good picture of any large group. Back in 1890, only 4,147 people would have needed to be polled to be 99 percent sure of the views of all Americans, with an accuracy of plus or minus 2 percentage points.
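The column does not say how the 4,147 figure was worked out. As a rough sketch, the textbook sample-size formula for a proportion gives the same number, assuming a simple random sample, the worst-case split of 50/50 and the normal approximation. The function names below are illustrative and not taken from any statistical agency.

```python
import math
from statistics import NormalDist


def required_sample_size(confidence, margin, population=None, proportion=0.5):
    """Smallest simple random sample for a given confidence level and margin.

    Assumptions (not stated in the article): normal approximation,
    worst-case proportion of 0.5, optional finite population correction.
    """
    # Two-sided critical value, e.g. about 2.576 for 99 percent confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n = z ** 2 * proportion * (1 - proportion) / margin ** 2
    if population is not None:
        # Finite population correction; negligible for a whole country.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


def margin_of_error(sample_size, confidence, proportion=0.5):
    """Margin of error (as a fraction) for a simple random sample."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


if __name__ == "__main__":
    # 99 percent confidence, plus or minus 2 percentage points.
    print(required_sample_size(0.99, 0.02))
    # Same target, with the 63 million census forms as the population.
    print(required_sample_size(0.99, 0.02, population=63_000_000))
    # Margin for a random sample of 100 at 95 percent confidence.
    print(round(margin_of_error(100, 0.95), 3))
```

Under these assumptions the population size barely matters: the answer is the same for 63 million people as for an unlimited number. A random sample of 100 gives a margin of roughly 10 percentage points at 95 percent confidence, which is a serviceable picture but far from a precise one.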
The trick is to find a truly representative sample. For elections, pollsters ensure that their samples look like the overall population in terms of wealth, age and whatever other characteristics seem to correlate with distinct patterns of voting. It is much harder to find accurate samples for shifting or localised economic activities, such as new business formation or the probability of moving house.
Big Data can help. Sometimes numerous local records can be combined fairly easily into one large database. Sometimes, masses of data collected for commercial purposes can be put to other uses. Companies like Wal-Mart have achieved great commercial successes from rapid processing of customer data. Alibaba, a huge Chinese online retailer, hopes to use Big Data on its online shoppers to script blockbuster movies.
Translate this into the world of official statistics, though, and gains are likely to be modest. More data is not always better. And for data both big and small, if the sample is wrong, the conclusions may be too. Besides, statisticians want series that they can count on for many years, and as data collection technologies evolve, numbers may get less comparable.
At best, Big Data can add speed and precision. But no quantity of new sources and processing power can address the biggest problem with key economic measures: they do not work well in the contemporary economy.
Consider real GDP growth, widely used as the prime measure of national economic success. The number is an incomplete hodgepodge of estimations, approximations and omissions. It doesn’t capture capital depreciation, environmental depredation, ageing demographics or unpaid domestic labour.
However, the interest in the statistic is so great that a whole industry of “nowcasting” has sprung up to produce GDP estimates faster than the official agencies can manage. The Federal Reserve Bank of Atlanta is an early adopter. But success will not make GDP a better indicator. Nowcasting might be able to provide a superior alternative measure of economic momentum, but traditional sampling techniques could probably do the same, with just as much speed and more accuracy.
It is much the same for the inflation rate. The Billion Prices Project at the Massachusetts Institute of Technology uses the cost of goods online to create its own price index. The results come close to the official consumer price index. But just like the official index, the new one cannot find a universally relevant basket of goods and services, deal with housing prices, or tell whether things cost more because they are of higher quality or because of inflation.
Similarly, payroll giant ADP mines its own Big Data to provide a U.S. private sector employment report. But it can’t say whether the labour market is healthy. A rise in precarious jobs is a sign of weakness, even if the unemployment rate falls. And sometimes unemployment can be productive – think of parenting or studying.
Of course, Big Data has uses well beyond official economic statistics. Hype and unrealistic promises aside, experts in cultural analysis, criminal justice, urban planning and retail promotion can certainly use new information made available by ultra-cheap processing and storage. As for companies, they will keep announcing Big Data strategies, and investors may reward them for it. However, when it comes to monitoring the economy of the data age, the picture is already clear. More data is more of a distraction than a panacea.