Friday, 15 March 2013

"The signal and the noise" by Nate Silver

Nate is the man who called 49 out of 50 states correctly in the 2008 US Presidential election; he has earned money from online poker as well as by setting up a baseball stats prediction website. It was therefore refreshing to find that he was so sceptical about the prediction industry.

His concerns are:

  • Human forecasters are unreliable because:
    • As humans we are prone to attending best to data that confirms our prejudices. 
    • Sometimes there are incentives to be biased (people prefer weather forecasters who predict rain when it turns out sunny rather than vice versa, so there is a perverse incentive to over-forecast rain).
    • As humans we like spotting patterns even in random noise
    • Many of us are 'hedgehogs' who specialise in one very big idea (rather than 'foxes' who generalise across lots of ideas) and we seek to tell 'stories' that use data when it adds to the narrative and explain it away when it doesn't
    • We're rubbish at estimating probabilities: in particular, we convert small probabilities into impossibilities and large probabilities into certainties
  • Most patterns in data are swamped by noise
  • Things such as earthquakes and stock markets follow a power law (probably because of feedback effects), which makes statistical forecasting possible but exact prediction impossible: we know the chance of a big earthquake striking LA but we don't know when it will happen
  • You can only make a decent living from poker (or the stock market) while there are enough bad players/investors in the pool whom you can exploit. As they drop out it becomes harder and harder to have an edge.
  • Big data crunching methods ('data-mining') are likely to find patterns because if you have enough variables all wobbling randomly then for some time at least two of them will correlate. So the patterns data-mining finds are quite possibly false positives. 
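
The data-mining point is easy to see in a small simulation. The sketch below is not taken from the book; it is a minimal illustration that assumes every variable is independent Gaussian noise, and the function names are made up for the purpose. It generates a set of unrelated random series and reports the strongest correlation found between any pair of them.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_spurious_correlation(n_variables, n_observations, seed=0):
    """Largest |correlation| between any two series of pure noise.

    Hypothetical helper: the series are independent by construction,
    so any correlation it reports is a false positive.
    """
    rng = random.Random(seed)
    series = [
        [rng.gauss(0, 1) for _ in range(n_observations)]
        for _ in range(n_variables)
    ]
    return max(
        abs(pearson(a, b)) for a, b in itertools.combinations(series, 2)
    )


if __name__ == "__main__":
    # More variables means more pairs to compare, and a better chance
    # that some pair lines up purely by accident.
    for n_variables in (2, 10, 100):
        r = strongest_spurious_correlation(n_variables, n_observations=20)
        print(f"{n_variables:>3} variables: strongest |r| = {r:.2f}")
```

With only 20 observations per variable, the strongest correlation tends to climb as variables are added, even though nothing in the data is related to anything else.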


This means that you need to start with a theory before looking at the data.

It also means that Bayesian reasoning is the way to go when detecting patterns: first establish your prior probabilistic expectations, then run the experiment or check the data, and then refine your probabilities.
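
As a rough sketch of that prior-then-evidence-then-update loop, here is Bayes' rule for a single yes/no hypothesis. The function name and all the numbers are invented for illustration and do not come from the book: a 5% prior, evidence that shows up 90% of the time when the hypothesis is true, and 10% of the time when it is false.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of a hypothesis after seeing the evidence.

    Bayes' rule: P(H | E) = P(E | H) P(H) / P(E), where
    P(E) = P(E | H) P(H) + P(E | not H) P(not H).
    """
    numerator = p_evidence_if_true * prior
    evidence = numerator + p_evidence_if_false * (1 - prior)
    return numerator / evidence


if __name__ == "__main__":
    # Illustrative numbers only (assumptions, not from the book).
    belief = 0.05
    for observation in (1, 2):
        belief = bayes_update(belief, 0.9, 0.1)
        print(f"after observation {observation}: {belief:.2f}")
```

With these made-up numbers the belief moves from 5% to about 32% after one piece of supporting evidence and to 81% after a second, which shows how each posterior becomes the prior for the next round.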

This was a good book though it didn't live up to my expectations. It was easy to read but I was a bit bored by the obsession with baseball.

I certainly learnt about Bayes! March 2013; 454 pages
