Jay Ver Hoef, National Oceanic and Atmospheric Administration
Why all statistical models should be spatial
This talk will contain some philosophical musings on modeling data and why I think we should always use spatial models. I begin with a quick review of spatial statistics, drawing connections from probability, inference, and the linear model to Popper's philosophy, Occam's razor, and Neyman-Pearson hypothesis testing. I present the idea that independence is not an appropriate null model when deciding whether or not to adopt a spatial model. However, there are technical issues for spatial statistics, at both very small and very large sample sizes, that have stymied the use of spatial models. For a long time, sample sizes on the order of 100 to 1000 have been the practical limits. For small sample sizes, the problems have centered on poor estimates of spatial autocorrelation. Fortunately, these problems can be mitigated by marginalization of parameter estimates. I show the connection between the t-distribution and MCMC sampling, and how those ideas can be extended to spatial models with small sample sizes. I verify the performance of these methods through simulations. For larger data sets, there is now a plethora of methods. I review some of those methods and illustrate one that I am developing based on data partitioning. I show how to model tens of thousands of samples in mere minutes, verify the method with simulations, and illustrate it for stream network data. In summary, many technical issues have been solved for spatial models, from small to large sample sizes, and it is time for statisticians and scientists to adopt the more complicated spatial models as the default.
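
As a concrete illustration of the contrast between an independence model and a spatial model (a minimal sketch, not drawn from the talk itself), the Python code below simulates data with exponentially decaying spatial autocorrelation and fits both an ordinary least squares model and a spatial linear model by maximum likelihood. The exponential covariance function, the parameter values, and the simulation layout are illustrative assumptions, not the speaker's methods.

    # Sketch: independence (OLS) vs. spatial (GLS with exponential covariance)
    import numpy as np
    from scipy.optimize import minimize
    from scipy.spatial.distance import cdist

    gen = np.random.default_rng(1)

    # Random locations on a unit square and a single covariate
    n = 200
    coords = gen.uniform(0, 1, size=(n, 2))
    X = np.column_stack([np.ones(n), gen.normal(size=n)])
    beta_true = np.array([1.0, 0.5])

    # Exponential covariance: sigma2 * exp(-distance / range), plus a small nugget
    D = cdist(coords, coords)
    def exp_cov(sigma2, range_par, nugget=1e-6):
        return sigma2 * np.exp(-D / range_par) + nugget * np.eye(n)

    # Simulate spatially correlated errors
    Sigma_true = exp_cov(sigma2=1.0, range_par=0.3)
    y = X @ beta_true + np.linalg.cholesky(Sigma_true) @ gen.normal(size=n)

    # Independence model: ordinary least squares
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Spatial model: negative log-likelihood (up to a constant) with beta profiled out
    def negloglik(log_params):
        sigma2, range_par = np.exp(log_params)
        Sigma = exp_cov(sigma2, range_par)
        L = np.linalg.cholesky(Sigma)
        Xw = np.linalg.solve(L, X)   # whitened design, L^{-1} X
        yw = np.linalg.solve(L, y)   # whitened response, L^{-1} y
        beta = np.linalg.lstsq(Xw, yw, rcond=None)[0]
        r = yw - Xw @ beta
        return 2 * np.log(np.diag(L)).sum() + r @ r

    fit = minimize(negloglik, x0=np.log([1.0, 0.1]), method="Nelder-Mead")
    sigma2_hat, range_hat = np.exp(fit.x)

    # GLS estimate of beta at the fitted covariance parameters
    L = np.linalg.cholesky(exp_cov(sigma2_hat, range_hat))
    Xw, yw = np.linalg.solve(L, X), np.linalg.solve(L, y)
    beta_gls = np.linalg.lstsq(Xw, yw, rcond=None)[0]

    print("OLS beta:", beta_ols, " GLS beta:", beta_gls)
    print("fitted sigma2 and range:", sigma2_hat, range_hat)

In simulations like this one, the independence model treats the spatial model's covariance as fixed at sigma2 * I, which is exactly the sense in which independence is a special case rather than a neutral null.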