Tricky statistics: questions and answers

To test a hypothesis in traditional (“frequentist”) statistics, you posit an alternative called the “null hypothesis”. The null hypothesis should be chosen to represent the default situation. For instance, if your hypothesis is that a certain drug can help treat disease X, your null hypothesis would typically be that the drug works no better than a placebo.

Once you have obtained your data, you calculate the probability of obtaining data at least as extreme as yours, assuming that the null hypothesis is true. If this probability is too low (thresholds of 5% and 1% are typical), you reject the null hypothesis and, in turn, take the data as support for your original hypothesis.
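As a minimal sketch of this calculation, consider a hypothetical drug trial. The numbers are assumptions made up for illustration: 20 patients, 16 recoveries, and a null hypothesis under which each patient recovers with probability 0.5, as with a placebo. The function `binomial_p_value` is an illustrative helper, not part of any library; it sums the probability of the observed result and of every more extreme one.

```python
from math import comb

def binomial_p_value(successes, trials, null_rate):
    """One-sided tail probability P(X >= successes) under the null hypothesis,
    where each of `trials` patients recovers independently with `null_rate`."""
    return sum(
        comb(trials, k) * null_rate**k * (1 - null_rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical trial: 16 of 20 patients recover; placebo recovery rate assumed 0.5.
p = binomial_p_value(successes=16, trials=20, null_rate=0.5)

threshold = 0.05  # one of the typical thresholds mentioned in the text
print(f"p-value = {p:.4f}; reject null at 5%: {p < threshold}")
```

With these made-up numbers the tail probability falls below both typical thresholds, so the null hypothesis would be rejected.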

However, your choice of original hypothesis can subtly influence the probability you calculate under the null hypothesis. Notably, when searching for periodicities in impact crater data, you need to take properly into account whether the particular period you decide to test for (say, 13 million years vs. 50 million years) is something you have derived from your data or posited independently of what you have observed. Ignoring this difference can skew your analysis, introducing a bias against the null hypothesis.
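The size of this bias can be illustrated with a small simulation. The sketch below is not the analysis from the crater studies; it is a toy setup under stated assumptions: 40 events drawn at a constant rate over 250 million years (so the null hypothesis is true by construction), a Rayleigh statistic as the periodicity measure, and a 5% threshold. It compares how often a period fixed in advance (13 million years) looks "significant" with how often the best period picked from the same data (scanning 10 to 50 million years) does.

```python
import math
import random

def rayleigh_power(times, period):
    """Rayleigh statistic z = n * R^2; large z means events cluster in phase."""
    phases = [2 * math.pi * t / period for t in times]
    c = sum(math.cos(p) for p in phases)
    s = sum(math.sin(p) for p in phases)
    return (c * c + s * s) / len(times)

def false_positive_rates(n_trials=400, n_events=40, span=250.0,
                         fixed_period=13.0, alpha=0.05, seed=1):
    """Fraction of purely random data sets flagged as periodic, for a period
    chosen in advance versus the best period found by scanning the data."""
    rng = random.Random(seed)
    periods = [float(p) for p in range(10, 51)]  # scan grid; includes 13
    # For one pre-chosen period, P(z > z_crit) is approximately alpha.
    z_crit = -math.log(alpha)
    fixed_hits = scanned_hits = 0
    for _ in range(n_trials):
        times = [rng.uniform(0.0, span) for _ in range(n_events)]
        if rayleigh_power(times, fixed_period) > z_crit:
            fixed_hits += 1
        if max(rayleigh_power(times, p) for p in periods) > z_crit:
            scanned_hits += 1
    return fixed_hits / n_trials, scanned_hits / n_trials

fixed_rate, scanned_rate = false_positive_rates()
print(f"period fixed in advance: {fixed_rate:.2f}")
print(f"best period from data:   {scanned_rate:.2f}")
```

Because the scan always includes the fixed period, its false-positive rate can never be lower, and with many candidate periods it is typically far higher than the nominal 5%.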

Also, in the case of impact craters, there appears to be an underlying trend: an increase in impact rate over the past 250 million years. In the presence of such a trend, the usual null hypothesis assuming constant impact probability is not a valid comparison.

Bayesian inference is an alternative approach to testing a hypothesis that proceeds as follows. To start with, you need a set of competing hypotheses. In this case, Bailer-Jones chose constant impact probability; simple periodic (sinusoidal) variations of the impact rate; the case where such periodic variations govern only part of the impact probability; an underlying trend; and an underlying trend plus periodic variations.

Before looking at your data, you assume that you cannot decide which of the hypotheses is more probable: you assign an equal prior probability to all of them. Bayesian inference allows you to use your data – in this case the approximate dates or date ranges for different impact craters – to adjust the probabilities. In particular, it tells you how likely each hypothesis is, given the data you have measured. By comparing these probabilities, you can decide which of your initial hypotheses is the most likely, and how much more likely than the runners-up it is.
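The updating step can be sketched in a few lines. Everything below is a toy illustration rather than the method of the study: the crater ages are invented, only two hypotheses are compared, and neither has free parameters, so the likelihood of the data can be computed directly. A real analysis would also average over unknown parameters and account for the uncertainty in each age. The helper `posterior_probabilities` is a hypothetical name.

```python
import math

def posterior_probabilities(log_evidences, priors=None):
    """Bayes' theorem over a finite set of hypotheses.

    log_evidences: dict mapping hypothesis name -> log P(data | hypothesis)
    priors: dict mapping name -> prior probability (equal priors if omitted)
    """
    names = list(log_evidences)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}
    log_post = {n: log_evidences[n] + math.log(priors[n]) for n in names}
    shift = max(log_post.values())  # subtract the maximum for numerical stability
    weights = {n: math.exp(v - shift) for n, v in log_post.items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

# Invented crater ages in millions of years (toy data, not real measurements).
ages = [5.0, 15.0, 35.0, 40.0, 65.0, 90.0, 120.0, 145.0, 170.0, 215.0]
span = 250.0  # time window considered

log_evidences = {
    # Constant impact probability: every age in the window is equally likely.
    "constant rate": sum(math.log(1.0 / span) for _ in ages),
    # Simple trend: probability density rises linearly toward the present.
    "rate rising toward present": sum(
        math.log(2.0 * (span - t) / span**2) for t in ages
    ),
}

posterior = posterior_probabilities(log_evidences)
for name, prob in posterior.items():
    print(f"{name}: {prob:.2f}")
```

The ratio of two posterior probabilities says how much more likely one hypothesis is than the other, given the data and the equal priors.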

Bayesian inference is not a cure-all for statistical problems. It has some subtleties of its own, notably concerning the proper choice of prior probabilities. For this particular task, that is, the analysis of time series data with different uncertainties (and, in some cases, only an upper age limit for a crater), it is a highly suitable tool that allows well-founded statements about the different hypotheses under discussion. Periodic variation in the cratering rate is strongly disfavoured in all data sets, and there is no evidence for a periodicity added to an underlying general trend. This result is found to be quite robust to the specific assumptions about the priors.

 