Going through graduate school in the sciences, you learn a lot outside of your core research area. You learn, or hopefully you learn, supporting skills like experimental design, producing effective visualizations, and writing for academic audiences. And one skill that most of us learn along the way is how to go about testing the statistical significance of our results.
I’ve performed statistical tests myself for every experiment I’ve run and dutifully reported the results when I submitted the manuscript for publication. Statistical significance is a key way in which we judge manuscripts, and for good reason. We’re all interested, authors included, in knowing if the results reported in a manuscript are meaningful. The most common way of demonstrating this, and the way that I first learned, is the frequentist method of null hypothesis significance testing (NHST). But there is a different, and better, way.
Frequentist Statistics
Let’s start with our goal. The whole reason for statistical significance testing in the first place is to help answer the question “are these results meaningful?” Or, more precisely, the question “based on the data I have collected, how likely is my hypothesis?” In NHST we start by stating a null hypothesis, which is essentially the hypothesis that our results are not meaningful. For example, our null hypothesis might be that the men and women of a particular country have the same average height. We then follow the general procedure:
- Based on the type of comparison we are making, select the appropriate statistical test.
- Use this test to calculate a test statistic from the data we have collected.
- Convert this test statistic into a p-value.
- Compare this p-value to 0.05 (or some other, smaller significance level).
In plain terms, we figure out how likely we would be to observe our data if the null hypothesis were true. If we have less than a 5% chance of observing our data given the null hypothesis, then we make the black-and-white decision to reject the null hypothesis. But this is not the question we set out to answer!
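To make these steps concrete, here is a minimal sketch in Python using SciPy’s independent two-sample t-test; the height samples are simulated from made-up parameters and are purely illustrative.

```python
# Minimal NHST sketch: two-sample t-test on simulated height data (illustrative only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
heights_men = rng.normal(loc=175.0, scale=7.0, size=50)    # simulated heights (cm)
heights_women = rng.normal(loc=162.0, scale=6.0, size=50)  # simulated heights (cm)

# Steps 1-3: Welch's t-test gives us the test statistic and its p-value.
t_stat, p_value = stats.ttest_ind(heights_men, heights_women, equal_var=False)

# Step 4: compare the p-value to the chosen significance level.
alpha = 0.05
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
print("Reject the null hypothesis" if p_value < alpha else "Fail to reject the null hypothesis")
```

Notice that the output is a yes/no decision about the null hypothesis, not a probability attached to the hypothesis we actually care about.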
Bayesian Statistics
Bayesian statistics offer us a way to calculate a numerical answer to the question we are asking. Consider our question “based on the data I have collected, how likely is my hypothesis?” If our hypothesis is that heights among men and women follow normal distributions with parameters \(H\), and if we have collected a set of observed heights \(D\), then the probability that our hypothesis is true given our data is simply \(P(H|D)\). And with Bayes’ Rule this can be rewritten as:
$$P(H|D) = \frac{P(D|H)P(H)}{P(D)}$$
What’s more, we can calculate this probability for all possible (or at least plausible) hypotheses \(H’\), giving us a distribution of probabilities over these possible (plausible) parameter values. This opens the door to straightforward statements about the range of parameter values that are most likely and the use of probabilities to describe our beliefs about a particular value.
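As a rough illustration, here is a sketch of this calculation by grid approximation for a single unknown mean height; the observations, the assumed known standard deviation, and the prior are all choices made up for the example, not values from any real study.

```python
# Minimal Bayesian sketch: posterior P(H|D) over candidate mean heights via grid approximation.
import numpy as np
from scipy import stats

data = np.array([168.0, 172.5, 165.2, 170.1, 174.3])  # observed heights (cm), illustrative
sigma = 7.0                                            # standard deviation assumed known

# Candidate hypotheses H': a grid of possible mean heights.
mu_grid = np.linspace(150.0, 190.0, 401)

# Prior P(H): a broad normal prior over the mean height.
prior = stats.norm.pdf(mu_grid, loc=170.0, scale=20.0)

# Likelihood P(D|H): product of the normal densities of each observation under each candidate mean.
likelihood = np.prod(stats.norm.pdf(data[:, None], loc=mu_grid, scale=sigma), axis=0)

# Bayes' Rule: posterior is proportional to likelihood times prior; P(D) is the normalising constant.
unnormalised = likelihood * prior
posterior = unnormalised / unnormalised.sum()

# The posterior lets us make direct probability statements about parameter values, e.g.:
mask = (mu_grid >= 165.0) & (mu_grid <= 175.0)
print(f"P(165 <= mu <= 175 | D) = {posterior[mask].sum():.3f}")
```

For realistic models with more parameters, this kind of grid approximation gives way to samplers such as JAGS or Stan, which the tutorials listed below cover in depth.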
I have purposely stayed above a lot of the mathematical rigour in both frequentist and Bayesian statistics because the core argument for the latter, in my opinion, is a high-level argument. When we perform statistical significance testing it’s because we want to make some statement about believing a particular hypothesis given our collected observations. Frequentist statistics approaches this question in a roundabout way, ultimately giving us the probability of the data given the hypothesis: \(P(D|H)\). Bayesian statistics, on the other hand, gives us a number that corresponds to the answer we are looking for: the probability of the hypothesis given the data: \(P(H|D)\).
If you’re interested in learning more about Bayesian statistics, I recommend the following tutorials to get you going:
- J. K. Kruschke, Doing Bayesian data analysis: a tutorial with R, JAGS, and Stan, Second Edition. Academic Press, 2015.
- A. Benavoli, G. Corani, J. Demšar, and M. Zaffalon, “Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis,” J. Mach. Learn. Res., vol. 18, no. 77, pp. 1–36, 2017.
- J. K. Kruschke and T. M. Liddell, “The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective,” Psychon. Bull. Rev., vol. 25, no. 1, pp. 178–206, Feb. 2018.
You can also check out a few of my own papers where I use Bayesian statistical methods:
- R.H. Moulton, H.L. Viktor, N. Japkowicz, and J. Gama, “Contextual One-Class Classification in Data Streams,” arXiv, 2019.
- R.H. Moulton, K. Rudie, S.P. Dukelow, B.W. Benson, and S.H. Scott, “Capacity limits lead to information bottlenecks in ongoing rapid motor behaviours,” eNeuro, vol. 10, no. 3, pp. 1–15, 2023.