Bayesian data analysis allows us to quantify the certainty surrounding our conclusions. I’ve previously talked about how traditional null hypothesis significance testing (NHST) and its related p-values are tied to concerns about reproducible science. One of the ways that Bayesian data analysis improves upon NHST is by asking the right question. Instead of answering the question “given the null hypothesis, how likely is our data?” (a plain language interpretation of a p-value), we can answer our actual question: “given the data, how likely is our hypothesis?”
Another way that Bayesian data analysis improves upon NHST is that it inherently produces probabilities, or levels of credibility, for a range of possible parameter values. This allows us to move beyond the conversation of significant/not-significant and focus on what range of parameter values we find credible.
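To make "credibility for a range of parameter values" concrete, here is a minimal sketch using grid approximation for the bias of a coin. The data (7 heads in 10 flips), the uniform prior, and the function names are all assumptions chosen for illustration; they do not come from any of the references below.

```python
# Grid approximation of a posterior for a coin's bias theta.
# Assumptions: illustrative data and a uniform prior over theta.

def grid_posterior(heads, flips, n_grid=1001):
    """Return (grid, posterior) for theta on an evenly spaced grid in [0, 1]."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    prior = [1.0] * n_grid  # uniform prior: every theta equally credible
    likelihood = [t ** heads * (1 - t) ** (flips - heads) for t in grid]
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return grid, [u / total for u in unnormalized]


def credible_interval(grid, posterior, mass=0.95):
    """Equal-tailed interval holding roughly `mass` of the posterior."""
    tail = (1 - mass) / 2
    cumulative = 0.0
    lower = upper = None
    for theta, prob in zip(grid, posterior):
        cumulative += prob
        if lower is None and cumulative >= tail:
            lower = theta
        if upper is None and cumulative >= 1 - tail:
            upper = theta
    return lower, upper


grid, post = grid_posterior(heads=7, flips=10)
low, high = credible_interval(grid, post)
post_mean = sum(t * p for t, p in zip(grid, post))
print(f"posterior mean of theta: {post_mean:.3f}")
print(f"95% credible interval: [{low:.2f}, {high:.2f}]")
```

The output is a distribution over every candidate value of the bias, not a single accept-or-reject decision, which is the shift in focus described above.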
The only unfortunate aspect of Bayesian data analysis is that it is not routinely taught as part of a graduate student’s general education in science. My aim here is not to teach these techniques, but to point you to a list of references that I found immensely useful in understanding Bayesian data analysis and applying it to my own work.
The philosophy of Bayesian statistics
John Kruschke and Torrin Liddell provide an excellent introduction to the thinking behind Bayesian statistics with the analogy of Sherlock Holmes’ statement that “when you have eliminated the impossible, whatever remains, however improbable, must be the truth.” This succinctly describes our goal when we apply Bayesian data analysis techniques to the question of how meaningful our scientific results are.
Holmes started with various degrees of suspicion about each suspect, then collected new evidence, then re-allocated degrees of suspicion to the suspects who were not eliminated by the evidence.
When we collect data about some phenomenon of interest and we describe the data with a mathematical model, usually our first question is about what parameter values are credible given the actual data.
Instead of the yes-or-no thinking behind frequentist NHST, Bayesian analysis frames the question as one of reallocating credibility across the range of possible answers (Kruschke and Liddell, 2018).
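The reallocation idea can be sketched with a handful of discrete hypotheses. In the example below, the four suspects, the equal starting suspicion, and the likelihood values are all hypothetical numbers made up for illustration; only the update rule (posterior proportional to prior times likelihood) is Bayes' rule itself.

```python
# Reallocating credibility across discrete hypotheses with Bayes' rule.
# Assumption: suspects and all numbers are hypothetical.

def reallocate(prior, likelihood):
    """Return the posterior: prior times likelihood, renormalized to sum to 1."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: u / total for h, u in unnormalized.items()}


# Start with equal suspicion of four suspects.
suspicion = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}

# First clue: impossible if A did it, equally likely for everyone else.
suspicion = reallocate(suspicion, {"A": 0.0, "B": 1.0, "C": 1.0, "D": 1.0})

# Second clue: rules out D and fits B better than C.
suspicion = reallocate(suspicion, {"A": 0.0, "B": 0.8, "C": 0.2, "D": 0.0})

for suspect, credibility in suspicion.items():
    print(f"{suspect}: {credibility:.2f}")
```

Eliminated suspects drop to zero, and the credibility they held is redistributed among the suspects that remain.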
John Kruschke provided specific insight into the process of Bayesian analysis in his reply to Gelman and Shalizi, who had argued that the posterior predictive check was fundamentally a non-Bayesian add-on to the process of assessing scientific results (Kruschke, 2013). The posterior predictive check is the step of Bayesian data analysis where we verify that the most credible values from the posterior distribution do not deviate from the data we collected in notable, systematic ways (as they might, for example, if we are using the wrong mathematical model).
Kruschke argued that the posterior predictive check is in fact Bayesian in nature and that this check is important
…for breaking out of an initially assumed space of models. Philosophically, the conclusion allows the liberation to be completely Bayesian instead of relying on a non-Bayesian deus ex machina.
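As a rough illustration of what a posterior predictive check looks like in code, the sketch below fits a model that assumes independent coin flips to a made-up sequence with long streaks. It then asks whether data simulated from the posterior show as few switches between 0 and 1 as the observed sequence. The data, the uniform prior, the choice of test statistic, and the function names are all assumptions for this example; this simulation-based version is one common way to run the check and is not the specific procedure proposed in Kruschke (2013).

```python
import random

# Posterior predictive check for an independent-flips (Bernoulli) model.
# Assumptions: illustrative data, uniform Beta(1, 1) prior, and the
# number of switches between adjacent values as the test statistic.

def count_switches(sequence):
    """Count positions where a value differs from the one before it."""
    return sum(a != b for a, b in zip(sequence, sequence[1:]))


def simulate_switches(data, n_draws=5000, seed=0):
    """Draw theta from the posterior, simulate a replicate, record switches."""
    rng = random.Random(seed)
    ones = sum(data)
    zeros = len(data) - ones
    results = []
    for _ in range(n_draws):
        theta = rng.betavariate(1 + ones, 1 + zeros)  # conjugate posterior
        replicate = [1 if rng.random() < theta else 0 for _ in data]
        results.append(count_switches(replicate))
    return results


observed = [1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
observed_switches = count_switches(observed)
simulated = simulate_switches(observed)
fraction = sum(s <= observed_switches for s in simulated) / len(simulated)

print(f"observed switches: {observed_switches}")
print(f"share of replicates with as few switches: {fraction:.3f}")
```

If only a small share of replicates look like the observed data on this statistic, the independence assumption is suspect and a different model is worth considering.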
Bayesian statistics in practice
Next, we get to the question of how to actually apply Bayesian statistical techniques to our own work. On this topic, I cannot recommend John Kruschke’s textbook enough. It explains the underlying mathematics for these methods, provides worked examples, and is accompanied by open source code (Kruschke, 2015). Topics covered include:
- Models, probability, and Bayes’s rule
- Credibility and model parameters
- Markov chain Monte Carlo algorithms
- Hierarchical models
- Bayesian approaches to hypothesis testing
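To give a flavor of the Markov chain Monte Carlo topic, here is a minimal random-walk Metropolis sampler for the bias of a coin. It is a sketch written for this post, not code from the textbook, which uses R, JAGS, and Stan. The data (7 heads in 10 flips), the uniform prior, the proposal step size, and the burn-in length are all assumptions.

```python
import math
import random

# Random-walk Metropolis sampling of a coin's bias theta.
# Assumptions: illustrative data, uniform prior, hand-picked step size.

def log_posterior(theta, heads, flips):
    """Unnormalized log posterior under a uniform prior on (0, 1)."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero credibility outside the valid range
    return heads * math.log(theta) + (flips - heads) * math.log(1 - theta)


def metropolis(heads, flips, n_samples=20000, step=0.2, seed=0):
    """Return a list of correlated draws from the posterior of theta."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, flips)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        proposed = log_posterior(proposal, heads, flips)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            theta, current = proposal, proposed
        samples.append(theta)
    return samples


draws = metropolis(heads=7, flips=10)[2000:]  # drop burn-in draws
mcmc_mean = sum(draws) / len(draws)
print(f"posterior mean estimate: {mcmc_mean:.3f}")
```

For this simple model the exact posterior is Beta(8, 4) with mean 8/12, so the estimate can be checked against a known answer; in realistic models no such closed form exists, which is why sampling is needed.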
For a tutorial specific to machine learning, the one by Alessio Benavoli et al. is also clear and is likewise accompanied by open source code (Benavoli et al., 2017). At 36 pages long, it is a more focused introduction to the practice of Bayesian statistics than Kruschke’s in-depth textbook. The straightforward argument put forward in the tutorial is:
The machine learning community adopted the use of null hypothesis significance testing (NHST) in order to ensure the statistical validity of results. Many scientific fields however realized the shortcomings of frequentist reasoning and in the most radical cases even banned its use in publications. We should do the same: just as we have embraced the Bayesian paradigm in the development of new machine learning methods, so we should also use it in the analysis of our own results. We argue for abandonment of NHST by exposing its fallacies and, more importantly, offer better—more sound and useful—alternatives for it.
In a primer on applying Bayesian statistics, Rens van de Schoot et al. walk the reader through the stages of Bayesian analysis including specifying the distribution of the prior, deriving inference, and checking models (van de Schoot et al., 2021). They also discuss technical topics of interest such as
…the importance of prior and posterior predictive checking, selecting a proper technique for sampling from a posterior distribution, variational inference and variable selection.
Finally, when it comes time to report the results from Bayesian data analysis in your manuscript or thesis, we revisit John Kruschke for his Bayesian Analysis Reporting Guidelines (Kruschke, 2021). Following these guidelines is a great way to stay on track yourself while ensuring that other researchers will be able to verify your analyses. At a high level, these guidelines ensure that you:
- Explain the goal of your Bayesian data analysis (if required by the audience)
- Explain your mathematical model(s)
- Explain how you computed your reported values
- Describe the posterior distribution resulting from your analysis
- Report any decisions and their criteria
- Report results from sensitivity analysis
- Provide details necessary for other scientists to reproduce your analysis
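As one small example of what a sensitivity analysis might involve, the sketch below reruns the same coin-bias analysis under several priors and reports how the posterior summary changes. The data, the three priors, and their labels are assumptions for illustration, and the guidelines themselves cover far more than this.

```python
# Prior sensitivity check for a Beta-Binomial model.
# Assumptions: illustrative data (7 heads in 10 flips) and hand-picked priors.

def beta_posterior_summary(prior_a, prior_b, heads, flips):
    """Return (mean, sd) of the Beta posterior after observing the data."""
    a = prior_a + heads
    b = prior_b + (flips - heads)
    mean = a / (a + b)
    sd = (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5
    return mean, sd


heads, flips = 7, 10
priors = {
    "uniform Beta(1, 1)": (1, 1),
    "mild Beta(2, 2)": (2, 2),
    "skeptical Beta(20, 20)": (20, 20),
}
summaries = {
    name: beta_posterior_summary(a, b, heads, flips)
    for name, (a, b) in priors.items()
}

for name, (mean, sd) in summaries.items():
    print(f"{name}: posterior mean {mean:.3f}, sd {sd:.3f}")
```

If the conclusions shift noticeably across reasonable priors, as they do here with so little data, that is worth stating explicitly in the report.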
References
- J. K. Kruschke and T. M. Liddell, “The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective,” Psychon. Bull. Rev., vol. 25, no. 1, pp. 178–206, Feb. 2018. DOI
- J. K. Kruschke, “Posterior predictive checks can and should be Bayesian: Comment on Gelman and Shalizi, ‘Philosophy and the practice of Bayesian statistics’,” Br. J. Math. Stat. Psychol., vol. 66, no. 1, pp. 45–56, Feb. 2013. DOI
- J. K. Kruschke, “Doing Bayesian data analysis: a tutorial with R, JAGS, and Stan,” second edition. Academic Press, 2015. LINK
- A. Benavoli, G. Corani, J. Demšar, and M. Zaffalon, “Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis,” J. Mach. Learn. Res., vol. 18, no. 77, pp. 1–36, 2017. LINK
- R. van de Schoot, S. Depaoli, R. King, et al., “Bayesian statistics and modelling,” Nat Rev Methods Primers, vol. 1, art. no. 16, Jan. 2021. DOI
- J. K. Kruschke, “Bayesian Analysis Reporting Guidelines,” Nat Hum Behav, vol. 5, pp. 1282–1291, Aug. 2021. DOI