bayes-toolbox

coverage

bayes-toolbox (aka, Bayesian Statistics Toolbox [BST]) is a Python package for running sophisticated Bayesian analyses in a simple, straight forward manner.

Statement of Need

Research

bayes-toolbox provides you with the tools for utilizing and exploring Bayesian statistics in your own research projects right away. By wrapping model definitions of Bayesian generalized linear models into convenient functions, bayes-toolbox makes it easy to run Bayesian analyses that are analogous to some of the most commonly used frequentist tests in the behavioral and neural sciences (think t-tests, ANOVAs, regression). Right now, Python users can choose between several packages that allow for one-liners to be called in order to run classical/frequentist tests (e.g., Pingouin, SciPy, pandas, statsmodels). In contast, for Bayesian statistics there has been Bambi, which is excellent, but it does require more advanced knowledge and familiarity with R-brms syntax. Therefore, the goal of bayes-toolbox is to fill an important gap in the Python/Bayesian community, by providing an easy-to-use module for less experienced users that makes it as simple to run Bayesian stats as it is to run frequentist stats. As all of the models (tests) are executable with one-liners, they are ideal for use in an open, replicable (and Bayesian) workflow (watch this PyMCon talk to learn more). Example use cases and tests for nearly every model are provided in the examples directory, so you can see what a sensible Bayesian data analysis pipeline looks like. (Many of the example notebooks are adaptations of Jordi Warmenhoven's Python/PyMC3 port of John Kruschke's excellent textbook "Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan (DBDA)".

The benefits of using bayes-toolbox in terms of time and convenience will be most noticeable when utilizing hierarchical (multi-level) models, including the ANOVA-like ones. This is because bayes-toolbox takes care of the more finicky steps involved in Bayesian statistical modeling with embedded functions for things like standardizing/unstandardizing variables for more efficient MCMC sampling, parsing categorical variables for easier indexing, and implementing sum-to-zero constraints in ANOVA-like models. These are the sorts of implementational details that can add time (and frustration) when creating an analysis pipeline and discourage otherwise interested scientists from using Bayesian statistics. bayes-toolbox now removes those obstacles.

The package is actively being developed in a public GitHub repository, and we always welcome new contributors! No contribution is too small. If you have any issues with the code, suggestions on how to improve it, or have requests for additional content, whether new features or tutorials, please open an issue on Github.

Education

bayes-toolbox will be very useful if you are going through Doing Bayesian Data Analysis and want to learn how to implement the models in Python/PyMC. We also highly recommend going through some of the PyMC tutorials to supplement your understanding. The PyMC developers have also adapted many ideas from the Kruschke text.

Look before you leap

Please note that the models in bayes-toolbox all utilize fairly uninformative, diffuse priors, which are, for the most part, the exact same ones used in the Kruschke text. New users or those with “prior paralysis” will likely be relieved to know that these diffuse, uninformative priors will not exert undue influence over posterior estimates and will likely satisfy skeptical reviewers. However, keep in mind that even though bayes-toolbox offers a streamlined interface for performing common statistical tests, the assumptions John Kruschke and we, the developers, make in these models may not be the ones you want to make for your particular question. Therefore, it's a good idea to go through the notebooks in the examples to make sure the model is appropriate for your applications. Part of the beauty of Bayesian modeling is its flexibility, so if you want to change priors/hyperpriors/etc., feel free to use bayes-toolbox as model scaffolding for a new bespoke model fit to your purpose. And consider making it a contribution! You may also want to explore using Bambi.

Dependencies

Some of the main libraries used in this project:

Testing and Functionality

In addition to thorough formal testing of the functions that make up bayes-toolbox, the statistical models have all been validated against known results (i.e., "ground truth"). Specifically, in the examples directory, you will find that each model has been run on the same data used in DBDA. All the results have been compared to those in the textbook, and against the results produced in another Python port of the textbook (https://github.com/JWarmenhoven/DBDA-python). Only subtle numerical discrepancies due to the nature of MCMC sampling, as well as differences between RStan and PyMC, have been detected.

NOTE
Use the links in the navigation bar to the left, the search bar in the upper left, or the content pages below to get started!

Getting Started
Tutorials
Future Plans
How to Contribute

Bayesian models currently included (frequentist analogue in parentheses)

See API Reference for comprehensive list
Comparison of two groups (independent samples t-test)
Comparison of single or paired samples (paired t-test)
Simple linear regression
Multiple regression
Multi-level (hierarchical) linear regression for modeling group- and individual-specific parameters
Hierarchical (multi-level) model of metric outcome with single categorical predictor (one-way ANOVA)
Hierarchical (multi-level) model of metric outcome with single categorical and single metric predictors (ANCOVA)
Hierarchical (multi-level) model of metric outcome with two categorical predictors (two-way ANOVA)
Hierarchical (multi-level) model of metric outcome with multiple categorical predictors and repeated measures (mixed-model ANOVA)
Logistic regression models incorporating categorical or metric predictors
Meta-analysis of binary outcomes using random effects model

For a more weapons-grade Bayesian statistical modeling interface, check out:

Bambi: BAyesian Model-Building Interface (BAMBI) in Python.

While Bambi requires model formulas, bayes-toolbox instead requires calling the function associated with a particular test.

Citing BST

If you use bayes-toolbox in your work, please cite our Journal of Open Source Software (JOSS) article:

APA format:
Kim, H. E. (2023). bayes-toolbox: A Python package for Bayesian statistics. Journal of Open Source Software, 8(90), 5526. https://doi.org/10.21105/joss.05526

BibTeX format:

@article{Kim_bayes-toolbox_A_Python_2023,
author = {Kim, Hyosub E.},
doi = {10.21105/joss.05526},
journal = {Journal of Open Source Software},
month = oct,
number = {90},
pages = {5526},
title = {{bayes-toolbox: A Python package for Bayesian statistics}},
url = {https://joss.theoj.org/papers/10.21105/joss.05526},
volume = {8},
year = {2023}
}

License

This work is distributed under a MIT license.

Acknowledgments

Thank you to the following people for generously sharing their work and knowledge:
- John Kruschke
- Richard McElreath
- Jordi Warmenhoven - This project grew out of updating Jordi's great Python/PyMC 3.0. port of the Kruschke textbook.
- PyMC developers
- ArviZ developers