How do I get started?
Installation
Recommended (pip)
We recommend installing the latest stable version of bayes_toolbox with pip (from the Terminal):
pip install bayes_toolbox
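To confirm which version pip installed, one option is to query the package metadata from Python. This is a minimal sketch using only the standard library's importlib.metadata; the helper name installed_version is hypothetical, and it assumes the distribution is registered under the name bayes_toolbox, as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string, or None if bayes_toolbox is not installed in this environment
print(installed_version("bayes_toolbox"))
```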
From source: Cloning and building
The latest development version of bayes-toolbox can be cloned from GitHub using git:
git clone https://github.com/hyosubkim/bayes-toolbox.git
To build and install the project (from the root directory, i.e., inside the cloned bayes-toolbox directory):
python3 -m pip install -e .
Dependencies
bayes_toolbox has the following dependencies (all of which should be installed automatically when you install with pip):
- aesara
- arviz
- numpy
- pandas
- pymc
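If you want to check that these dependencies can be imported in the active environment, a short standard-library check like the one below is one way to do it. It is a sketch: the helper missing_modules is hypothetical, and it only checks that each module can be found on the import path, not that its version is compatible with bayes_toolbox.

```python
import importlib.util

# Import names of the dependencies listed above
DEPENDENCIES = ["aesara", "arviz", "numpy", "pandas", "pymc"]


def missing_modules(names):
    """Return the module names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing:", missing_modules(DEPENDENCIES) or "none")
```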
Virtual environment
You can create a virtual environment with all of the necessary dependencies. If you've cloned the bayes-toolbox repository, make sure you're in its root directory (i.e., bayes-toolbox) and type the following conda command in the Terminal (Anaconda is strongly recommended for installing Python and the conda utility on your computer):
conda env create --name bayes_toolbox --file environment.yml
Instead of bayes_toolbox, you may wish to give your environment a different name.
If you're not using macOS and want to replicate this environment, read the "Export your environment" section of this page.
After cloning and installing bayes-toolbox locally, you can access it from any directory.
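To confirm which copy of the package Python actually picks up (for example, that an editable install points back into your cloned bayes-toolbox directory), you can ask the import system where the module would be loaded from. This is a minimal sketch using importlib.util.find_spec; the helper name module_location is hypothetical.

```python
import importlib.util


def module_location(name):
    """Return the file the named module would be loaded from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# For an editable install, this path should fall inside the cloned repository
print(module_location("bayes_toolbox"))
```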
Accessing the correct kernel from JupyterLab
If you've created the bayes_toolbox virtual environment and want to use it from a Jupyter notebook, you must manually add a kernel for your new virtual environment "bayes_toolbox" (or whatever you named it). To do so, first install ipykernel:
pip install --user ipykernel
Next, add your virtual environment to Jupyter:
python -m ipykernel install --user --name=MYENV
Use whatever you named your virtual environment in place of MYENV. That should be all that's needed to choose your new virtual environment as a kernel in your Jupyter notebook. For more details, read this page.
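Once the kernel is registered and selected, one way to check that a notebook really is running inside the new environment is to look at the interpreter's sys.prefix, which for a conda environment is normally the environment's own directory. This is a sketch under that assumption; running_in_env is a hypothetical helper, and "bayes_toolbox" should be replaced by whatever you named your environment.

```python
import sys
from pathlib import Path


def running_in_env(env_name):
    """Return True if the interpreter's prefix directory is named env_name."""
    return Path(sys.prefix).name == env_name


print(sys.executable)  # the interpreter this kernel is using
print(running_in_env("bayes_toolbox"))
```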
How do I learn to use bayes-toolbox?
The BEST notebook (short for "Bayesian Estimation Supersedes the t-Test", a famous 2013 article by John Kruschke) in the examples directory is a good place to see how bayes-toolbox makes implementing Bayesian analyses easy. I've adapted the notebook of the same name from the PyMC developers to show how model building and MCMC sampling are now embedded in a single function. You can see similar workflows for other model types in the other example notebooks, which track several chapters of "Doing Bayesian Data Analysis" and are modeled on Jordi Warmenhoven's repo.
Example syntax
Following imports of the most common Python packages for data analysis and Bayesian statistics, import bayes_toolbox.
# Usual imports
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import seaborn as sns
import xarray as xr
# Import the bayes-toolbox package
import bayes_toolbox.glm as bg
Import the data you want to model (the following example can be found in the examples subdirectory). So far, these are all standard steps and not specific to bayes-toolbox.
# Import data (from 'examples' subdirectory) into pandas data frame
df = pd.read_csv("data/HierLinRegressData.csv")
# Treat subject IDs as an ordered categorical variable
df.Subj = df.Subj.astype("category")
df.Subj = df.Subj.cat.as_ordered()
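The categorical subject column matters because a hierarchical model needs an integer index per observation to look up that subject's parameters. The sketch below uses a small made-up data frame (not the example data) to show how pandas' categorical codes provide that index; the column names mirror the example file, but the values are invented for illustration.

```python
import pandas as pd

# Made-up data with the same column names as the example file
df_demo = pd.DataFrame(
    {"Subj": ["s2", "s1", "s2", "s3"], "X": [1.0, 2.0, 3.0, 4.0], "Y": [2.1, 3.9, 6.2, 8.1]}
)
df_demo["Subj"] = df_demo["Subj"].astype("category").cat.as_ordered()

# One integer code (0..n_subj-1) per row, plus the subject label each code stands for
subj_idx = df_demo["Subj"].cat.codes.to_numpy()
subj_levels = list(df_demo["Subj"].cat.categories)
print(subj_idx)     # [1 0 1 2]
print(subj_levels)  # ['s1', 's2', 's3']
```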
Now, with bayes-toolbox, if you want to run a fairly sophisticated multi-level (hierarchical) linear regression model in which you are modeling individual as well as group-level slope and intercept parameters, simply call the appropriate function:
# Call your bayes-toolbox function and return the PyMC model and InferenceData objects
model, idata = bg.hierarchical_regression(
df["X"], df["Y"], df["Subj"], acceptance_rate=0.95
)
Without bayes-toolbox, this exact same analysis would take many more lines of code:
# Minimal stand-ins for the helper functions used below
# (assumed behavior; the versions in bayes_toolbox may differ)
def standardize(x):
    # Return z-scores plus the mean and standard deviation used to compute them
    mu, sigma = x.mean(), x.std()
    return ((x - mu) / sigma).to_numpy(), mu, sigma
def parse_categorical(x):
    # Return integer codes, category labels, and the number of categories
    x = x.astype("category")
    return np.asarray(x.cat.codes, dtype=int), x.cat.categories, len(x.cat.categories)
# Sampler settings
n_draws = 1000
acceptance_rate = 0.95
# Standardize variables
zx, mu_x, sigma_x = standardize(df["X"])
zy, mu_y, sigma_y = standardize(df["Y"])
# Convert subject variable to categorical dtype if it is not already
subj_idx, subj_levels, n_subj = parse_categorical(df["Subj"])
# Define your statistical model
with pm.Model(coords={"subj": subj_levels}) as model:
    # Hyperpriors
    zbeta0 = pm.Normal("zbeta0", mu=0, tau=1 / 10**2)
    zbeta1 = pm.Normal("zbeta1", mu=0, tau=1 / 10**2)
    zsigma0 = pm.Uniform("zsigma0", 10**-3, 10**3)
    zsigma1 = pm.Uniform("zsigma1", 10**-3, 10**3)
    # Priors for individual subject parameters (non-centered parameterization)
    zbeta0_s_offset = pm.Normal("zbeta0_s_offset", mu=0, sigma=1, dims="subj")
    zbeta0_s = pm.Deterministic(
        "zbeta0_s", zbeta0 + zbeta0_s_offset * zsigma0, dims="subj"
    )
    zbeta1_s_offset = pm.Normal("zbeta1_s_offset", mu=0, sigma=1, dims="subj")
    zbeta1_s = pm.Deterministic(
        "zbeta1_s", zbeta1 + zbeta1_s_offset * zsigma1, dims="subj"
    )
    # Expected value of each observation, using that observation's subject parameters
    mu = zbeta0_s[subj_idx] + zbeta1_s[subj_idx] * zx
    zsigma = pm.Uniform("zsigma", 10**-3, 10**3)
    nu_minus_one = pm.Exponential("nu_minus_one", 1 / 29)
    nu = pm.Deterministic("nu", nu_minus_one + 1)
    # Define likelihood function
    likelihood = pm.StudentT("likelihood", nu, mu=mu, sigma=zsigma, observed=zy)
    # Sample from posterior
    idata = pm.sample(draws=n_draws, target_accept=acceptance_rate)
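Because the model above is fit to standardized data (zx and zy), its intercept and slope are on the z-scale. Since zx = (x - mu_x) / sigma_x and zy = (y - mu_y) / sigma_y, the relation zy = zbeta0 + zbeta1 * zx rearranges to beta1 = zbeta1 * sigma_y / sigma_x and beta0 = zbeta0 * sigma_y + mu_y - beta1 * mu_x on the original scale (and the noise scale becomes zsigma * sigma_y). The sketch below implements that conversion; unstandardize_coefficients is a hypothetical helper, not a documented bayes_toolbox function, and it checks itself on noise-free data with an ordinary least-squares fit rather than MCMC.

```python
import numpy as np


def unstandardize_coefficients(zbeta0, zbeta1, mu_x, sigma_x, mu_y, sigma_y):
    """Map an intercept/slope fit on z-scored x and y back to the original scales.

    Works elementwise, so the arguments may be scalars or arrays of posterior draws.
    """
    beta1 = zbeta1 * sigma_y / sigma_x
    beta0 = zbeta0 * sigma_y + mu_y - beta1 * mu_x
    return beta0, beta1


# Sanity check on noise-free data generated from y = 2 + 3x
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 3.0 * x
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
zbeta1_hat, zbeta0_hat = np.polyfit(zx, zy, 1)  # least-squares fit on the z-scale
beta0, beta1 = unstandardize_coefficients(
    zbeta0_hat, zbeta1_hat, x.mean(), x.std(), y.mean(), y.std()
)
print(beta0, beta1)  # approximately 2.0 and 3.0
```

Because the conversion is elementwise, the same function can be applied to whole arrays of posterior draws, giving a posterior distribution for the original-scale coefficients rather than a single point estimate.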