Introduction to BayesGmed

Motivation

This vignette introduces the R package BayesGmed, a package designed for a Bayesian causal mediation analysis in Stan, a probabilistic programming language for Bayesian inference. BayesGmed uses a parametric approach for the specification of the outcome and mediator model, and uses the g-formula approach for estimation. In addition to the estimation of causal mediation effects, the package also allows researchers to conduct sensitivity analysis.

Getting started

You can install from Github via devtools

devtools::install_gitlab(belayb/BayesGmed)

Example

BayesGmed comes with an example data, example_data, which contains a binary outcome \(Y\), a single binary mediator \(M\), a binary exposure \(A\), and two numeric confounding variables \(Z = (Z_1, Z_2)\). The first 6 rows of the data is visualized below.

library(BayesGmed)
head(example_data)
#>   Z1 Z2 A M Y
#> 1  0  1 1 1 0
#> 2  1  0 0 1 0
#> 3  0  0 0 0 0
#> 4  1  1 1 0 0
#> 5  0  1 1 0 0
#> 6  1  0 1 1 0

The aim in this example data is to estimate the direct and indirect effect of \(A\) on \(Y\) adjusting for \(Z\). To do this, we may proced as follow.

\[logit( P(Y_{i} = 1|A_{i},M_{i},\mathbf{Z}_{i}) ) = \alpha_{0} + \mathbf{\alpha}_{Z}^{'}\mathbf{Z}_{i} + \alpha_{A}A_{i} + \alpha_{M}M_{i},\]

\[logit(P(M_{i} = 1| A_{i},\mathbf{Z}_{i} ) ) = \beta_{0} + \mathbf{\beta}_{Z}^{'}\mathbf{Z}_{i} + \beta_{A}A_{i}.\]

To complete the model specification, we assume the following priors for the model parameters.

\[ \begin{aligned} \beta &\sim MVN(location_m, scale_m)\\ \alpha &\sim MVN(location_y, scale_y) \end{aligned} \]

We then need to specify the parameters of the prior distrbutions or assume a hyper-prior. BayesGmed currently only allow fixed values and these values can be passed as part of the model call and if not the following defult values will be used

P <- 3 # number of covariates plus the intercept term 
priors <- list(scale_m = 2.5*diag(P+1), 
               scale_y = 2.5*diag(P+2),
               location_m = rep(0, P+1), 
               location_y = rep(0, P+2))

The model can then be fitted as follow

fit1 <- bayesgmed(outcome = "Y", mediator =  "M", treat = "A", covariates = c("Z1", "Z2"), dist.y = "binary",
dist.m = "binary", link.y = "logit", link.m = "logit", data = example_data)
#> 
#> SAMPLING FOR MODEL 'BY_BM_single' NOW (CHAIN 1).
#> Chain 1: 
#> Chain 1: Gradient evaluation took 0.00015 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.5 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 0.967 seconds (Warm-up)
#> Chain 1:                0.979 seconds (Sampling)
#> Chain 1:                1.946 seconds (Total)
#> Chain 1: 
#> 
#> SAMPLING FOR MODEL 'BY_BM_single' NOW (CHAIN 2).
#> Chain 2: 
#> Chain 2: Gradient evaluation took 7.2e-05 seconds
#> Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.72 seconds.
#> Chain 2: Adjust your expectations accordingly!
#> Chain 2: 
#> Chain 2: 
#> Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 2: 
#> Chain 2:  Elapsed Time: 0.948 seconds (Warm-up)
#> Chain 2:                0.981 seconds (Sampling)
#> Chain 2:                1.929 seconds (Total)
#> Chain 2: 
#> 
#> SAMPLING FOR MODEL 'BY_BM_single' NOW (CHAIN 3).
#> Chain 3: 
#> Chain 3: Gradient evaluation took 8.7e-05 seconds
#> Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.87 seconds.
#> Chain 3: Adjust your expectations accordingly!
#> Chain 3: 
#> Chain 3: 
#> Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 3: 
#> Chain 3:  Elapsed Time: 0.946 seconds (Warm-up)
#> Chain 3:                0.988 seconds (Sampling)
#> Chain 3:                1.934 seconds (Total)
#> Chain 3: 
#> 
#> SAMPLING FOR MODEL 'BY_BM_single' NOW (CHAIN 4).
#> Chain 4: 
#> Chain 4: Gradient evaluation took 6.3e-05 seconds
#> Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.63 seconds.
#> Chain 4: Adjust your expectations accordingly!
#> Chain 4: 
#> Chain 4: 
#> Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
#> Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
#> Chain 4: 
#> Chain 4:  Elapsed Time: 0.958 seconds (Warm-up)
#> Chain 4:                0.996 seconds (Sampling)
#> Chain 4:                1.954 seconds (Total)
#> Chain 4:

bayesgmed_summary(fit1)
#>     Parameter  Mean    SE Median   2.5% 97.5% n_eff Rhat
#> 1 NDE_control 0.263 0.067  0.263  0.133 0.393  4844    1
#> 2 NDE_treated 0.285 0.069  0.287  0.150 0.413  4922    1
#> 3 NIE_control 0.042 0.037  0.040 -0.027 0.117  4599    1
#> 4 NIE_treated 0.065 0.048  0.063 -0.030 0.160  4259    1
#> 5        ANDE 0.274 0.063  0.275  0.152 0.397  5098    1
#> 6        ANIE 0.053 0.034  0.053 -0.012 0.122  4645    1
#> 7          TE 0.328 0.065  0.330  0.200 0.453  4472    1