Pearl’s Front Door Criterion

Let some treatment \(T\) affect an outcome \(Y\). Mediation analysis is the study of mechanisms: the ways in which \(T\) affects \(Y\). For instance, we may think \(T\) moves some intervening variable \(M\), which then affects \(Y\). Alternatively, \(T\) could move \(N\), which in turn affects \(Y\). There can also be a direct causal pathway connecting \(T\) to \(Y\).

Figure 1: Causal Diagram


Pearl shows that we can retrieve the effect of \(T\) on \(Y\) despite confounding by an unknown variable \(U\). Call this the “front door method”. It involves three steps:

Step 1: Estimate the effect of \(T\) on \(M\) and \(N\). We can retrieve \(\beta_{T \rightarrow M}\) and \(\beta_{T \rightarrow N}\) because the backdoor paths through \(U\) (e.g. \(M \leftarrow T \leftarrow U \rightarrow Y\)... wait, from \(T\)'s side: \(T \leftarrow U \rightarrow Y \leftarrow M\)) are blocked by the collider \(Y\).

Step 2: Estimate the effect of \(M\) on \(Y\) (and separately \(N\) on \(Y\)). Again, we can retrieve \(\beta_{M \rightarrow Y}\) and \(\beta_{N \rightarrow Y}\) by blocking all backdoor paths, which we do by conditioning on \(T\). That is, controlling for \(T\) blocks \(M \leftarrow T \rightarrow N \rightarrow Y\) (and \(M \leftarrow T \leftarrow U \rightarrow Y\)) in the regression \(Y \sim M + T\); and \(N \leftarrow T \rightarrow M \rightarrow Y\) (and \(N \leftarrow T \leftarrow U \rightarrow Y\)) in the regression \(Y \sim N + T\).

Step 3: Finally, get the total effect of \(T\) on \(Y\):

\((\beta_{T \rightarrow M} \cdot \beta_{M \rightarrow Y}) + (\beta_{T \rightarrow N} \cdot \beta_{N \rightarrow Y})\)
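To make the three steps concrete, here is a small simulation sketch (in Python, with made-up coefficients; variable names mirror the diagram). The sum of front-door path products recovers the total effect even though \(U\) confounds a naive regression of \(Y\) on \(T\):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

U = rng.normal(size=n)                     # unknown confounder
T = U + rng.normal(size=n)                 # treatment, confounded by U
M = 2.0 * T + rng.normal(size=n)           # mediator 1: beta_{T->M} = 2
N = 1.0 * T + rng.normal(size=n)           # mediator 2: beta_{T->N} = 1
Y = 3.0 * M + 2.0 * N + 4.0 * U + rng.normal(size=n)  # beta_{M->Y} = 3, beta_{N->Y} = 2

def slopes(y, *xs):
    """OLS slope coefficients of y on xs (intercept included, then dropped)."""
    X = np.column_stack([np.ones_like(y), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

b_TM = slopes(M, T)[0]              # Step 1: no open backdoor path (Y is a collider)
b_TN = slopes(N, T)[0]
b_MY = slopes(Y, M, T)[0]           # Step 2: condition on T
b_NY = slopes(Y, N, T)[0]

front_door = b_TM * b_MY + b_TN * b_NY   # Step 3: path products, approx 2*3 + 1*2 = 8
naive = slopes(Y, T)[0]                  # biased upward by U (approx 10 here)
print(front_door, naive)
```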


Pearl’s estimation strategy invokes two assumptions:

  1. Exhaustiveness: We must know all the causal pathways that connect \(T\) to \(Y\). That is: the mediators (\(M\) and \(N\)) intercept all directed paths from the causal variable \(T\) to the outcome variable \(Y\).

Intuition: Say we do not observe \(N\), so \(T \rightarrow M \rightarrow Y\) is not exhaustive. This is a problem because \(\beta_{T \rightarrow N}\) and \(\beta_{N \rightarrow Y}\) cannot be estimated; and thus the full causal effect of \(T\) on \(Y\) cannot be retrieved.

  2. Isolation: The mechanisms (\(T \rightarrow M \rightarrow Y\) and \(T \rightarrow N \rightarrow Y\)) should be “isolated” from all unblocked backdoor paths so that we can recover the full causal effect. This implies two things:

    1. There are no unblocked back-door paths connecting \(T\) and the mediators (\(M\) and \(N\))

    2. All backdoor paths from the mediator (\(M\) or \(N\)) to the outcome variable (\(Y\)) can be blocked by conditioning on the causal variable (\(T\))


Baron-Kenny equations

In regression terms, this roughly translates to estimating three models:

Model 1: \(M_i = \alpha_1 + aT_i + e_{1,i}\) (equivalent to Step 1 in Pearl’s procedure but with one mediator)

Model 2: \(Y_i = \alpha_2 + cT_i + e_{2,i}\) (for the total effect of \(T\) on \(Y\))

Model 3: \(Y_i = \alpha_3 + dT_i + b M_i + e_{3,i}\) (equivalent to Step 2 in Pearl’s procedure where we condition on \(T\) to get the effect of \(M\) on \(Y\)).

Substituting equation 1 into equation 3 gives us:

\(Y_i = (\alpha_3 + b\alpha_1) + (d + ab)T_i + b\,e_{1,i} + e_{3,i}\)

Where \(d\) is the direct effect of \(T\) on \(Y\) (through paths other than \(M\)), and \(a\cdot b\) is the indirect effect via \(M\). The total effect \(c = d + (a \cdot b)\).
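A useful fact: for OLS with a single mediator, the decomposition \(c = d + (a \cdot b)\) holds exactly in-sample, not just in expectation. A minimal Python sketch with made-up parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
T = rng.binomial(1, 0.5, size=n).astype(float)
M = 0.5 + 2.0 * T + rng.normal(size=n)            # Model 1 world: a = 2
Y = 1.0 + 1.5 * T + 3.0 * M + rng.normal(size=n)  # Model 3 world: d = 1.5, b = 3

def slopes(y, *xs):
    X = np.column_stack([np.ones_like(y), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

a = slopes(M, T)[0]        # Model 1
c = slopes(Y, T)[0]        # Model 2 (total effect), approx 1.5 + 2*3 = 7.5
d, b = slopes(Y, T, M)     # Model 3

# c equals d + a*b exactly: an algebraic property of OLS on the same sample
print(bool(np.isclose(c, d + a * b)))   # True
```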


Critiques

  1. \(E[a_i \cdot b_i] \neq E[a_i]\cdot E[b_i]\)

In both Pearl’s approach and the Baron-Kenney equations, we estimate the average effect of \(T\) on \(M\), and \(M\) on \(Y\) and multiply these coefficients to get the indirect effect. However, these effects might vary across units: for every \(i\) \(a_i\) captures the effect of \(T\) on \(M\), and \(b_i\) the effect of \(M\) on \(Y\). For any given unit \(i\) the indirect effect is:

\(a_i \times b_i\)

And averaging over all units, we get:

\(E[a_i b_i] = E[a_i] \cdot E[b_i] + Cov[a_i,b_i] = (a \cdot b) + Cov[a_i,b_i]\)

where \(a\) and \(b\) are the regression coefficients from Models 1 and 3. Their product does not equal the average indirect effect unless \(Cov[a_i,b_i]= 0\) (which holds, for example, when treatment effects are constant across units).
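A quick numeric illustration of this identity (Python, with made-up effect distributions): when \(a_i\) and \(b_i\) are positively correlated across units, the product of the averages understates the average indirect effect by exactly \(Cov[a_i,b_i]\):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
a_i = rng.normal(loc=2.0, scale=1.0, size=n)        # E[a_i] = 2
b_i = a_i + rng.normal(loc=1.0, scale=1.0, size=n)  # E[b_i] = 3, Cov[a_i, b_i] = 1

avg_indirect = np.mean(a_i * b_i)              # E[a_i * b_i] approx 2*3 + 1 = 7
product_of_avgs = np.mean(a_i) * np.mean(b_i)  # approx 6: misses Cov[a_i, b_i]
print(round(avg_indirect), round(product_of_avgs))   # 7 6
```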

  2. Model 3 provides biased estimates for several reasons. First, we condition on a post-treatment variable (\(M_i\)). Second, \(M_i\) is non-randomly assigned and potentially related to unmeasured causes of \(Y_i\). Formally: \(M_i \not\!\perp\!\!\!\perp e_{3,i}\). Another way of saying this is that Pearl’s isolation condition is violated because there is an active backdoor path even after conditioning on \(T_i\): \(M_i \leftarrow e_{3,i} \rightarrow Y_i\)
Figure 2: M and Unknown Causes of Y


These things jointly imply that we underestimate \(d\) and overestimate \(b\):

\(\widehat{b_{N \rightarrow \infty}} = b + \frac{Cov[e_{1,i},e_{3,i}]}{Var(e_{1,i})}\)

\(\widehat{d_{N \rightarrow \infty}} = d - a \cdot \frac{Cov[e_{1,i},e_{3,i}]}{Var(e_{1,i})}\)

Where \(\frac{Cov[e_{1,i},e_{3,i}]}{Var(e_{1,i})}\) is the coefficient from \(e_{3,i} \sim e_{1,i}\) (or the correlation between errors from model 1 and 3). Another way of thinking about this is that there are some unknown causes of \(M\) that also affect \(Y\) and bias estimates in Model 3.
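These asymptotic bias formulas can be verified by simulation (a Python sketch with hypothetical parameters): a common unknown cause enters both \(e_{1,i}\) and \(e_{3,i}\), and the OLS estimates land on \(b + \frac{Cov[e_1,e_3]}{Var(e_1)}\) and \(d - a \cdot \frac{Cov[e_1,e_3]}{Var(e_1)}\):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
a, b, d = 2.0, 3.0, 1.5

T = rng.binomial(1, 0.5, size=n).astype(float)
common = rng.normal(size=n)           # unknown cause of both M and Y
e1 = common + rng.normal(size=n)      # Var(e1) = 2
e3 = common + rng.normal(size=n)      # Cov(e1, e3) = 1
M = a * T + e1
Y = d * T + b * M + e3

def slopes(y, *xs):
    X = np.column_stack([np.ones_like(y), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

d_hat, b_hat = slopes(Y, T, M)        # Model 3
bias = 1.0 / 2.0                      # Cov[e1, e3] / Var(e1)
print(b_hat, b + bias)                # b_hat approx 3.5: overestimates b = 3
print(d_hat, d - a * bias)            # d_hat approx 0.5: underestimates d = 1.5
```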


Sequential Ignorability

In the previous section we identified non-random assignment of the mediator \(M_i\) as a cause of bias, chiefly because \(E[M_i\cdot e_{3,i}] \neq 0\). Sequential ignorability assumes away this problem by asserting conditional independence of the mediator. To understand exactly what this means, I will first introduce some potential outcomes notation, then a formal statement of the assumption.

Potential Outcomes

Let every unit have a treatment status \(T_i \in \{0,1\}\), and two outcomes:

  • A mediator \(M_i\) that can take the value \(M_i(1)\) when \(i\) is treated, and \(M_i(0)\) when \(i\) is untreated.

  • An outcome \(Y_i\) whose value depends on \(T_i\) and \(M_i\): \(Y_i(T_i, M_i(T_i))\). Accordingly, we have four potential outcomes:

\(Y_i(T_i = 1, M_i(T_i = 1))\) or \(i\)’s outcome when she is treated and the mediator takes the value it would when \(i\) is treated.

\(Y_i(T_i = 0, M_i(T_i = 0))\) or \(i\)’s outcome when she is untreated and the mediator takes the value it would when \(i\) is untreated.

\(Y_i(T_i = 1, M_i(T_i = 0))\) or \(i\)’s outcome when she is treated and the mediator takes the value it would when \(i\) is untreated. This is an imaginary potential outcome because treatment typically moves the mediator, so for units with \(T_i =1\) we do not observe \(M_i(0)\).

\(Y_i(T_i = 0, M_i(T_i = 1))\) or \(i\)’s outcome when she is untreated and the mediator takes the value it would when \(i\) is treated. Again, this is an imaginary potential outcome because \(M_i(1)\) is not observed in the absence of treatment (i.e. for units with \(T_i = 0\)).


Assumption Statement

According to Imai et al. (2011), we need “sequential ignorability” for unbiased estimates in mediation analysis. This assumption states:

\(\{Y_i(t,m), M_i(t)\} \!\perp\!\!\!\perp T_i | X_i = x\)

‘Given the observed pretreatment confounders \(X_i\), the treatment assignment is assumed to be ignorable - statistically independent of potential outcomes and potential mediators. This part of the assumption is often called no-omitted-variable bias, exogeneity, or unconfoundedness’ (Imai et al. 2011:770)

\(Y_i(t,m) \!\perp\!\!\!\perp M_i(t) | T_i , X_i = x\)

‘The observed mediator is ignorable given the actual treatment status and pre-treatment confounders. Once we have conditioned on a set of covariates gathered before treatment, the mediator status is ignorable’ (Imai et al. 2011:770).


Intuition: The first part of this assumption requires (minimally) that treatment status is (conditionally) independent of potential outcomes. Note that in experimental settings \(\{Y_i(1),Y_i(0)\} \!\perp\!\!\!\perp T_i\), and in observational studies we condition on observed covariates and claim treatment is as-if randomly assigned within each covariate profile: \(\{Y_i(1),Y_i(0)\} \!\perp\!\!\!\perp T_i | X_i = x\). This assumption implies both those things and that treatment status is independent of mediator values.

The second part of the assumption goes further. It says that conditional on having a certain covariate profile \(X_i =x\) and treatment status \(T_i = t\), potential outcomes do not predict mediator values. Put another way, unobserved causes of \(Y\) (or \(e_{3,i}\)) do not predict mediator status or values: \(E[M_i(t)\cdot e_{3,i} | X=x, T_i = t] = 0\).

Note that this problem does not go away if we randomly assign the mediator. To satisfy this condition, we must randomly assign units to natural values of the mediator (what they would be had \(i\) been treated or untreated): \(M_i(1)\) and \(M_i(0)\). To clarify this, let’s work through an example.

Practice: Imai et al. Table 1

Let’s focus on two distinct estimands: the causal mediation effect \(Y_i(t, M(1)) - Y_i(t,M(0))\) and the causal effect of the mediator \(Y_i(t,1) - Y_i(t,0)\). Consider a simple case in which the treatment \(T_i\), mediator \(M_i\), and outcome \(Y_i\) are all binary. Using an abridged version of Table 1 from Imai et al. (2011), estimate the average causal mediation effect (ACME) and the average causal effect of the mediator:

Table 1, Reproduced from Imai et al. (2011)
Population Proportion   M(1)   M(0)   Y(t,1)   Y(t,0)
0.3                     1      0      0        1
0.3                     0      0      1        0
0.1                     0      1      0        1
0.3                     1      1      1        0

Answer: Let’s start with the causal effect of the mediator (the “ATE” of the mediator):

\(E[Y_i(t,1) - Y_i(t,0)] = E[Y_i(t,1)] - E[Y_i(t,0)]\)

\((0 \cdot 0.3 + 1 \cdot 0.3 + 0 \cdot 0.1 + 1 \cdot 0.3) - (1 \cdot 0.3 + 0 \cdot 0.3 + 1 \cdot 0.1 + 0 \cdot 0.3)\)

\(0.6 - 0.4 = 0.2\)

Now let’s calculate the average causal mediation effect (“ACME”):

For 30% of the population (row 1): \(Y_i(t,M(1)) = Y_i(t,1) = 0\) and \(Y_i(t,M(0)) = Y_i(t,0) = 1\). So the causal mediation effect is: \(0 - 1 = -1\)

For another 30% of the population (row 2): \(Y_i(t,M(1)) = Y_i(t,0) = 0\) and \(Y_i(t,M(0)) = Y_i(t,0) = 0\). So the causal mediation effect is: \(0 - 0 = 0\)

For the next 10% of the population (row 3): \(Y_i(t,M(1)) = Y_i(t,0) = 1\) and \(Y_i(t,M(0)) = Y_i(t,1) = 0\). So the causal mediation effect is: \(1 - 0 = 1\)

For the remaining 30% of the population (row 4): \(Y_i(t,M(1)) = Y_i(t,1) = 1\) and \(Y_i(t,M(0)) = Y_i(t,1) = 1\). So the causal mediation effect is: \(1 - 1 = 0\)

Now taking a weighted average:

\(E[Y_i(t, M(1)) - Y_i(t,M(0))] = (0.3 \cdot -1) + (0.3 \cdot 0) + (0.1 \cdot 1) + (0.3 \cdot 0) = -0.3 + 0.1 = -0.2\)

Conclusion: The causal effect of the mediator is \(0.2\), but the mediation effect is \(-0.2\). In other words, if we randomly assign a mediator \(M\), we estimate its “causal effect” on \(Y\), which need not equal the “mediation effect”.
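The arithmetic above can be transcribed into a few lines of Python (the arrays are simply the rows of Table 1), reproducing both answers:

```python
import numpy as np

w   = np.array([0.3, 0.3, 0.1, 0.3])   # population proportions
M1  = np.array([1, 0, 0, 1])           # M(1)
M0  = np.array([0, 0, 1, 1])           # M(0)
Yt1 = np.array([0, 1, 0, 1])           # Y(t, 1)
Yt0 = np.array([1, 0, 1, 0])           # Y(t, 0)

ate_mediator = np.sum(w * (Yt1 - Yt0))       # E[Y(t,1) - Y(t,0)]
Y_at_M1 = np.where(M1 == 1, Yt1, Yt0)        # Y(t, M(1)) per stratum
Y_at_M0 = np.where(M0 == 1, Yt1, Yt0)        # Y(t, M(0)) per stratum
acme = np.sum(w * (Y_at_M1 - Y_at_M0))       # E[Y(t, M(1)) - Y(t, M(0))]
print(round(ate_mediator, 1), round(acme, 1))   # 0.2 -0.2
```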


Practice 2: Toy Dataset

Compute the following quantities using the dataset below:

  • The average total effect of \(Z_i\) on \(Y_i\)

\(E[Y_i(M(1), Z=1) - Y_i(M(0), Z = 0)]\)

  • The average direct effect of \(Z_i\) on \(Y_i\) holding \(M_i\) constant at \(M_i(0)\)

\(E[Y_i(M(0), \color{red}{Z=1}) - Y_i(M(0), \color{red}{Z=0})]\)

  • The average indirect effect that \(Z_i\) transmits through \(M_i\) to \(Y_i\) when \(Z_i =0\)

\(E[Y_i(\color{red}{M(1)}, Z=0) - Y_i(\color{red}{M(0)}, Z=0)]\)

Table 2, Adapted from Gerber and Green (2012)
Subject   Y(m=0,z=0)   Y(m=0,z=1)   Y(m=1,z=0)   Y(m=1,z=1)   M(z=0)   M(z=1)
1         0            0            0            0            0        0
2         0            0            0            1            0        1
3         0            0            1            1            1        1
4         0            1            1            1            1        0

Here is what I get:

Average total effect of \(Z_i\) on \(Y_i\): the subject-level effects are \((0, 1, 0, 0)\), so the average is \(0.25\).

Average direct effect of \(Z_i\) on \(Y_i\) holding \(M_i\) at \(M_i(0)\): the subject-level effects are \((0, 0, 0, 0)\), so the average is \(0\).

Average indirect effect of \(Z_i\) on \(Y_i\) via \(M_i\), when \(Z_i = 0\): the subject-level effects are \((0, 0, 0, -1)\), so the average is \(-0.25\).
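Answers like these can be checked mechanically by transcribing Table 2 into code. A Python sketch (one array per column, one entry per subject):

```python
import numpy as np

Ym0z0 = np.array([0, 0, 0, 0])
Ym0z1 = np.array([0, 0, 0, 1])
Ym1z0 = np.array([0, 0, 1, 1])
Ym1z1 = np.array([0, 1, 1, 1])
Mz0   = np.array([0, 0, 1, 1])
Mz1   = np.array([0, 1, 1, 0])

def Y(m, z):
    """Look up Y(m, z) element-wise for a vector of mediator values m and scalar z."""
    return np.where(z == 1, np.where(m == 1, Ym1z1, Ym0z1),
                            np.where(m == 1, Ym1z0, Ym0z0))

total   = np.mean(Y(Mz1, 1) - Y(Mz0, 0))   # E[Y(M(1), Z=1) - Y(M(0), Z=0)]
ade_M0  = np.mean(Y(Mz0, 1) - Y(Mz0, 0))   # direct effect, holding M at M(0)
acme_Z0 = np.mean(Y(Mz1, 0) - Y(Mz0, 0))   # indirect effect at Z = 0
print(total, ade_M0, acme_Z0)   # 0.25 0.0 -0.25
```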


mediation in R

(Code adapted from Alex Coppock’s section slides)

Finally, let’s look at the mediation package in R, which allows us to estimate the direct and indirect effects using the Baron-Kenny equations.

library(estimatr)
library(fabricatr)
library(randomizr)  # for complete_ra()
library(dplyr)      # for mutate() and the pipe
library(knitr)      # for kable()

# Step 0: Create some data, n = 1000, and a binary mediator (0 or 1)

data <- fabricate(
  N = 1000,
  probs_MZ0 = runif(N), # prob of revealing M(Z=0)
  YZ0M0 = rnorm(N, mean = 10*probs_MZ0, 1),
  YZ0M1 = YZ0M0 + 4,
  YZ1M0 = YZ0M0 + 2,
  YZ1M1 = YZ0M0 + 6,
  M_Z0 = rbinom(N, size = 1, prob = probs_MZ0),
  M_Z1 = ifelse(M_Z0 == 0, rbinom(N, size = 1, prob = 0.5), M_Z0), # ensures monotonicity: M(1) >= M(0)
  Z = complete_ra(N, m = 500)
)

# Step 1: True value of different estimands

## Effect of Z on M: E[M | Z=1] - E[M | Z=0]

with(data, mean(M_Z1 - M_Z0))
## [1] 0.265
## True ATE: E[Y | Z=1] - E[Y | Z=0]

with(data, {mean(
  (YZ1M1 * M_Z1 + YZ1M0 * (1 - M_Z1)) 
  - (YZ0M1 * M_Z0 + YZ0M0 * (1 - M_Z0))
  )})
## [1] 3.06
## ACME when Z=1: E[Y(M(1), Z =1) - Y(M(0), Z = 1)]

with(data, {mean(
  (YZ1M1 * M_Z1 + YZ1M0 * (1 - M_Z1)) - 
    (YZ1M1 * M_Z0 + YZ1M0 * (1 - M_Z0)))
  })
## [1] 1.06
## ADE when M = M(1): E[Y(M(1), Z=1) - Y(M(1), Z = 0)] 

with(data, {mean(
  (YZ1M1 * M_Z1 + YZ1M0 * (1 - M_Z1)) - 
    (YZ0M1 * M_Z1 + YZ0M0 * (1 - M_Z1)))
  })
## [1] 2
# Step 2: Estimate these quantities based on revealed potential outcomes

## Step 2A: Switching equation to get revealed potential outcomes

data <- data %>% mutate(M = Z*M_Z1 + (1-Z)*M_Z0,
                        Y = ifelse(Z == 1 & M == 1, YZ1M1, ifelse(Z==1 & M==0, YZ1M0, ifelse(
                          Z==0 & M ==1, YZ0M1,YZ0M0))))
kable(head(data))
ID probs_MZ0 YZ0M0 YZ0M1 YZ1M0 YZ1M1 M_Z0 M_Z1 Z M Y
0001 0.2989329 3.945021 7.945021 5.945021 9.945021 1 1 0 1 7.945021
0002 0.1630708 2.191958 6.191958 4.191958 8.191958 0 1 0 0 2.191958
0003 0.1682144 1.446947 5.446947 3.446947 7.446947 0 1 1 1 7.446947
0004 0.7708428 5.222161 9.222161 7.222161 11.222161 1 1 1 1 11.222161
0005 0.2783377 3.635199 7.635199 5.635199 9.635199 1 1 0 1 7.635199
0006 0.6640565 5.849337 9.849337 7.849337 11.849337 1 1 1 1 11.849337
## Step 3: Using "mediation" package in R

# First, the Baron-Kenny equations:

model1 <- lm(M ~ Z, data)  

model3 <- lm(Y ~ Z + M, data)

# Direct effect:
model3$coefficients["Z"]
##        Z 
## 1.499514
# Indirect effect (a*b):
model1$coefficients["Z"]*model3$coefficients["M"]
##        Z 
## 1.816139
# Now doing the same thing using the mediation package

library(mediation)

med.out <- mediate(
  model.m = model1,
  model.y = model3,
  treat = "Z",
  mediator = "M",
  sims = 1000
)

summary(med.out)
## 
## Causal Mediation Analysis 
## 
## Quasi-Bayesian Confidence Intervals
## 
##                Estimate 95% CI Lower 95% CI Upper p-value    
## ACME              1.808        1.403         2.21  <2e-16 ***
## ADE               1.500        1.143         1.86  <2e-16 ***
## Total Effect      3.308        2.748         3.84  <2e-16 ***
## Prop. Mediated    0.545        0.463         0.63  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 1000 
## 
## 
## Simulations: 1000
plot(med.out)

For more on the mediation package, see its documentation. For instance, if you include covariates in the Baron-Kenny equations, specify a list of covariates. If you want robust standard errors (instead of bootstrapped standard errors), specify boot = FALSE and robustSE = TRUE.


Sensitivity Tests

The mediation package also allows us to conduct sensitivity analyses on the average causal mediation effect (ACME) and average direct effect (ADE) for potential violations of the sequential ignorability assumption. As per the R documentation:

“The analysis proceeds by quantifying the degree of sequential ignorability violation as the correlation between the error terms of the mediator [\(e_{1,i}\)] and outcome [\(e_{3,i}\)] models, and then calculating the true values of the average causal mediation effect for given values of this sensitivity parameter, \(\rho\)” (for more, see Tingley et al., Section 3.4).

You will remember that the bias in the Baron-Kenny estimation procedure was:

\(\widehat{b_{N \rightarrow \infty}} = b + \color{red}{\frac{Cov[e_{1,i},e_{3,i}]}{Var(e_{1,i})}}\)

\(\widehat{d_{N \rightarrow \infty}} = d - \color{red}{a \cdot \frac{Cov[e_{1,i},e_{3,i}]}{Var(e_{1,i})}}\)

medsens estimates the implied true value of a parameter, given some correlation between the errors. It repeats this process for different values of \(\frac{Cov[e_{1,i},e_{3,i}]}{Var(e_{1,i})}\), until the implied true value reaches 0. That is the amount of confounding (or sequential ignorability violation) necessary to “wipe out” the effect.
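To build intuition for this exercise, here is a conceptual Python sketch (it uses the simple bias expression above, not medsens’s exact formulas, and the estimates are made up): for each assumed amount of confounding \(k = \frac{Cov[e_{1,i},e_{3,i}]}{Var(e_{1,i})}\), back out the implied true \(b\) and indirect effect:

```python
import numpy as np

a_hat, b_hat = 0.5, 3.0   # hypothetical estimates from Models 1 and 3

for k in np.arange(0.0, 3.5, 0.5):       # k = Cov[e1, e3] / Var(e1)
    b_implied = b_hat - k                # undo the bias in b_hat
    indirect = a_hat * b_implied         # implied "true" indirect effect
    print(f"k = {k:.1f}  implied indirect effect = {indirect:.2f}")

# The implied indirect effect crosses zero at k = 3.0: this is how much error
# correlation (per unit of Var(e1)) would be needed to explain away the estimate.
```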

Let’s see this at work in code:

sensitivity <- mediation::medsens(x = med.out, rho.by = 0.1, effect.type = "indirect")
plot(sensitivity)

Reverse Causality

Finally, one might think that sensitivity tests provide meaningful information about sequential ignorability violations. If so, we could proceed with mediation analysis and simply report how sensitive our findings are to potential violations. This is trickier than it seems, because sensitivity tests break down when the assumed causal diagram is wrong. To see this, consider two causal diagrams: the “correct” one on the left, and the “incorrect” one asserted by a researcher on the right:

Case of Reverse Causality


The main intuition is that if we assume the wrong causal model (i.e. \(M\) mediates the effect of \(Z\) on \(Y\), when in fact \(Y\) mediates the effect of \(Z\) on \(M\)), a sensitivity test will not catch this:

library(fabricatr)
library(mediation)

# Step 0: Create some data with the left-hand causal diagram in mind
data <- fabricate(
  N = 1000,
  Z = randomizr::complete_ra(N),
  Y = rnorm(N, mean = 0, sd = 1) + Z * rnorm(N, mean = 2, sd = 4),
  M = 0.5 * Y + rnorm(N, 5, 2)
)

model1 <- lm(M ~ Z, data)
model3 <- lm(Y ~ Z + M, data)

# Step 1: Do mediation analysis
med.out2 <- mediate(model.m = model1, model.y = model3, treat = "Z", mediator = "M", 
    sims = 1000)
summary(med.out2)
## 
## Causal Mediation Analysis 
## 
## Quasi-Bayesian Confidence Intervals
## 
##                Estimate 95% CI Lower 95% CI Upper p-value    
## ACME              0.419        0.209         0.64  <2e-16 ***
## ADE               1.237        0.931         1.56  <2e-16 ***
## Total Effect      1.656        1.270         2.05  <2e-16 ***
## Prop. Mediated    0.255        0.138         0.36  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Sample Size Used: 1000 
## 
## 
## Simulations: 1000
sensitivity.test <- medsens(x = med.out2, rho.by = 0.1)
plot(sensitivity.test)

From the above analysis, we might conclude that \(M_i\) mediates the effect of \(Z_i\) on \(Y_i\), and that the indirect effect is robust to minor violations of the sequential ignorability assumption (i.e. \(e_{1,i}\) and \(e_{3,i}\) would need to be highly correlated for the true effect to be 0). However, as we know from the data generating process, \(M_i\) is not a mediator; it is a consequence of \(Y_i\). Our sensitivity test does not catch this: it assumes the researcher’s causal ordering of variables is correct, which here it is not.
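A stripped-down version of this trap can be simulated directly (a Python sketch with hypothetical parameters): \(Z\) affects \(Y\), \(M\) is a downstream consequence of \(Y\), and yet the product-of-coefficients recipe reports a sizable “indirect effect” where the true mediation effect is zero:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
Z = rng.binomial(1, 0.5, size=n).astype(float)
Y = 2.0 * Z + rng.normal(size=n)     # Z affects Y directly; nothing is mediated
M = 0.5 * Y + rng.normal(size=n)     # M is caused BY Y (reverse of the assumed diagram)

def slopes(y, *xs):
    X = np.column_stack([np.ones_like(y), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

a = slopes(M, Z)[0]         # "Model 1": Z moves M (but only via Y), a approx 1
d, b = slopes(Y, Z, M)      # "Model 3": conditioning on a consequence of Y
indirect = a * b            # approx 0.4 here, although the true mediation effect is 0
print(indirect)
```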