Suppose that we run an experiment in which some units are assigned to control (\(Z=0\)), others to a condition in which minimal effort is made to treat units (\(Z=1\)), and the rest to a condition in which maximal effort is made to treat units (\(Z=2\)). In essence, we are manipulating compliance rates and want to compare treatment effects for different types of compliers. Under one-sided non-compliance, we can define the following types and estimands.
Type | D(Z=0) | D(Z=1) | D(Z=2) |
---|---|---|---|
Never Takers | 0 | 0 | 0 |
Minimal Compliers | 0 | 1 | 1 |
Maximal Compliers | 0 | 0 | 1 |
\(E[Y_i | Z = 2] - E[Y_i | Z = 0]\)
\(= E[Y_i(0) | Z=2, NT]\pi_{NT} + E[Y_i(1) | Z=2, Min]\pi_{Min} + E[Y_i(1) | Z=2, Max]\pi_{Max}\)
\(- \{E[Y_i(0) | Z=0, NT]\pi_{NT} + E[Y_i(0) | Z=0, Min]\pi_{Min} + E[Y_i(0) | Z=0, Max]\pi_{Max}\}\)
Because of random assignment, each experimental group is a “random subset” of the population with similar proportions of each type (\(\pi\)’s). Due to the excludability assumption, \(E[Y_i(0) | Z=2, NT] = E[Y_i(0) | Z=0, NT]\). This is because potential outcomes respond to \(d\), not \(z\), and \(d(z=2) = d(z=0) = 0\) for never takers.
This then simplifies to:
\(E[Y_i(Z=2,D=1) - Y_i(Z=0,D=0) | Min]\pi_{Min} + E[Y_i(Z=2,D=1) - Y_i(Z=0,D=0) | Max]\pi_{Max}\)
\(= E[Y_i(1) - Y_i(0) | Complier](\pi_{Min} + \pi_{Max})\)
Thus, writing \(\pi_C = \pi_{Min} + \pi_{Max}\): \(E[Y_i | Z = 2] - E[Y_i | Z = 0] = CACE \cdot \pi_{C}\)
And \(CACE = \frac{E[Y_i | Z = 2] - E[Y_i | Z = 0]}{\pi_C}\)
\(E[Y_i | Z = 1] - E[Y_i | Z = 0]\)
\(= E[Y_i(0) | Z=1, NT]\pi_{NT} + E[Y_i(1) | Z=1, Min]\pi_{Min} + E[Y_i(0) | Z=1, Max]\pi_{Max}\)
\(- \{E[Y_i(0) | Z=0, NT]\pi_{NT} + E[Y_i(0) | Z=0, Min]\pi_{Min} + E[Y_i(0) | Z=0, Max]\pi_{Max}\}\)
\(= E[Y_i(1) - Y_i(0)|Min]\pi_{Min}\)
And then: \(CACE_{Min} = \frac{E[Y_i | Z = 1] - E[Y_i | Z = 0]}{\pi_{Min}}\)
\(E[Y_i | Z = 2] - E[Y_i | Z = 1]\)
\(= E[Y_i(0) | Z=2, NT]\pi_{NT} + E[Y_i(1) | Z=2, Min]\pi_{Min} + E[Y_i(1) | Z=2, Max]\pi_{Max}\)
\(- \{E[Y_i(0) | Z=1, NT]\pi_{NT} + E[Y_i(1) | Z=1, Min]\pi_{Min} + E[Y_i(0) | Z=1, Max]\pi_{Max}\}\)
\(= E[Y_i(1) - Y_i(0)|Max]\pi_{Max}\)
And then: \(CACE_{Max} = \frac{E[Y_i | Z = 2] - E[Y_i | Z = 1]}{\pi_{Max}}\)
Using the table from Question 9, compute the CACE for minimal and maximal compliers.
Quantity | Control | Min_Effort | Max_Effort |
---|---|---|---|
Percent reached by callers | 0.00 | 29.97 | 47.31 |
Percent Voting | 55.89 | 55.91 | 56.53 |
N | 317182 | 7500 | 7500 |
\(CACE_{Min} = \frac{0.5591-0.5589}{0.2997} \approx 0.0007\)
\(CACE_{Max} = \frac{0.5653 - 0.5591}{0.4731 - 0.2997} \approx 0.0358\)
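The same arithmetic can be sketched in R. This is only an illustration: the vectors `turnout` and `contact` are names I introduce here, filled in by hand from the table above, with percentages converted to proportions.

```r
# Figures copied from the table above (percentages converted to proportions)
turnout <- c(control = 0.5589, min_effort = 0.5591, max_effort = 0.5653)
contact <- c(control = 0.0000, min_effort = 0.2997, max_effort = 0.4731)

# Shares of each complier type implied by the contact rates
pi_min <- contact[["min_effort"]] - contact[["control"]]
pi_max <- contact[["max_effort"]] - contact[["min_effort"]]

# Each CACE is a difference in turnout divided by the matching complier share
cace_min <- (turnout[["min_effort"]] - turnout[["control"]]) / pi_min
cace_max <- (turnout[["max_effort"]] - turnout[["min_effort"]]) / pi_max

round(c(cace_min = cace_min, cace_max = cace_max), 4)
```

Note that the denominator for maximal compliers is the difference in contact rates between the two effort conditions, not the maximal-effort contact rate itself.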
Consider now a variant of the above table. Can you estimate the mean outcome for compliers in the treatment and control groups, under one-sided non-compliance?
Quantity | Control | Treatment |
---|---|---|
Percent reached by callers | 0.00 | 47.31 |
Turnout among those not contacted by canvassers | 55.89 | 40.50 |
Overall turnout | 55.89 | 56.53 |
Solution:
Let's start by rewriting the above information in a more intuitive way:
Quantity | Control | Treatment |
---|---|---|
Percent reached by callers | 0 | 47.31 |
Turnout among Never Takers | 40.5 | 40.5 |
Turnout among Compliers | Don’t Know | Don’t Know |
Overall turnout | 55.89 | 56.53 |
Step 1: To get the mean control outcome for compliers, write the control group mean as a weighted average of types:
\(E[Y_i | Z = 0] = E[Y_i | Z=0, NT]\pi_{NT} + E[Y_i | Z=0, C]\pi_{C}\)
\(0.5589 = 0.405\cdot(1 - 0.4731) + x\cdot(0.4731)\)
So \(x = \frac{0.5589 - (0.405 \cdot 0.5269)}{0.4731} = 0.7303\)
Step 2: To get the mean treated outcome for compliers, write the treatment group mean as a weighted average of types:
\(E[Y_i | Z = 1] = E[Y_i | Z=1, NT]\pi_{NT} + E[Y_i | Z=1, C]\pi_{C}\)
\(0.5653 = 0.405\cdot(1 - 0.4731) + x\cdot(0.4731)\)
Then \(x = \frac{0.5653 - (0.405 \cdot 0.5269)}{0.4731} = 0.7438\)
Step 3: The complier average causal effect can then be estimated in two ways:
\(CACE = E[Y_i | Z=1, Compliers] - E[Y_i | Z=0, Compliers] = 0.7438 - 0.7303 \approx 0.0135\)
Alternatively \(CACE = \frac{ITT_Y}{ITT_D} = \frac{0.5653 - 0.5589}{0.4731} \approx 0.0135\)
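Steps 1 to 3 can be reproduced with a few lines of R. This is a minimal sketch using the numbers from the table above; all variable names are my own, and the never-taker turnout of 0.405 is assumed to be the same in both groups, as in the solution.

```r
pi_c  <- 0.4731      # share of compliers (contact rate in the treatment group)
pi_nt <- 1 - pi_c    # share of never takers
y_nt  <- 0.405       # turnout among never takers (same in both groups)

y_control <- 0.5589  # overall turnout, control group
y_treat   <- 0.5653  # overall turnout, treatment group

# Steps 1 and 2: back out the complier means from the weighted averages
y_c_control <- (y_control - y_nt * pi_nt) / pi_c
y_c_treat   <- (y_treat   - y_nt * pi_nt) / pi_c

# Step 3: two routes to the same CACE
cace_diff  <- y_c_treat - y_c_control
cace_ratio <- (y_treat - y_control) / pi_c

round(c(y_c_control = y_c_control, y_c_treat = y_c_treat,
        cace_diff = cace_diff, cace_ratio = cace_ratio), 4)
```

The two routes agree because the never-taker term cancels when the complier means are differenced, leaving \(ITT_Y / ITT_D\).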
Gerber and Green (2012) make several important points about the complier average causal effect (pp. 147-149). Here, I discuss four of them in detail.
Since the \(CACE = \frac{ITT_Y}{ITT_D}\), a misconception may arise that if \(ITT_D\) increases, the CACE decreases because we are dividing \(ITT_Y\) by a bigger number. This is not the case because ‘increasing the share of compliers also change[s] the numerator [\(ITT_Y\)], depending on how these extra compliers respond to treatment’ (Gerber and Green 2012:147). This is demonstrated using a small dataset below.
Task: Estimate the complier average causal effect for the first four rows of the dataset, then for the full dataset. Does the compliance rate increase or decrease with the additional observation? How does the CACE change as a result of this?
Unit | Y | Z | D |
---|---|---|---|
1 | 10 | 1 | 1 |
2 | 6 | 1 | 0 |
3 | 6 | 0 | 0 |
4 | 4 | 0 | 0 |
5 | 20 | 1 | 1 |
Solution:
Considering only the first four rows:
\(ITT_Y = \frac{16}{2} - \frac{10}{2} = 3\)
\(ITT_D = \frac{1}{2} - 0 = 0.5\)
Consequently, \(CACE = \frac{3}{0.5} = 6\)
Now consider the full dataset:
\(ITT_Y = \frac{36}{3} - \frac{10}{2} = 12 - 5 = 7\)
\(ITT_D = \frac{2}{3} - 0 = 0.667\)
And thus, \(CACE = \frac{7}{0.667} \approx 10.5\).
Bottom line: Both the compliance rate and the CACE can increase simultaneously. In this example, \(ITT_D\) moved from 0.5 to 0.667, and the \(CACE\) from 6 to 10.5. This is because the additional complier reported a high outcome value that considerably increased the \(ITT_Y\).
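The calculation can be checked in R. The sketch below rebuilds the five-row dataset by hand; `wald_cace` is a hypothetical helper I define here for illustration, not a function from any package.

```r
dat <- data.frame(
  unit = 1:5,
  Y = c(10, 6, 6, 4, 20),
  Z = c(1, 1, 0, 0, 1),
  D = c(1, 0, 0, 0, 1)
)

# Hypothetical helper: ratio estimator ITT_Y / ITT_D from differences in means
wald_cace <- function(d) {
  itt_y <- mean(d$Y[d$Z == 1]) - mean(d$Y[d$Z == 0])
  itt_d <- mean(d$D[d$Z == 1]) - mean(d$D[d$Z == 0])
  c(itt_y = itt_y, itt_d = itt_d, cace = itt_y / itt_d)
}

first_four <- wald_cace(dat[1:4, ])  # rows 1 to 4 only
full       <- wald_cace(dat)         # all five rows

rbind(first_four, full)
```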
We know that the complier average causal effect is a ratio of two quantities: \(\frac{ITT_Y}{ITT_D}\). Using a data sample, we estimate these two quantities: \(\hat{ITT_Y}\) and \(\hat{ITT_D}\). This ratio (\(\frac{\hat{ITT_Y}}{\hat{ITT_D}}\)) is a consistent but biased estimator of the true CACE.
Intuition: We know that \(\hat{ITT_Y}\) and \(\hat{ITT_D}\) are unbiased estimators of \(ITT_Y\) and \(ITT_D\). In other words: \(E[\hat{ITT_Y}] = ITT_Y\) and \(E[\hat{ITT_D}] = ITT_D\). However, ‘the ratio of two unbiased estimators is not an unbiased estimator for the ratio of the two estimands’ (Gerber and Green 2012:151):
\(E[\frac{\hat{ITT_Y}}{\hat{ITT_D}}] \neq \frac{E[\hat{ITT_Y}]}{E[\hat{ITT_D}]} = \frac{ITT_Y}{ITT_D}\).
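This is because for any two random variables \(X\) and \(Y\) (with \(Y \neq 0\) and \(E[Y] \neq 0\)), \(E[\frac{X}{Y}] = \frac{E[X] - Cov[\frac{X}{Y},Y]}{E[Y]}\), which follows from rearranging \(Cov[\frac{X}{Y},Y] = E[X] - E[\frac{X}{Y}]E[Y]\). In our context: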
\(E[\frac{\hat{ITT_Y}}{\hat{ITT_D}}] = \frac{E[\hat{ITT_Y}] - Cov[\frac{\hat{ITT_Y}}{\hat{ITT_D}},\hat{ITT_D}]}{E[\hat{ITT_D}]} = \frac{ITT_Y - Cov[\frac{\hat{ITT_Y}}{\hat{ITT_D}},\hat{ITT_D}]}{ITT_D}\)
This can be stated as the true complier average causal effect minus a bias term:
\(\frac{ITT_Y}{ITT_D} - \color{red}{\frac{Cov[\frac{\hat{ITT_Y}}{\hat{ITT_D}},\hat{ITT_D}]}{ITT_D}}\)
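The identity behind the bias term can be verified numerically. The sketch below is an illustration under assumptions of my own: the draws are hypothetical stand-ins for \(\hat{ITT_Y}\) and \(\hat{ITT_D}\) across repeated experiments (not output from any real study), and sample means with divisor \(n\) play the role of expectations, so the identity holds exactly for the simulated values.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
n <- 1000

# Hypothetical stand-ins for estimates across repeated experiments
itt_d_hat <- runif(n, min = 0.2, max = 0.8)          # kept away from zero
itt_y_hat <- 0.1 * itt_d_hat + rnorm(n, sd = 0.05)
ratio     <- itt_y_hat / itt_d_hat

# Covariance with divisor n (R's cov() divides by n - 1 instead)
cov_n <- mean(ratio * itt_d_hat) - mean(ratio) * mean(itt_d_hat)

lhs   <- mean(ratio)                                  # E[X / Y]
rhs   <- (mean(itt_y_hat) - cov_n) / mean(itt_d_hat)  # (E[X] - Cov[X/Y, Y]) / E[Y]
naive <- mean(itt_y_hat) / mean(itt_d_hat)            # E[X] / E[Y]

c(lhs = lhs, rhs = rhs, naive = naive)
```

Here `lhs` and `rhs` coincide up to floating-point error, while `naive` differs from them by exactly `cov_n / mean(itt_d_hat)`, the analogue of the red bias term.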
Claim: When the \(ITT_D\) is close to zero, even a slight violation of the exclusion restriction may severely bias the estimation of the CACE (Gerber and Green 2012:149)
Proof: Write the ITT as a weighted average of compliers and never takers
\(ITT_Y = E[Y_i(z=1,d=1) - Y_i(z=0,d=0) |C] \cdot ITT_D + E[Y_i(z=1,d=0) - Y_i(z=0,d=0) | NT]\cdot (1 - ITT_D)\)
Now, if we divide the entire equation by \(ITT_D\):
\(\frac{ITT_Y}{ITT_D} = CACE + \color{red}{\frac{1 - ITT_D}{ITT_D} E[Y_i(z=1,d=0) - Y_i(z=0,d=0) | NT]}\)
Note that if the exclusion restriction holds, \(E[Y_i(z=1,d=0) | NT] = E[Y_i(z=0,d=0) | NT]\), so the red term vanishes and \(\frac{ITT_Y}{ITT_D} = CACE\). But even a slight violation can produce a large bias when \(ITT_D\) is small: as \(ITT_D \rightarrow 0\), \(\frac{1-ITT_D}{ITT_D} \rightarrow \infty\).
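To get a feel for the magnitudes, the sketch below evaluates the bias term for a few compliance rates. The violation of 0.01 is a hypothetical value I picked for illustration; it stands for \(E[Y_i(z=1,d=0) - Y_i(z=0,d=0) | NT]\).

```r
violation <- 0.01                           # hypothetical exclusion violation among never takers
itt_d     <- c(0.50, 0.20, 0.10, 0.05, 0.01)

# Bias term from the equation above: (1 - ITT_D) / ITT_D * violation
bias <- (1 - itt_d) / itt_d * violation

data.frame(itt_d = itt_d, multiplier = (1 - itt_d) / itt_d, bias = bias)
```

With a compliance rate of 0.50 the bias equals the violation itself, whereas at 0.01 the same violation is multiplied by 99.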
Consider a situation in which there is two-sided non-compliance, and we have defiers. How does the CACE theorem break down in such cases?
Write the ITT as a weighted-average of types:
\(E[Y|Z=1] - E[Y|Z=0]\)
\(= E[Y_i(z=1,d=0) - Y_i(z=0,d=0) | NT]\pi_{NT}\)
\(+ E[Y_i(z=1,d=1) - Y_i(z=0,d=1) | AT]\pi_{AT}\)
\(+ E[Y_i(z=1,d=1) - Y_i(z=0,d=0) | C]\pi_{C}\)
\(+ E[Y_i(z=1,d=0) - Y_i(z=0,d=1) | D]\pi_{D}\)
If the exclusion restriction holds, the first two terms equal zero. But if \(\pi_D \neq 0\), then \(ITT = (ATE | C)\pi_{C} - (ATE| D)\pi_{D}\).
Note: \((ATE|D) = E[Y_i(d=1) - Y_i(d=0) | D]\), and the ITT term has \(E[Y_i(d=0) - Y_i(d=1) | D]\), which equals \(-(ATE|D)\). Hence the minus sign in the final expression: \(ITT = (ATE | C)\pi_{C} - (ATE| D)\pi_{D}\)
Furthermore, when there are defiers, \(ITT_D\) no longer measures the proportion of compliers (\(\pi_C\)):
\(ITT_D = E[D|Z=1] - E[D|Z=0] = (\pi_{C} + \pi_{AT}) - (\pi_{AT} + \pi_{D}) = \pi_{C} - \pi_{D}\)
When there are defiers, the ratio \(\frac{ITT_Y}{ITT_D} = \frac{(ATE | C)\pi_{C} - (ATE| D)\pi_{D}}{\pi_{C} - \pi_{D}}\).
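A small numerical sketch shows how far this ratio can drift from the CACE. All values below are hypothetical and chosen only for illustration; `wald_with_defiers` is a helper name I introduce here.

```r
# Hypothetical helper: the ratio ITT_Y / ITT_D when defiers are present
wald_with_defiers <- function(pi_c, pi_d, ate_c, ate_d) {
  itt_y <- ate_c * pi_c - ate_d * pi_d
  itt_d <- pi_c - pi_d
  itt_y / itt_d
}

# Compliers gain 2 from treatment, defiers lose 1 (hypothetical values)
ratio_hetero <- wald_with_defiers(pi_c = 0.5, pi_d = 0.1, ate_c = 2, ate_d = -1)

# If compliers and defiers have the same average effect, the ratio recovers it
ratio_homog <- wald_with_defiers(pi_c = 0.5, pi_d = 0.1, ate_c = 2, ate_d = 2)

c(ratio_hetero = ratio_hetero, ratio_homog = ratio_homog)
```

In the first case the ratio is 2.75 even though the effect among compliers is 2. The ratio equals the complier effect only when there are no defiers or when both groups share the same average effect.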
As we have seen in two-sided noncompliance, the monotonicity assumption (\(d_i(1) \geq d_i(0)\)) rules out defiers. I will now demonstrate how such an assumption restricts the subject type space in other cases.
Setup: Consider a case in which there are two types of treatment assignment \(Z= \{0,1\}\); but treatment status \(D\) is three-tiered \(D = \{0,1,2\}\). For example, we have an encouragement design in which subjects are either incentivized to watch a non-political television show (\(Z=0\)), or a political program (\(Z=1\)). For simplicity, let there be three forms of treatment: subjects watch a non-political show (\(D=0\)), mayoral debate (\(D=1\)), or the news (\(D=2\)).
Question: How many types of subjects are there with two treatment assignments and three forms of actual treatment?
Answer: \(3^2 = 9\) types. More generally, there are \((\text{No. of Treatments})^{(\text{No. of Assignments})}\) types. Here they are:
Type | D(Z=0) | D(Z=1) |
---|---|---|
1 | 0 | 0 |
2 | 0 | 1 |
3 | 1 | 0 |
4 | 1 | 1 |
5 | 2 | 0 |
6 | 0 | 2 |
7 | 1 | 2 |
8 | 2 | 1 |
9 | 2 | 2 |
Now consider a monotonicity stipulation: \(d_i(z=1) \geq d_i(z=0)\). What types are ruled out as a consequence of this?
Type | D(Z=0) | D(Z=1) | Ruled Out |
---|---|---|---|
1 | 0 | 0 | - |
2 | 0 | 1 | - |
3 | 1 | 0 | X |
4 | 1 | 1 | - |
5 | 2 | 0 | X |
6 | 0 | 2 | - |
7 | 1 | 2 | - |
8 | 2 | 1 | X |
9 | 2 | 2 | - |
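The same enumeration can be sketched in R with `expand.grid`. The column names `d_z0` and `d_z1` are my own, and the row order differs from the table above, so the type numbers will not line up one for one.

```r
# All combinations of D(Z=0) and D(Z=1) when D takes values 0, 1, 2
types <- expand.grid(d_z0 = 0:2, d_z1 = 0:2)

# Monotonicity requires d(z=1) >= d(z=0); flag the types that violate it
types$ruled_out <- types$d_z1 < types$d_z0

nrow(types)                # total number of types
sum(types$ruled_out)       # number ruled out by monotonicity
types[!types$ruled_out, ]  # types that remain
```

Three of the nine types are ruled out, namely those with \((D(Z=0), D(Z=1))\) equal to \((1,0)\), \((2,0)\), or \((2,1)\), leaving six.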
We will use ivreg from the AER package to compute the complier average causal effect. For this part, I use the Mullainathan, Washington, and Azari (2010) dataset. This is an experiment in which 1,000 subjects were randomly assigned to a treatment condition (encouragement to watch the mayoral debate), or a control condition in which they were encouraged to watch some non-political show. Note that we encounter two-sided non-compliance: some respondents would watch the debate irrespective of encouragement (“always takers”), and some would not watch these debates irrespective of their assignment status (“never takers”). It is reasonable to believe there are no “defiers”: respondents who would not watch the mayoral debate when encouraged to do so, and watch it when encouraged to watch non-political shows.
Let \(Z\) be the assignment status (\(Z=1\) if encouraged to watch the mayoral debate), \(D\) be the treatment (\(D=1\) if a respondent watches the debate), and \(Y\) be the change in their view of candidates.
# Download the dataset (http://hdl.handle.net/10079/kh189dd)
library(knitr)     # kable()
library(estimatr)  # lm_robust() and tidy()
dat5 <- read.csv("W5_MayoralDebates.csv")
kable(head(dat5))
Z | D | Y |
---|---|---|
0 | 0 | 0 |
0 | 0 | 0 |
1 | 0 | 0 |
1 | 1 | 1 |
1 | 1 | 0 |
0 | 0 | 1 |
# Approach 1: Estimate CACE by separately computing ITT and ITT_D
model_itt <- tidy(lm_robust(Y ~ Z, data = dat5))
kable(model_itt)
coefficient_name | coefficients | se | p | ci_lower | ci_upper | df | outcome |
---|---|---|---|---|---|---|---|
(Intercept) | 0.4181818 | 0.0221928 | 0.0000000 | 0.3746318 | 0.4617318 | 998 | Y |
Z | 0.0570657 | 0.0314219 | 0.0696533 | -0.0045949 | 0.1187263 | 998 | Y |
model_ittd <- tidy(lm_robust(D ~ Z, data = dat5))
kable(model_ittd)
coefficient_name | coefficients | se | p | ci_lower | ci_upper | df | outcome |
---|---|---|---|---|---|---|---|
(Intercept) | 0.1616162 | 0.0165615 | 0 | 0.1291168 | 0.1941156 | 998 | D |
Z | 0.2047205 | 0.0271084 | 0 | 0.1515244 | 0.2579166 | 998 | D |
# CACE is a ratio of ITT/ITT_D
model_itt$coefficients[2]/model_ittd$coefficients[2]
## [1] 0.2787494
# Approach 2: Use ivreg to do 2SLS in one step
library(AER)
model_cace <- ivreg(Y ~ D | Z, data = dat5)
# Note that ivreg takes a formula of the type: Outcome ~ Endogenous
# Regressor + Covariates | Exogenous Instrument + Covariates
summary(model_cace)
##
## Call:
## ivreg(formula = Y ~ D | Z, data = dat5)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.6519 -0.3731 -0.3731 0.6269 0.6269
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.37313 0.04346 8.585 <2e-16 ***
## D 0.27875 0.15299 1.822 0.0688 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4952 on 998 degrees of freedom
## Multiple R-Squared: 0.00992, Adjusted R-squared: 0.008928
## Wald test: 3.32 on 1 and 998 DF, p-value: 0.06876
Note: The point estimate for the CACE is 0.2787, and the 95% confidence interval is \(\hat{CACE} \pm 1.96 \cdot 0.15299\), that is: \([-0.0211, 0.5786]\).
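The interval can be recomputed by hand as a check. This sketch types in the estimate and standard error from the `ivreg` summary above and uses the normal approximation with a critical value of 1.96; the variable names are my own.

```r
est <- 0.27875  # coefficient on D from the ivreg summary
se  <- 0.15299  # its standard error

# Normal-approximation 95% interval: estimate +/- 1.96 * SE
ci <- est + c(-1, 1) * 1.96 * se
round(ci, 4)
```

Because the interval includes zero, this agrees with the p-value of 0.0688 reported in the summary: the estimate is not statistically significant at the 5% level.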
Method 1: \(\hat{CACE} = \frac{\hat{ITT_Y}}{\hat{ITT_D}} = \frac{E[Y_i|Z=2] - E[Y_i|Z=0]}{E[D_i|Z=2] - E[D_i|Z=0]}\)
Method 2: \(\hat{CACE} = E[Y_i(Z=2,D=1)|C] - E[Y_i(Z=1,D=0)|C]\)
Note: (1) The difference-in-means estimator used in Method 2 is unbiased, while the ratio estimator used in Method 1 is biased. Both estimators are consistent. (2) Placebo designs are “over-identified” because the same data produce two estimates of the complier average causal effect.