Consider the Guan and Green (2006) canvassing experiment on the Peking University campus, and suppose we learn that leafleting alone inflated the treated potential outcomes by some known constant (say \(0.02\)). Can we still estimate the CACE?

**Answer:** Yes, if we believe the claim, we can adjust for the exclusion restriction violation:

Let the true treatment group mean be:

\(E[Y_i | Z = 1] = E[Y_i(z=1,d=0) | NT]\pi_{NT} + E[Y_i(z=1,d=1) | C]\pi_{C}\)

Then what we observe is:

\(E[Y_i(z=1,d=0) \color{red}{+0.02} | NT]\pi_{NT} + E[Y_i(z=1,d=1) \color{red}{+0.02} | C]\pi_{C}\)

And the ITT is:

\(\hat{ITT} = E[Y|Z=1] - E[Y|Z=0]\)

\(= E[Y(0) + \color{red}{0.02} | NT]\hat{\pi_{NT}} + E[Y(1) + \color{red}{0.02} | C]\hat{\pi_{C}} - \{E[Y(0)| NT]\hat{\pi_{NT}} + E[Y(0)| C]\hat{\pi_{C}}\}\)

Then

\(\hat{ITT} = 0.02\cdot\hat{\pi_{NT}} + E[Y(1) - Y(0) | C]\hat{\pi_{C}} + 0.02\cdot{\hat{\pi_C}}\)

\(\hat{ITT} = 0.02\cdot(\hat{\pi_{NT}} + \hat{\pi_{C}}) + \hat{CACE}\hat{\pi_C}\)

Dividing both sides by \(\hat{\pi_C}\), which equals \(\hat{ITT_D}\) under one-sided noncompliance, and noting that \(\hat{\pi_{NT}} + \hat{\pi_{C}} = 1\) when there are no always takers, gives:

\(\frac{\hat{ITT}}{\hat{ITT_D}} = \frac{0.02}{\hat{ITT_D}} + \frac{\hat{CACE}\hat{\pi_C}}{\hat{\pi_C}} = \frac{0.02}{\hat{ITT_D}} + \hat{CACE}\)

**Note:** This solution hinges on the assertion that leafleting inflated outcomes in the treatment group by a known (and constant) amount. The credibility of that assertion must be independently established.
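Granting that assertion, the adjustment is a one-line computation. A minimal R sketch with purely hypothetical values for the estimated ITT and \(ITT_D\) (these are not the actual Guan and Green estimates):

```
# Hypothetical estimates -- not the actual Guan and Green (2006) numbers
itt   <- 0.13    # estimated ITT
itt_d <- 0.75    # estimated compliance rate (ITT_D = pi_C here)
bias  <- 0.02    # assumed constant inflation from leafleting

cace_naive    <- itt / itt_d            # ignores the violation
cace_adjusted <- (itt - bias) / itt_d   # i.e., ITT/ITT_D - 0.02/ITT_D
c(naive = cace_naive, adjusted = cace_adjusted)
```

The adjusted estimate shrinks because the leaflet effect is subtracted from the ITT before rescaling by the compliance rate.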

In Gerber and Green (2012), the Nickerson voter mobilization study is summarized in a table on p. 171. Below I demonstrate how to recreate their dataset using only the information available in that table:

```
library(dplyr)
library(knitr)
# Rebuild the individual-level data from the cell counts and turnout
# rates reported in the table
nickerson <- data.frame(
    Z = c(rep("baseline", 2572),
          rep("treatment", 486 + 2086),
          rep("placebo", 470 + 2109)),
    D = c(rep(0, 2572),
          rep(1, 486), rep(0, 2086),
          rep(1, 470), rep(0, 2109)),
    Y = c(rep(1, round(2572 * 0.3122)), rep(0, round(2572 * (1 - 0.3122))),
          rep(1, round(486 * 0.3909)), rep(0, round(486 * (1 - 0.3909))),
          rep(1, round(2086 * 0.3274)), rep(0, round(2086 * (1 - 0.3274))),
          rep(1, round(470 * 0.2979)), rep(0, round(470 * (1 - 0.2979))),
          rep(1, round(2109 * 0.3215)), rep(0, round(2109 * (1 - 0.3215)))))
# Note: D = 1 means receiving the placebo treatment in the placebo group and
# the actual treatment in the treatment group, which is why D = 1 separates
# compliers from never-takers in both groups.
output <- nickerson %>%
    group_by(Z, D) %>%
    summarise(N = n(), Turnout = mean(Y))
kable(output, digits = 3, caption = "Nickerson (2005, 2008) Summary Statistics")
```

Z | D | N | Turnout |
---|---|---|---|
baseline | 0 | 2572 | 0.312 |
placebo | 0 | 2109 | 0.321 |
placebo | 1 | 470 | 0.298 |
treatment | 0 | 2086 | 0.327 |
treatment | 1 | 486 | 0.391 |

Using the same Nickerson study dataset, I show we can compute the complier average causal effect in two different ways:

**Method 1 (Ratio Method)**

\(\hat{CACE} = \frac{\hat{ITT}}{\hat{ITT_D}}\) (familiar estimator, use `ivreg`)

**Method 2 (Difference-in-Means)**

Since a placebo design isolates compliers in “control” and “treatment” conditions, we can obtain the complier average causal effect by comparing the placebo and treatment group means:

\(\hat{CACE} = \overline{Y_{Treatment}} - \overline{Y_{Placebo}}\)

```
library(AER)
# Method 1: the ratio (IV) estimator, comparing treatment vs. baseline
m1 <- nickerson %>%
    filter(Z %in% c("treatment", "baseline")) %>%
    mutate(Z_trt = as.numeric(Z == "treatment")) %>%
    with(ivreg(Y ~ D | Z_trt))
summary(m1)
```

```
##
## Call:
## ivreg(formula = Y ~ D | Z_trt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4562 -0.3122 -0.3122 0.6878 0.6878
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.312208 0.009243 33.777 <2e-16 ***
## D 0.144033 0.069179 2.082 0.0374 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4688 on 5142 degrees of freedom
## Multiple R-Squared: -1.021e-05, Adjusted R-squared: -0.0002047
## Wald test: 4.335 on 1 and 5142 DF, p-value: 0.03739
```
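As a sanity check, the same number can be recovered by hand from the table's reported proportions alone; nothing here depends on the reconstructed individual-level data:

```
# Ratio estimator straight from the summary table: turnout and
# treatment-receipt rates, treatment group versus baseline group
p_trt  <- (486 * 0.3909 + 2086 * 0.3274) / (486 + 2086)  # treatment-group turnout
p_base <- 0.3122                                         # baseline turnout
itt    <- p_trt - p_base        # assignment effect on turnout
itt_d  <- 486 / (486 + 2086)    # assignment effect on receipt (0 in baseline)
itt / itt_d                     # approx. 0.144, matching the ivreg coefficient on D
```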

```
# Estimate CACE using Method 2 (difference-in-means estimator):
# compare treated units (D = 1) in the treatment and placebo groups
m2 <- nickerson %>%
    filter(Z %in% c("treatment", "placebo")) %>%
    mutate(Z_trt = as.numeric(Z == "treatment")) %>%
    summarise(DIM = mean(Y[Z_trt == 1 & D == 1]) - mean(Y[Z_trt == 0 & D == 1]))
m2
```

```
## DIM
## 1 0.09307416
```
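Because Method 2 is an ordinary two-group comparison, a standard error comes straight from a two-proportion test on the cell counts (voter counts recovered by rounding the table's percentages, as in the reconstruction above):

```
# 190 of 486 treated units vote in the treatment group;
# 140 of 470 treated units vote in the placebo group
pt <- prop.test(x = c(round(486 * 0.3909), round(470 * 0.2979)),
                n = c(486, 470))
pt
```

The difference between the two sample proportions reproduces the 0.093 point estimate above, now with a confidence interval attached.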

What is the difference between \(E[Y_i(z=1, d(1)) | D_i =1]\) and \(E[Y_i(z=1, d(1)) | d_i(1) =1]\)?

**Example:** Consider a toy dataset (\(n=4\), \(m=2\) treated) in which subject types are known. \(Z\) is the observed assignment vector, and \(Z1\)–\(Z5\) are the five other possible assignments:

Unit | Y1 | Y0 | d(z=0) | d(z=1) | Z | Z1 | Z2 | Z3 | Z4 | Z5 |
---|---|---|---|---|---|---|---|---|---|---|
1 | 10 | 5 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
2 | 8 | 4 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 |
3 | 12 | 10 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
4 | 3 | 2 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
E[Y(z=1,d(1)) \| D=1] | - | - | - | - | 9 | 12 | 8 | 10 | 10 | 11 |

Conditioning on \(D_i = 1\) refers to *observed* treatment receipt, which depends on the realized assignment. Under the observed assignment \(Z\), units 1 and 2 are treated, so:

\(E[Y_i(z=1, d(1)) | D_i = 1] = \frac{10+8}{2} = 9\)

Conditioning on \(d_i(1) = 1\) refers to a fixed subject type (units 1, 2, and 3), so the quantity does not vary with the assignment:

\(E[Y_i(z=1, d(1)) | d_i(1) = 1] = \frac{10+8+12}{3} = 10\)

Averaging the first quantity over all six equally likely assignments recovers the second: \(\frac{1}{6}(9 + 12 + 8 + 10 + 10 + 11) = \frac{60}{6} = 10\)
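The averaging step can be checked by enumerating all \(\binom{4}{2} = 6\) possible assignments:

```
Y1 <- c(10, 8, 12, 3)
d1 <- c(1, 1, 1, 0)          # d(z = 1): unit 4 is a never-taker
assignments <- combn(4, 2)   # each column is one pair of treated units
per_assignment <- apply(assignments, 2, function(treated) {
    mean(Y1[intersect(treated, which(d1 == 1))])
})
per_assignment               # the six complier-treated means, one per assignment
mean(per_assignment)         # equals mean(Y1[d1 == 1])
```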

Consider the simplified version of the Milwaukee study presented in Table 6.7 (reproduced below). When monotonicity does **not** hold, we cannot identify the proportions of each type, but we can place sharp bounds on them:

```
milwaukee <- data.frame(Arrest = c(798, 4, 802), Warning = c(2, 396, 398))
rownames(milwaukee) <- c("Arrested (D=1)", "Warned (D=0)", "N")
colnames(milwaukee) <- c("Arrest (Z=1)", "Warning (Z=0)")
kable(milwaukee)
```

 | Arrest (Z=1) | Warning (Z=0) |
---|---|---|
Arrested (D=1) | 798 | 2 |
Warned (D=0) | 4 | 396 |
N | 802 | 398 |

Subjects assigned to treatment and treated: \(\pi_{C} + \pi_{AT} = \frac{798}{802} = 0.99501\)

Subjects assigned to treatment but go untreated: \(\pi_{NT} + \pi_{D} = \frac{4}{802} = 0.00499\)

Subjects assigned to control but treated: \(\pi_{AT} + \pi_{D} = \frac{2}{398} = 0.00503\)

Subjects assigned to control and go untreated: \(\pi_{NT} + \pi_{C} = \frac{396}{398} = 0.9949\)

**Step 1:** Fix the lower bound for Defiers and Never Takers at 0. Then the highest possible value for both \(\pi_D\) and \(\pi_{NT}\) is \(0.00499\), and the highest possible value for \(\pi_{AT}\) is \(0.00503\). Note that we rule out \(0.00503\) as an upper bound for Defiers: if \(\pi_D = 0.00503\), then \(\pi_{D} + \pi_{NT}\) would exceed \(0.00499\). Assuming \(\pi_D\) takes on its highest value, the lowest possible value for \(\pi_{AT}\) is \(0.00503 - 0.00499 = 0.00004\) (approximately 0). Finally, assuming \(\pi_{NT} = 0.00499\) gives \(\pi_C = 0.9949 - 0.00499 = 0.98998\), and assuming \(\pi_{NT} = 0\) gives \(\pi_C = 0.9949\).
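The Step 1 arithmetic can be sketched directly from the observed cell proportions (a minimal check for this table, not a general bounding routine):

```
p_z1_d1 <- 798 / 802   # pi_C  + pi_AT  (assigned treatment, treated)
p_z1_d0 <- 4 / 802     # pi_NT + pi_D   (assigned treatment, untreated)
p_z0_d1 <- 2 / 398     # pi_AT + pi_D   (assigned control, treated)
p_z0_d0 <- 396 / 398   # pi_NT + pi_C   (assigned control, untreated)

pi_D_max  <- min(p_z1_d0, p_z0_d1)   # defiers cannot exceed either margin
pi_AT_min <- p_z0_d1 - pi_D_max      # always-takers when defiers are maximal
pi_C_min  <- p_z0_d0 - p_z1_d0       # compliers when pi_NT is at its maximum
pi_C_max  <- p_z0_d0                 # compliers when pi_NT = 0
round(c(pi_D_max, pi_AT_min, pi_C_min, pi_C_max), 5)
```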

How to read potential outcomes (Examples: \(E[Y_i(d(0)) | D_i = 1]\), \(E[Y_i(z=1,d=0) | d_i(1) > d_i(0)]\)). In particular, the difference between \(d_i\) and \(D_i\).

Core assumptions in an experiment (in words and notation), and the identifying assumptions for the Complier Average Causal Effect.

Estimating the proportion of always takers, never takers, and compliers using Table 6.2 (reproduced below), then computing the control and treatment group mean for compliers (and the other types).

Quantities | Treatment Group | Control Group |
---|---|---|
% Reporting Change (N Treated) | 59.5 (185) | 50.0 (80) |
% Reporting Change (N Untreated) | 40.6 (320) | 40.2 (415) |
% Reporting Change (Total N) | 47.5 (505) | 41.8 (495) |
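A sketch of that computation in R, using the cell percentages and counts from the table (the back-out formulas assume no defiers):

```
# Type shares, identified because assignment is random
pi_AT <- 80 / 495             # treated despite control assignment
pi_NT <- 320 / 505            # untreated despite treatment assignment
pi_C  <- 1 - pi_AT - pi_NT    # compliers (no defiers assumed)

# Complier means: remove the always-taker / never-taker contributions
y1_C <- (0.595 * 185 - 0.500 * pi_AT * 505) / (185 - pi_AT * 505)
y0_C <- (0.402 * 415 - 0.406 * pi_NT * 495) / (415 - pi_NT * 495)
round(c(pi_C = pi_C, complier_treat_mean = y1_C, complier_control_mean = y0_C), 3)
```

The always-taker mean (50.0) comes from the control group's treated cell, and the never-taker mean (40.6) from the treatment group's untreated cell.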

Properties of the ratio estimator (\(\frac{ITT}{ITT_D}\)): (a) when \(ITT_D\) changes, so can the \(ITT\); (b) when compliance is low (\(ITT_D \rightarrow 0\)), bias from exclusion restriction violations can be large; (c) when there are defiers, the ratio estimator equals \(\frac{(ATE|Compliers)\pi_C - (ATE|Defiers)\pi_D}{\pi_C - \pi_D}\), and CACE is unidentified.

Properties of different estimators. Which of these are consistent and/or unbiased: difference-in-means, blocked ATE, difference-in-means with clustered data, difference-in-differences, covariate adjustment in OLS, ratio estimator? Can you prove that the difference-in-means estimator is unbiased, and the ratio estimator is biased?

Types of exclusion restriction violations: differential attrition in groups, measurement error, and bundled treatments.

Conditioning on post-treatment variables, and showing that the estimate is biased.

*Example:* Say there is an outcome \(Y\), assignment vector \(Z\) and binary covariate \(X\), that is measured post-treatment. How is the difference-in-means estimate conditional on \(X=1\) potentially biased away from the ATE for the subgroup for whom \(X_i(1) = X_i(0) = 1\)?

*Intuition:* Let there be four types of respondents:

X(Z=0) | X(Z=1) | Type |
---|---|---|
1 | 1 | A |
0 | 1 | B |
1 | 0 | C |
0 | 0 | D |

Call the subgroup that satisfy \(X_i(1) = X_i(0) = 1\) Type “A”. Then the ATE for this subgroup is:

\(E[Y_i | Z=1, Type A]\pi_{A} - E[Y_i | Z=0, Type A]\pi_A\)

However, when we condition on \(X_i = 1\), we compare:

\(E[Y_i | Z=1, X_i=1] = \color{blue}{E[Y_i | Z=1, TypeA]\pi_A} + E[Y_i | Z=1, Type B]\pi_B\)

\(E[Y_i | Z=0, X_i=1] = \color{blue}{E[Y_i | Z=0, TypeA]\pi_A} + E[Y_i | Z=0, Type C]\pi_C\)

(where the \(\pi\)'s denote each type's share among the \(X_i = 1\) units in the relevant arm). For the difference between these quantities to equal \(E[Y_i | Z=1, Type A]\pi_{A} - E[Y_i | Z=0, Type A]\pi_A\), it must be that:

\(\color{red}{E[Y_i | Z=1, Type B]\pi_B - E[Y_i | Z=0, Type C]\pi_C = 0}\)

which, apart from chance, there is no reason to expect.
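A toy simulation makes the bias concrete. All numbers here are invented: every type has a true treatment effect of zero, but the types differ in outcome levels, so conditioning on \(X = 1\) compares different mixes of types across arms:

```
set.seed(1)
n    <- 1e5
type <- sample(c("A", "B", "C", "D"), n, replace = TRUE)
Z    <- rbinom(n, 1, 0.5)
# X follows the type table: A always 1, B only if treated, C only if control, D never
X <- ifelse(type == "A", 1,
     ifelse(type == "B", Z,
     ifelse(type == "C", 1 - Z, 0)))
# True effect is zero for everyone; types differ only in levels
Y <- c(A = 2, B = 5, C = -1, D = 0)[type] + rnorm(n)
# Naive difference in means among X == 1 units is far from zero
mean(Y[Z == 1 & X == 1]) - mean(Y[Z == 0 & X == 1])
```

Treated \(X=1\) units are Types A and B (average level 3.5), while control \(X=1\) units are Types A and C (average level 0.5), so the naive estimate is about 3 even though the true effect is zero for every type.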

- How are “blocks” different from “clusters”? Is “covariate adjustment” the same thing as “blocking”?

Please submit *three* questions on Canvas for Monday's revision session. These questions are to be posted on the discussion thread "Questions and Clarifications" (accessible here). The next section will be held on Thursday, March 29. For that section, please select *one* research paper that reports an experiment of interest and for which the authors have shared replication files. We will go around the room, and each of you will describe the paper you selected. This is meant to help you prepare for the final assignment (i.e., the replication study).