Notes from Week VI

Re-estimating CACE with Known Excludability Violation

Consider the Guan and Green (2006) canvassing experiment on the Peking University campus, and suppose we learn that leafleting inflated the treated potential outcomes by some known constant (say \(0.02\)). Can we still estimate the CACE?

Answer: Yes. If we believe the claim, we can adjust for the exclusion restriction violation:

Let the true treatment group mean be:

\(E[Y_i | Z = 1] = E[Y_i(z=1,d=0) | NT]\pi_{NT} + E[Y_i(z=1,d=1) | C]\pi_{C}\)

Then what we observe is:

\(E[Y_i(z=1,d=0) \color{red}{+0.02} | NT]\pi_{NT} + E[Y_i(z=1,d=1) \color{red}{+0.02} | C]\pi_{C}\)

And the ITT is:

\(\hat{ITT} = E[Y|Z=1] - E[Y|Z=0]\)

\(= E[Y(0) \color{red}{+ 0.02} | NT]\hat{\pi_{NT}} + E[Y(1) \color{red}{+ 0.02} | C]\hat{\pi_{C}} - \{E[Y(0)| NT]\hat{\pi_{NT}} + E[Y(0)| C]\hat{\pi_{C}}\}\)

Then

\(\hat{ITT} = 0.02\cdot\hat{\pi_{NT}} + E[Y(1) - Y(0) | C]\hat{\pi_{C}} + 0.02\cdot{\hat{\pi_C}}\)

\(\hat{ITT} = 0.02\cdot(\hat{\pi_{NT}} + \hat{\pi_{C}}) + \hat{CACE}\hat{\pi_C}\)

Since the design has only never takers and compliers, \(\hat{\pi_{NT}} + \hat{\pi_{C}} = 1\), so the first term is simply \(0.02\). Dividing both sides by \(\hat{\pi_C}\) (equivalently, \(\hat{ITT_D}\)) gives:

\(\frac{\hat{ITT}}{\hat{ITT_D}} = \frac{0.02}{\hat{ITT_D}} + \frac{\hat{CACE}\hat{\pi_C}}{\hat{\pi_C}} = \frac{0.02}{\hat{ITT_D}} + \hat{CACE}\)

Note: This solution hinges on the assertion that leafleting inflated outcomes in the treatment group by a known (and constant) amount. The credibility of that assertion must be independently established.
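To make the adjustment concrete, here is a minimal sketch in R. The estimates below are hypothetical placeholders, not the actual Guan and Green results:

```r
# Hypothetical estimates (placeholders, not the actual Guan and Green values)
itt_hat        <- 0.10  # estimated intent-to-treat effect on turnout
itt_d_hat      <- 0.80  # estimated compliance rate (ITT_D)
leaflet_effect <- 0.02  # known constant inflation from the excludability violation

# Naive ratio estimator, contaminated by the leaflet effect
cace_naive <- itt_hat / itt_d_hat

# Adjusted estimator: since ITT = 0.02 + CACE * ITT_D,
# subtract the inflation before dividing
cace_adjusted <- (itt_hat - leaflet_effect) / itt_d_hat

cace_naive    # 0.125
cace_adjusted # 0.1
```

The naive estimator attributes the leaflet's \(0.02\) boost to canvassing; subtracting it from \(\hat{ITT}\) before dividing removes that contamination.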

Recreating a Dataset from a Table

In Gerber and Green (2012), the Nickerson voter mobilization study is summarized in a table on p. 171. I want to demonstrate how we can recreate their dataset using only the information available in that table:

nickerson <- data.frame(
    Z = c(rep("baseline", 2572),
          rep("treatment", 486 + 2086),
          rep("placebo", 470 + 2109)),
    D = c(rep(0, 2572),                # baseline: no contact attempted
          rep(1, 486), rep(0, 2086),   # treatment: contacted vs. not contacted
          rep(1, 470), rep(0, 2109)),  # placebo: contacted vs. not contacted
    Y = c(rep(1, round(2572 * 0.3122)), rep(0, round(2572 * (1 - 0.3122))),
          rep(1, round(486 * 0.3909)), rep(0, round(486 * (1 - 0.3909))),
          rep(1, round(2086 * 0.3274)), rep(0, round(2086 * (1 - 0.3274))),
          rep(1, round(470 * 0.2979)), rep(0, round(470 * (1 - 0.2979))),
          rep(1, round(2109 * 0.3215)), rep(0, round(2109 * (1 - 0.3215)))))

# Note: D = 1 means receiving the placebo contact in the placebo group and
# the actual treatment in the treatment group. D = 1 therefore separates
# 'compliers' from 'never takers' in both groups.

library(dplyr)
library(knitr)

output <- nickerson %>%
    group_by(Z, D) %>%
    summarise(N = n(), Turnout = mean(Y))
kable(output, digits = 3, caption = "Nickerson (2005,2008) Summary Statistics")
Nickerson (2005,2008) Summary Statistics
Z          D     N   Turnout
baseline   0  2572     0.312
placebo    0  2109     0.321
placebo    1   470     0.298
treatment  0  2086     0.327
treatment  1   486     0.391

Overidentified Placebo Designs

Using the same Nickerson study dataset, I show that we can compute the complier average causal effect in two different ways:

Method 1 (Ratio Method)

\(\hat{CACE} = \frac{\hat{ITT}}{\hat{ITT_D}}\) (familiar estimator, use ivreg)

Method 2 (Difference-in-Means)

Since a placebo design isolates compliers in both “control” (placebo) and treatment conditions, we can obtain the complier average causal effect by comparing mean outcomes among the treated (\(D=1\)) units in the treatment and placebo groups:

\(\hat{CACE} = \overline{Y}_{Treatment, D=1} - \overline{Y}_{Placebo, D=1}\)

library(AER)

# Estimate CACE using Method 1 (ratio estimator, via instrumental variables)
m1 <- nickerson %>%
    filter(Z == "treatment" | Z == "baseline") %>%
    mutate(Z_trt = as.numeric(Z == "treatment")) %>%
    with(ivreg(Y ~ D | Z_trt))
summary(m1)
## 
## Call:
## ivreg(formula = Y ~ D | Z_trt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4562 -0.3122 -0.3122  0.6878  0.6878 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.312208   0.009243  33.777   <2e-16 ***
## D           0.144033   0.069179   2.082   0.0374 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4688 on 5142 degrees of freedom
## Multiple R-Squared: -1.021e-05,  Adjusted R-squared: -0.0002047 
## Wald test: 4.335 on 1 and 5142 DF,  p-value: 0.03739
# Estimate CACE using Method 2 (difference-in-means estimator)

m2 <- nickerson %>%
    filter(Z == "treatment" | Z == "placebo") %>%
    mutate(Z_trt = as.numeric(Z == "treatment")) %>%
    summarise(DIM = mean(Y[Z_trt == 1 & D == 1]) - mean(Y[Z_trt == 0 & D == 1]))
m2
##          DIM
## 1 0.09307416

Reading Complex Potential Outcomes

What is the difference between \(E[Y_i(z=1, d(1)) | D_i =1]\) and \(E[Y_i(z=1, d(1)) | d_i(1) =1]\)?

Example: Consider a toy dataset (\(n=4\), \(m=2\)) in which subject types are known. \(Z\) is the observed assignment vector, and \(Z_1\) through \(Z_5\) are the five other possible assignments of \(m=2\) units to treatment:

Unit                           Y1   Y0   d(z=0)   d(z=1)    Z   Z1   Z2   Z3   Z4   Z5
1                              10    5        1        1    1    0    0    0    1    1
2                               8    4        0        1    1    0    1    1    0    0
3                              12   10        0        1    0    1    0    1    0    1
4                               3    2        0        0    0    1    1    0    1    0
E[Y(z=1,d(1)) | Z=1, d(1)=1]    -    -        -        -    9   12    8   10   10   11

\(E[Y_i(z=1, d(1)) | D_i = 1] = \frac{10+8}{2} = 9\). Here \(D_i\) is the observed (random) treatment status: under the realized assignment \(Z\), the treated units are 1 and 2, and a different assignment would change this quantity.

\(E[Y_i(z=1, d(1)) | d_i(1) = 1] = \frac{10+8+12}{3} = 10\). Here \(d_i(1)\) is a fixed attribute of each subject (units 1, 2, and 3), so this quantity does not depend on the realized assignment. In this example, averaging the per-assignment conditional means also gives \(\frac{1}{6}(9 + 12 + 8 + 10 + 10 + 11) = \frac{60}{6} = 10\).
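A short R sketch verifies both calculations on the toy data by enumerating all \(\binom{4}{2} = 6\) possible assignments:

```r
# Toy dataset from the table above
toy <- data.frame(
    Y1 = c(10, 8, 12, 3),  # treated potential outcomes
    d1 = c(1, 1, 1, 0))    # d(z=1): treatment status if assigned to treatment
Z_obs <- c(1, 1, 0, 0)     # the realized assignment vector Z

# Fixed quantity: mean Y(1) among units with d(1) = 1 (units 1, 2, 3)
mean(toy$Y1[toy$d1 == 1])               # 10

# Random quantity: mean Y(1) among units observed treated under Z
mean(toy$Y1[Z_obs == 1 & toy$d1 == 1])  # 9

# Average the conditional means over all 6 assignments of m = 2 units
cond_means <- apply(combn(4, 2), 2, function(idx) {
    z <- as.numeric(1:4 %in% idx)
    mean(toy$Y1[z == 1 & toy$d1 == 1])
})
mean(cond_means)                        # 10
```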

Monotonicity Violations and Sharp Bounds on Types

Consider the simplified version of the Milwaukee study presented in Table 6.7 (reproduced below). When monotonicity does not hold, we cannot point-identify the proportions of each type, but we can place sharp bounds on them:

milwaukee <- data.frame(Arrest = c(798, 4, 802), Warning = c(2, 396, 398))
rownames(milwaukee) <- c("Arrested (D=1)", "Warned (D=0)", "N")
colnames(milwaukee) <- c("Arrest (Z=1)", "Warning (Z=0)")
kable(milwaukee)
                 Arrest (Z=1)   Warning (Z=0)
Arrested (D=1)            798               2
Warned (D=0)                4             396
N                         802             398

Subjects assigned to treatment and treated: \(\pi_{C} + \pi_{AT} = \frac{798}{802} = 0.99501\)

Subjects assigned to treatment but go untreated: \(\pi_{NT} + \pi_{D} = \frac{4}{802} = 0.00499\)

Subjects assigned to control but treated: \(\pi_{AT} + \pi_{D} = \frac{2}{398} = 0.00503\)

Subjects assigned to control and go untreated: \(\pi_{NT} + \pi_{C} = \frac{396}{398} = 0.99497\)

Step 1: Fix the lower bounds for defiers and never takers at 0. Then the highest possible value for \(\pi_D\) and for \(\pi_{NT}\) is \(0.00499\), and the highest possible value for \(\pi_{AT}\) is \(0.00503\). Note that we rule out \(0.00503\) as an upper bound for defiers: if \(\pi_D = 0.00503\), then \(\pi_{D} + \pi_{NT}\) would exceed \(0.00499\). Assuming \(\pi_D\) takes its highest value, the lowest possible value for \(\pi_{AT}\) is \(0.00503 - 0.00499 = 0.00004\) (approximately 0). Finally, assuming \(\pi_{NT} = 0.00499\) gives \(\pi_C = 0.99497 - 0.00499 = 0.98999\), and assuming \(\pi_{NT} = 0\) gives \(\pi_C = 0.99497\).
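The same bounds can be sketched in R, following the steps above:

```r
# Mixing probabilities from the Milwaukee table
p_d1_z1 <- 798 / 802  # pi_C  + pi_AT
p_d0_z1 <- 4 / 802    # pi_NT + pi_D
p_d1_z0 <- 2 / 398    # pi_AT + pi_D
p_d0_z0 <- 396 / 398  # pi_NT + pi_C

# Lower bounds fix pi_D and pi_NT at 0; upper bounds let each proportion
# absorb the full mixing probability it appears in
pi_D_bounds  <- c(0, min(p_d0_z1, p_d1_z0))
pi_NT_bounds <- c(0, p_d0_z1)
pi_AT_bounds <- c(p_d1_z0 - pi_D_bounds[2], p_d1_z0)
pi_C_bounds  <- c(p_d0_z0 - pi_NT_bounds[2], p_d0_z0)

round(rbind(pi_D_bounds, pi_NT_bounds, pi_AT_bounds, pi_C_bounds), 5)
```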

Important Concepts

  1. How to read potential outcomes (Examples: \(E[Y_i(d(0)) | D_i = 1]\), \(E[Y_i(z=1,d=0) | d_i(1) > d_i(0)]\)). In particular, the difference between \(d_i\) and \(D_i\).

  2. Core assumptions in an experiment (in words and notation), and the identifying assumptions for the Complier Average Causal Effect.

  3. Estimating the proportion of always takers, never takers, and compliers using Table 6.2 (reproduced below), then computing the control and treatment group mean for compliers (and the other types).

Quantities                         Treatment Group   Control Group
% Reporting Change (N Treated)          59.5 (185)       50.0 (80)
% Reporting Change (N Untreated)        40.6 (320)      40.2 (415)
% Reporting Change (Total N)            47.5 (505)      41.8 (495)
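As a sketch of how the type proportions and complier means fall out of this table (assuming monotonicity, so the treated in the control group are all always takers and the untreated in the treatment group are all never takers):

```r
# Type proportions from compliance rates in Table 6.2
pi_AT <- 80 / 495           # treated share in the control group
pi_NT <- 320 / 505          # untreated share in the treatment group
pi_C  <- 1 - pi_AT - pi_NT  # remainder, assuming no defiers

# Observed subgroup means (proportions reporting change)
y_AT <- 0.500  # treated units in the control group are always takers
y_NT <- 0.406  # untreated units in the treatment group are never takers

# Back out complier means from each group's mixture
y1_C <- (0.595 * (185 / 505) - y_AT * pi_AT) / pi_C  # treated mix in treatment group
y0_C <- (0.402 * (415 / 495) - y_NT * pi_NT) / pi_C  # untreated mix in control group

round(c(pi_AT = pi_AT, pi_NT = pi_NT, pi_C = pi_C, CACE = y1_C - y0_C), 3)
```

The implied CACE (about 0.28) approximately matches the ratio estimator \(\frac{0.475 - 0.418}{185/505 - 80/495}\); small discrepancies reflect rounding of the percentages in the table.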
  4. Properties of the ratio estimator (\(\frac{ITT}{ITT_D}\)): (a) when \(ITT_D\) changes, so can the \(ITT\); (b) when compliance is low (\(ITT_D \rightarrow 0\)), bias from exclusion restriction violations can be large; (c) when there are defiers, the ratio estimator equals \(\frac{(ATE|Compliers)\pi_C - (ATE|Defiers)\pi_D}{\pi_C - \pi_D}\), and the CACE is unidentified.

  5. Properties of different estimators. Which of these are consistent and/or unbiased: difference-in-means, blocked ATE, difference-in-means with clustered data, difference-in-differences, covariate adjustment in OLS, the ratio estimator? Can you prove that the difference-in-means estimator is unbiased, and that the ratio estimator is biased?

  6. Types of exclusion restriction violations: differential attrition across groups, measurement error, and bundled treatments.

  7. Conditioning on post-treatment variables, and showing that the resulting estimate is biased.

Example: Say there is an outcome \(Y\), an assignment vector \(Z\), and a binary covariate \(X\) that is measured post-treatment. How is the difference-in-means estimate conditional on \(X=1\) potentially biased away from the ATE for the subgroup for whom \(X_i(1) = X_i(0) = 1\)?

Intuition: Let there be four types of respondents:

X(Z=0)   X(Z=1)   Type
     1        1      A
     0        1      B
     1        0      C
     0        0      D

Call the subgroup that satisfies \(X_i(1) = X_i(0) = 1\) Type “A”. Then the ATE for this subgroup, weighted by its share \(\pi_A\), is:

\(E[Y_i | Z=1, Type A]\pi_{A} - E[Y_i | Z=0, Type A]\pi_A\)

However, when we condition on \(X_i=1\), we estimate

\(E[Y_i | Z=1] = \color{blue}{E[Y_i | Z=1, TypeA]\pi_A} + E[Y_i | Z=1, Type B]\pi_B\)

\(E[Y_i | Z=0] = \color{blue}{E[Y_i | Z=0, TypeA]\pi_A} + E[Y_i | Z=0, Type C]\pi_C\)

And for the difference between these quantities (\(E[Y_i | Z=1]-E[Y_i | Z=0]\)) to equal \(E[Y_i | Z=1, Type A]\pi_{A} - E[Y_i | Z=0, Type A]\pi_A\), it must be that:

\(\color{red}{E[Y_i | Z=1, Type B]\pi_B - E[Y_i | Z=0, Type C]\pi_C = 0}\)

And apart from chance, there is no reason to expect this difference to be zero.
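A small simulation illustrates the point; the type shares, baseline outcomes, and effect size below are invented for illustration:

```r
# Simulate bias from conditioning on a post-treatment covariate X
set.seed(42)
n <- 100000

# Types: A (X always 1), B (X = 1 only if treated), C (X = 1 only if control),
# D (X always 0); shares are arbitrary
type <- sample(c("A", "B", "C", "D"), n, replace = TRUE,
               prob = c(0.4, 0.2, 0.2, 0.2))
Z <- rbinom(n, 1, 0.5)
X <- ifelse(Z == 1, type %in% c("A", "B"), type %in% c("A", "C"))

# True treatment effect is 1 for everyone, but Type B has a higher baseline
Y0 <- ifelse(type == "B", 5, 0) + rnorm(n)
Y  <- Y0 + Z

# Difference-in-means conditional on X = 1: mixes Type B into the treatment
# arm and Type C into the control arm
dim_X1 <- mean(Y[Z == 1 & X == 1]) - mean(Y[Z == 0 & X == 1])
dim_X1  # far above the Type A ATE of 1
```

Because Type B's baseline outcome differs from Type C's, the conditional difference-in-means lands near \(8/3\) rather than the Type A ATE of 1.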

  8. How are “blocks” different from “clusters”? Is “covariate adjustment” the same thing as “blocking”?

Announcements

  1. Please submit three questions on Canvas for Monday’s revision session. Post them on the discussion thread “Questions and Clarifications”.

  2. Next section will be held on Thursday, March 29. For that section, please select a research paper that reports an experiment of interest to you and for which the authors have shared replication files. We will go around the room, and each of you will describe the paper you selected. This is meant to help you prepare for the final assignment (i.e., the replication study).