Midterms have been graded, and comments posted on Canvas. I would be happy to discuss the solutions with you in office hours or at a mutually convenient time. Here are two helpful pieces of information: the mean score on the test was \(\mu = 89.73\) (SD \(= 11.07\)), and the median score was \(90.5\).
Replication papers are due at the end of the semester. We will now go around the room; please share the following details with the class: the paper you have shortlisted for replication, the key variables (outcome and treatment), whether replication data and code are available, and any extensions or limitations of the analysis you might have come across.
Missing outcome data (or attrition) is a threat to unbiased inference. This is because attrition jeopardizes the core assumptions of an experiment: the treatment group mean among responders is no longer an unbiased estimator of the population mean of \(Y_i(1)\), and the control group mean among responders is not an unbiased estimator of the population mean of \(Y_i(0)\).
Let there be four types of subjects in an experiment:
Type | R(Z=1) | R(Z=0) |
---|---|---|
Always Responder | 1 | 1 |
If-Treated Responder | 1 | 0 |
If-Control Responder | 0 | 1 |
Never Responder | 0 | 0 |
Then the Average Treatment Effect can be written as a weighted average of types:
\(ATE = E[Y_i(1)] - E[Y_i(0)] = E[Y_i | Z_i =1] - E[Y_i | Z_i = 0] = E[Y_i | D_i =1] - E[Y_i | D_i = 0]\)
where the second equality follows from the random assignment assumption (\((Y_i(1), Y_i(0)) \perp Z_i\)), and the third from full compliance (\(d_i(1) = 1\) and \(d_i(0) = 0\) \(\forall i\), so that \(D_i = Z_i\)).
Focus on the treatment group:
\(E[Y_i | Z_i =1] = E[Y_i | Z_i = 1, \color{red}{R_i = 1}]\Pr[R_i = 1| Z_i = 1] + E[Y_i | Z_i = 1, \color{red}{R_i = 0}] \{1 - \Pr[R_i = 1| Z_i = 1]\}\)
Now say we condition on response \(R_i = 1\):
\(E[Y_i | Z_i = 1, R_i = 1] = E[Y_i | AR]\frac{\pi_{AR}}{\pi_{AR} + \pi_{ITR}} + E[Y_i | ITR]\frac{\pi_{ITR}}{\pi_{AR} + \pi_{ITR}}\)
Now let’s look at the control group:
\(E[Y_i | Z_i =0] = E[Y_i | Z_i = 0, \color{red}{R_i = 1}]\Pr[R_i = 1| Z_i = 0] + E[Y_i | Z_i = 0, \color{red}{R_i = 0}] \{1 - \Pr[R_i = 1| Z_i = 0]\}\)
And if we condition on response \(R_i = 1\):
\(E[Y_i | Z_i = 0, R_i = 1] = E[Y_i | AR]\frac{\pi_{AR}}{\pi_{AR} + \pi_{ICR}} + E[Y_i | ICR]\frac{\pi_{ICR}}{\pi_{AR} + \pi_{ICR}}\)
In other words, when we condition on response (i.e. estimate \(E[Y_i | Z_i = 1, R_i = 1] - E[Y_i | Z_i = 0, R_i = 1]\)) we are effectively comparing “Always Responders” and “If-Treated Responders” in the treatment group, to “Always Responders” and “If-Control Responders” in the control group.
If we assume \(Y_i(z) \perp R_i(z)\) (i.e. missingness is independent of potential outcomes), then:
\(E[Y_i | Z_i = 1 , R_i = 1] = E[Y_i | Z_i = 1 , R_i = 0] = E[Y_i | Z_i = 1]\)
(observed treated units are a random subset of the treatment group)
and
\(E[Y_i | Z_i = 0 , R_i = 1] = E[Y_i | Z_i = 0 , R_i = 0] = E[Y_i | Z_i = 0]\)
(observed untreated units are a random subset of the control group)
With this assumption, conditioning on response still gives the ATE:
\(E[Y_i | Z_i = 1, R_i = 1] - E[Y_i | Z_i = 0, R_i = 1] = E[Y_i | Z_i = 1] - E[Y_i | Z_i = 0] = ATE\)
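A quick simulation illustrates this claim (a sketch with an invented data-generating process, not part of the textbook example): when response \(R_i\) is drawn independently of the potential outcomes, the complete-case difference-in-means stays centered on the true ATE.

```r
# Sketch: complete-case estimation under MIPO (hypothetical DGP)
set.seed(123)
sims <- 2000
ate_hat <- rep(NA, sims)
for (i in 1:sims) {
    n <- 1000
    Y0 <- rnorm(n, mean = 5)   # untreated potential outcomes
    Y1 <- Y0 + 2               # true ATE = 2
    Z <- rbinom(n, 1, 0.5)     # random assignment
    R <- rbinom(n, 1, 0.7)     # response independent of Y0 and Y1 (MIPO)
    Y_obs <- ifelse(Z == 1, Y1, Y0)
    Y_obs[R == 0] <- NA        # attrition
    # Difference-in-means among responders only
    ate_hat[i] <- mean(Y_obs[Z == 1], na.rm = TRUE) - mean(Y_obs[Z == 0], na.rm = TRUE)
}
mean(ate_hat)  # close to the true ATE of 2
```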
“MIPO” (Missing Independent of Potential Outcomes) is a strong assumption that implies outcome data are missing at random. A weaker assumption, “MIPO|X”, states that missingness is random within (or conditional on) a certain covariate profile. Formally:
\({Y_i(z) \perp R_i(z)} | \textbf{X}_i = x\), \(\forall x\) (where \(\textbf{X}_i\) is a vector of covariates)
Intuition: Say for some outcome \(Y\), women generally have higher values than men, or \(\mu_{Female} > \mu_{Male}\). And outcome data are missing for both men and women, but more so for men (\(\Pr[R_i=1 | Male] = 0.4\) and \(\Pr[R_i =1 |Female] = 0.8\)). Then, by construction, missingness is correlated with potential outcomes: attrition decreases as \(Y\) increases. So “MIPO” is an untenable assumption. But “MIPO|X” may be tenable if, conditional on being female (or male), outcome data are missing at random.
Proof: I will demonstrate that in the treatment group, within any covariate stratum, \(E[\frac{Y_i(1) \cdot R_i}{p_R(X_i)}|X_i] = E[Y_i(1) | X_i]\). Put another way, the inverse probability weighting (IPW) estimator equals the conditional mean of \(Y_i(1)\). And if we average over all values of the covariate, the law of iterated expectations tells us: \(E_X[E[Y_i(1)|X_i]] = E[Y_i(1)]\).
To see this, start with the IPW estimator:
\(E[\frac{Y_i(1) \cdot R_i}{p_R(X_i)}|X_i] = E[Y_i(1) \cdot \frac{R_i}{Pr[R_i=1 | X_i]} |X_i]\)
MIPO|X implies that \(Y_i(1) \perp R_i |X_i\), so we can re-write the above as:
\(E[Y_i(1) \cdot \frac{R_i}{Pr[R_i=1 | X_i]} |X_i] = E[Y_i(1) | X_i] \cdot E[\frac{R_i}{Pr[R_i=1 | X_i]} |X_i]\)
\(= E[Y_i(1) | X_i] \cdot \frac{E[R_i | X_i]}{Pr[R_i=1 | X_i]} = E[Y_i(1) | X_i] \cdot \frac{Pr[R_i=1 | X_i]}{Pr[R_i=1 | X_i]}\)
Which implies: \(E[\frac{Y_i(1) \cdot R_i}{p_R(X_i)}|X_i] = E[Y_i(1)|X_i]\)
And averaging over all possible values of \(X\) gives: \(E_X[E[\frac{Y_i(1) \cdot R_i}{p_R(X_i)}|X_i]] = E_X[E[Y_i(1)|X_i]] = E[Y_i(1)]\)
Similarly, in the control group: \(E_X[E[\frac{Y_i(0) \cdot R_i}{p_R(X_i)}|X_i]] = E_X[E[Y_i(0)|X_i]] = E[Y_i(0)]\)
This demonstrates that:
\(E_X[E[\frac{Y_i(1) \cdot R_i}{p_R(X_i)}|X_i]] - E_X[E[\frac{Y_i(0) \cdot R_i}{p_R(X_i)}|X_i]] = E[Y_i(1)] - E[Y_i(0)] = ATE\)
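To make the proof concrete, here is a small simulation of the gender example from above (a sketch; the DGP and response probabilities are invented for illustration, and in practice \(p_R(X_i)\) would itself be estimated, e.g. with a logistic regression of \(R_i\) on \(\textbf{X}_i\)):

```r
# Sketch: IPW under MIPO|X (hypothetical DGP echoing the gender example),
# looking at the treatment group only
set.seed(456)
n <- 100000
female <- rbinom(n, 1, 0.5)
Y1 <- rnorm(n, mean = 12 + 5 * female)  # women have higher outcomes; E[Y(1)] = 14.5
# Response depends on gender but, within gender, not on Y: MIPO|X
p_R <- ifelse(female == 1, 0.8, 0.4)
R <- rbinom(n, 1, p_R)
# Naive complete-case mean over-represents women, so it is biased upward for E[Y(1)]
mean(Y1[R == 1])
# IPW estimator: weight each observed outcome by 1 / p_R(X_i)
mean(Y1 * R / p_R)
```

The naive mean lands near \(15.3\) (women make up two-thirds of responders), while the IPW estimate recovers \(E[Y_i(1)] = 14.5\); the same weighting applied in the control group recovers \(E[Y_i(0)]\), and the difference gives the ATE.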
In this section, we will work with the data in Table 7.1 (Gerber and Green 2012:217). I will first estimate extreme value bounds that capture the fundamental uncertainty in the ATE estimate. However, the upper (or lower) bound is itself a random variable with a sampling distribution. In what follows, I will use a bootstrapping procedure and randomization inference to recover these sampling distributions.
Subject | Y(0) | Y(1) | r(0) | r(1) | Y(0) if r(0)=1 | Y(1) if r(1)=1 | Z | Y_obs |
---|---|---|---|---|---|---|---|---|
1 | 3 | 8 | 0 | 1 | NA | 8 | 0 | NA |
2 | 3 | 7 | 0 | 1 | NA | 7 | 1 | 7 |
3 | 8 | 10 | 1 | 1 | 8 | 10 | 1 | 10 |
4 | 7 | 8 | 1 | 1 | 7 | 8 | 0 | 7 |
5 | 6 | 6 | 0 | 0 | NA | NA | 1 | NA |
6 | 5 | 8 | 1 | 1 | 5 | 8 | 0 | 5 |
7 | 6 | 6 | 1 | 0 | 6 | NA | 1 | NA |
8 | 6 | 7 | 1 | 1 | 6 | 7 | 0 | 6 |
model.matrix(Y_obs ~ Z, table7.1)
## (Intercept) Z
## 2 1 1
## 3 1 1
## 4 1 0
## 6 1 0
## 8 1 0
## attr(,"assign")
## [1] 0 1
# Note: even though Z is never missing, this matrix has only 5 rows --
# model.matrix() drops the rows where Y_obs is NA (listwise deletion)
library(estimatr)
library(dplyr)
library(knitr)
# Select only the relevant columns from Table 7.1 (Y_obs and Z), and code
# missingness as 1 if Y_obs is NA, else 0
data <- table7.1 %>% select(Y_obs, Z) %>% mutate(missing = as.numeric(is.na(Y_obs)))
# Impute values and create two new columns Y_upper and Y_lower (Alex's code)
data <- within(data, {
Y_upper <- Y_lower <- Y_obs
Y_upper[Z == 1 & missing == 1] <- 20
Y_upper[Z == 0 & missing == 1] <- 0
Y_lower[Z == 1 & missing == 1] <- 0
Y_lower[Z == 0 & missing == 1] <- 20
})
kable(data, caption = "Extreme Value Bounds Imputed Data")
Y_obs | Z | missing | Y_upper | Y_lower |
---|---|---|---|---|
NA | 0 | 1 | 0 | 20 |
7 | 1 | 0 | 7 | 7 |
10 | 1 | 0 | 10 | 10 |
7 | 0 | 0 | 7 | 7 |
NA | 1 | 1 | 20 | 0 |
5 | 0 | 0 | 5 | 5 |
NA | 1 | 1 | 20 | 0 |
6 | 0 | 0 | 6 | 6 |
# Lower bound
difference_in_means(Y_lower ~ Z, data)
## Design: Standard
## Estimate Std. Error Pr(>|t|) CI Lower CI Upper DF
## Z -5.25 4.337338 0.2760474 -16.13225 5.632249 5.442531
# Upper bound
difference_in_means(Y_upper ~ Z, data)
## Design: Standard
## Estimate Std. Error Pr(>|t|) CI Lower CI Upper DF
## Z 9.75 3.716517 0.05553307 -0.3620691 19.86207 4.217635
Note: From Table 7.1 we know that the true ATE, or \(mean(Y_1 - Y_0)\), is 2. In this step we estimated the fundamental uncertainty around that average treatment effect: \([-5.25, 9.75]\). But observe that the lower (or upper) bound is sensitive to several things: the values of \(Y_{min}\) and \(Y_{max}\), the number of missing observations, and sampling variability. For instance, in the textbook \(Y_{min} = 0\) and \(Y_{max} = 10\), and the corresponding Manski bounds are \([-2.75, 4.75]\). These are different from the bounds reported here, primarily because we impute with \(Y_{max} = 20\). However, both sets of bounds are also affected by sampling variability. That is to say, they would be different had we observed a different draw from the population, or a different assignment vector.
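To see the sensitivity to \(Y_{min}\) and \(Y_{max}\) directly, we can re-impute the missing values with the textbook's range of \([0, 10]\) (a sketch that reuses the data frame built above; the column names Y_upper10 and Y_lower10 are my own):

```r
# Re-impute assuming Y_min = 0 and Y_max = 10 (the textbook's range)
data <- within(data, {
    Y_upper10 <- Y_lower10 <- Y_obs
    Y_upper10[Z == 1 & missing == 1] <- 10
    Y_upper10[Z == 0 & missing == 1] <- 0
    Y_lower10[Z == 1 & missing == 1] <- 0
    Y_lower10[Z == 0 & missing == 1] <- 10
})
# The bounds shrink to the textbook's [-2.75, 4.75]
with(data, mean(Y_lower10[Z == 1]) - mean(Y_lower10[Z == 0]))  # -2.75
with(data, mean(Y_upper10[Z == 1]) - mean(Y_upper10[Z == 0]))  # 4.75
```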
1. Draw a sample of \(n=8\) with replacement from data.
2. For each bootstrap sample, estimate \(ATE_{lower}\) (lower bound) and \(ATE_{upper}\) (upper bound).
3. Repeat the process \(k = 10000\) times (essentially a large number of times), storing \(ATE_{lower}\) and \(ATE_{upper}\) in each iteration.
4. The standard deviation of the two resulting vectors is the uncertainty around each bound.
# Define output vectors
sims <- 10000
ate_lower <- rep(NA, sims)
ate_upper <- rep(NA, sims)
# Bootstrapping procedure
for (i in 1:sims) {
boot.sample = sample_n(data, size = 8, replace = T) # draws 8 rows with replacement from data
ate_lower[i] <- lm(Y_lower ~ Z, boot.sample)$coefficients[2]
ate_upper[i] <- lm(Y_upper ~ Z, boot.sample)$coefficients[2]
}
fig1.data <- data.frame(ATE = c(ate_lower, ate_upper), Type = c(rep("Lower", sims), rep("Upper", sims)))
Type | Mean | SE |
---|---|---|
Lower | -5.171885 | 4.068366 |
Upper | 9.668185 | 3.491560 |
Note: When we sample with replacement from small datasets, some bootstrap samples draw all of their rows from the treatment group or all from the control group. In those cases, the lower and upper bounds are undefined. For this reason, bootstrapping procedures are better suited to larger datasets.
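One way to guard against this in the loop above (a sketch, and only one of several possible fixes) is to redraw any bootstrap sample that does not contain both treated and control units:

```r
library(dplyr)
# Redraw until the bootstrap sample contains both treated and control units.
# Note: this conditions the bootstrap distribution on both groups appearing,
# which is a mild approximation in very small samples.
draw_valid_sample <- function(data, size = nrow(data)) {
    repeat {
        boot.sample <- sample_n(data, size = size, replace = TRUE)
        if (length(unique(boot.sample$Z)) == 2) {
            return(boot.sample)
        }
    }
}
```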
Finally, we expect that the lower and upper bounds will change with the assignment vector \(Z\). In Table 7.1 we know every unit’s treated and untreated potential outcome, and their type. We can use this information to estimate the lower and upper bounds under different assignments:
Caution: We can measure the uncertainty due to random assignment only because Table 7.1 gives \(Y_i(1)\), \(Y_i(0)\), and each respondent’s type. When the full schedule of potential outcomes and respondent types are unknown, we do not know whether a respondent would reveal their outcome under some hypothetical assignment, or what the value of that outcome would be. Therefore, this simulation is only for illustrative purposes.
Estimand: ATE | AR
Assumption: Monotonicity, either \(r_i(1) \geq r_i(0)\) (no ICRs) or \(r_i(0) \geq r_i(1)\) (no ITRs)
Procedure: Estimate \(Q\), the share of “extra” observed subjects in the treatment group (when we assume there are no If-Control Responders):
\(Q = \frac{\Pr[R_i = 1 | Z_i = 1] - \Pr[R_i = 1 | Z_i = 0]}{\Pr[R_i = 1 | Z_i = 1]} = \frac{(\pi_{AR} + \pi_{ITR}) - \pi_{AR}}{\pi_{AR} + \pi_{ITR}} = \frac{\pi_{ITR}}{\pi_{AR} + \pi_{ITR}}\)
Then sort the observed \(Y_i(1)\) values in descending order (highest to lowest). To get the “upper bound”, trim off the lowest \(100Q\%\) of the values of \(Y_i(1)\) and average the remaining values to get an upper estimate of \(E[Y_i(1) | AR]\). This process attributes the lowest \(Y_i(1)\) values to ITRs. Conversely, to get the “lower bound”, trim off the highest \(100Q\%\) of \(Y_i(1)\) values, and average the remaining values to get a lower estimate of \(E[Y_i(1) | AR]\). This attributes the highest \(Y_i(1)\) values to ITRs.
Let’s work through the intuition of trimming bounds by looking at a small dataset (\(n = 10\), with \(m = 6\) treated units):
Subject | Y | Z |
---|---|---|
1 | 10 | 1 |
2 | 8 | 1 |
3 | 5 | 1 |
4 | 2 | 1 |
5 | NA | 1 |
6 | NA | 1 |
7 | 5 | 0 |
8 | 5 | 0 |
9 | NA | 0 |
10 | NA | 0 |
Step 1: Estimate \(Q\)
# Create two variables 'observed' and 'missing'
trimming.data <- trimming.data %>% mutate(missing = as.numeric(is.na(Y)), observed = 1 - missing)
# Estimating Q
Q <- with(trimming.data, (mean(observed[Z == 1]) - mean(observed[Z == 0])) / mean(observed[Z == 1]))
Q # should be 0.25
## [1] 0.25
Step 2: Trimming
# Create two vectors: observed values in treatment and control group (i.e.
# drop missing data)
observed_treatment <- trimming.data %>% filter(Z == 1 & observed == 1)
observed_treatment
## Y Z missing observed
## 1 10 1 0 1
## 2 8 1 0 1
## 3 5 1 0 1
## 4 2 1 0 1
observed_control <- trimming.data %>% filter(Z == 0 & observed == 1)
observed_control
## Y Z missing observed
## 1 5 0 0 1
## 2 5 0 0 1
mean(observed_control$Y) # E[Y0 | AR]
## [1] 5
# Arrange the treated observations in descending order
observed_treatment <- observed_treatment %>% arrange(desc(Y))
# Identify the cutoffs for the lower and upper bounds
quantile(observed_treatment$Y, Q)
## 25%
## 4.25
quantile(observed_treatment$Y, (1 - Q))
## 75%
## 8.5
# Trimming-off the lowest and highest values
observed_treatment_high <- filter(observed_treatment, Y > quantile(Y, probs = Q))
observed_treatment_high
## Y Z missing observed
## 1 10 1 0 1
## 2 8 1 0 1
## 3 5 1 0 1
observed_treatment_low <- filter(observed_treatment, Y < quantile(Y, probs = (1 -
Q)))
observed_treatment_low
## Y Z missing observed
## 1 8 1 0 1
## 2 5 1 0 1
## 3 2 1 0 1
Step 3: Estimate lower and upper bound for \(E[Y_i(1) | AR]\)
# Upper bound
mean(observed_treatment_high$Y) - mean(observed_control$Y)
## [1] 2.666667
# Lower bound
mean(observed_treatment_low$Y) - mean(observed_control$Y)
## [1] 0
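The three steps above can be collected into one helper function (a sketch; trimming_bounds is my own name, and it assumes the monotonicity condition \(r_i(1) \geq r_i(0)\) and a data frame with columns Y and Z):

```r
trimming_bounds <- function(df) {
    df$observed <- as.numeric(!is.na(df$Y))
    # Step 1: Q, the share of 'extra' responders in the treatment group
    Q <- with(df, (mean(observed[Z == 1]) - mean(observed[Z == 0])) /
                  mean(observed[Z == 1]))
    treated <- df$Y[df$Z == 1 & df$observed == 1]
    control_mean <- mean(df$Y[df$Z == 0 & df$observed == 1])
    # Steps 2 and 3: trim the lowest / highest 100Q% of treated outcomes,
    # then subtract the control-group mean
    upper <- mean(treated[treated > quantile(treated, probs = Q)]) - control_mean
    lower <- mean(treated[treated < quantile(treated, probs = 1 - Q)]) - control_mean
    c(lower = lower, upper = upper)
}
trimming_bounds(trimming.data)  # lower = 0, upper = 2.67 on the toy data
```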
Download the Angrist, Bettinger, and Kremer (2006) dataset posted on Canvas or Github. The treatment in their study is \(vouch0\), and the outcome variable is \(read\). Since outcome data are missing for many units, there are two additional variables: \(observed\) (which equals 1 if the reading score is reported), and \(missing\) (which equals 1 if the reading score is not reported or “missing”). Calculate the trimming bounds for the \(ATE | AR\) under the assumption \(r_i(1) \geq r_i(0)\) (i.e. there are no “if-control responders”).
Solution
(Reproduced from Alex Coppock’s class slides)
# Load the dataset
load("/Users/shikharsingh/Dropbox/Yale/Teaching/PLSC512_FieldExperiments/W7_Trimming.Rda")
# Confirm that missingness is higher in the control group (it includes NRs +
# ITRs)
tab <- data %>% group_by(vouch0) %>% summarise(Missingness = mean(missing))
kable(tab, digits = 3)
vouch0 | Missingness |
---|---|
0 | 0.698 |
1 | 0.624 |
# Calculate Q
Q <- with(data, (mean(observed[vouch0 == 1]) - mean(observed[vouch0 == 0])) / mean(observed[vouch0 == 1]))
Q
## [1] 0.1956908
# Subsetting to only observed data
observed_treatment <- data %>% filter(vouch0 == 1 & observed == 1)
observed_control <- data %>% filter(vouch0 == 0 & observed == 1)
mean(observed_control$read) # E[Y0 | AR]
## [1] 46.92081
# Arrange the treated observations in descending order
observed_treatment <- observed_treatment %>% arrange(desc(read))
# Identify the cutoffs for the lower and upper bounds
quantile(observed_treatment$read, Q)
## 19.56908%
## 43
quantile(observed_treatment$read, (1 - Q))
## 80.43092%
## 52
# Trimming-off the lowest and highest values (from Alex's code)
observed_treatment_high <- filter(observed_treatment, read > quantile(read, probs = Q))
observed_treatment_low <- filter(observed_treatment, read < quantile(read, probs = (1 - Q)))
# Upper bound
mean(observed_treatment_high$read) - mean(observed_control$read)
## [1] 2.812247
# Lower bound
mean(observed_treatment_low$read) - mean(observed_control$read)
## [1] -1.463869