The key intuition in Question 4 (Chapter III) is that when \(Y_i(0)\) and \(Y_i(1)\) positively covary, some units have both high treated and untreated potential outcomes, while others have low values of both. As a result, when a unit \(i\) with, say, high \(Y_i(1)\) and \(Y_i(0)\) moves between treatment conditions under different hypothetical assignments, it shifts the group means considerably. When \(i\) is assigned to treatment, it pushes up the treatment group mean and pushes down the control group mean, producing a large ATE estimate. When \(i\) is assigned to control, it pushes up the control group mean and pushes down the treatment group mean, producing a small ATE estimate. This flip-flopping between extremes increases sampling variability. Here is an illustration:
Step 1: Recreate the data from Question 4.
Y0 | YA | YB |
---|---|---|
1 | 2 | 3 |
2 | 3 | 3 |
3 | 4 | 3 |
4 | 5 | 3 |
5 | 6 | 3 |
Step 2: Obtain all possible ways in which \(m=2\) of \(N=5\) subjects are assigned to treatment.
d1 | d2 | d3 | d4 | d5 | d6 | d7 | d8 | d9 | d10 |
---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
Step 3: Estimate the difference in means for each assignment vector.
Treatment Effect of A | Treatment Effect of B |
---|---|
3.5000000 | 1.0000000 |
2.6666667 | 0.6666667 |
1.8333333 | 0.3333333 |
1.8333333 | 0.3333333 |
1.0000000 | 0.0000000 |
0.1666667 | -0.3333333 |
1.0000000 | 0.0000000 |
0.1666667 | -0.3333333 |
-0.6666667 | -0.6666667 |
-1.5000000 | -1.0000000 |
Step 4: Calculate the variance of the sampling distribution for each treatment. (The standard error is the square root of each value: \(\sqrt{2.0833} \approx 1.4434\) for A and \(\sqrt{0.3333} \approx 0.5774\) for B.)
Treatment A | Treatment B |
---|---|
2.083333 | 0.3333333 |
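Steps 2–4 can be reproduced in a few lines of base R (a sketch; `combn` enumerates the \(\binom{5}{2} = 10\) assignments, though not necessarily in the column order shown above):

```r
# Step 1: potential outcomes from Question 4
Y0 <- 1:5
YA <- 2:6        # treatment A: constant effect of +1
YB <- rep(3, 5)  # treatment B: everyone scores 3 if treated

# Step 2: all ways to assign m = 2 of N = 5 subjects to treatment
assignments <- combn(5, 2)  # one column per assignment vector

# Step 3: difference in means for each assignment vector
dim_est <- function(treated, y1) mean(y1[treated]) - mean(Y0[-treated])
ests_A <- apply(assignments, 2, dim_est, y1 = YA)
ests_B <- apply(assignments, 2, dim_est, y1 = YB)

# Step 4: variance of each sampling distribution (denominator 10)
mean((ests_A - mean(ests_A))^2)  # 2.083333
mean((ests_B - mean(ests_B))^2)  # 0.3333333
```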
Step 5: Use equation 3.4 to obtain the same result analytically.
The true standard error of the average treatment effect is \(\sqrt{\frac{1}{N-1}\left\{\frac{m}{N-m}Var[Y_i(0)] + \frac{N-m}{m}Var[Y_i(1)] + 2Cov[Y_i(0),Y_i(1)]\right\}}\). Squaring it returns the values computed in Step 4: \(25/12 \approx 2.0833\) under treatment A and \(1/3 \approx 0.3333\) under treatment B.
Treatment B produces a smaller standard error because:
Treatment B minimizes \(Var[Y_i(1)]\) (variance in the treated potential outcomes) by reducing the achievement gap or heterogeneity in treated potential outcomes. In fact, \(Var[Y_i(B)] = 0\) while \(Var[Y_i(A)] = 2\).
Treatment B minimizes the covariance between treated and untreated potential outcomes. \(Cov(Y_0, Y_A) = 2\) and \(Cov(Y_0, Y_B) = 0\).
For this reason, a treatment that “bridges the gap” (i.e., one with a positive effect on subjects with low untreated potential outcomes and a negative effect on subjects with high untreated potential outcomes) will tend to have lower sampling variability than a treatment that preserves or amplifies existing differences.
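These quantities can be checked directly in base R. The sketch below computes the variances and covariance with denominator \(N\) (matching the numbers in the text) and plugs them into equation 3.4:

```r
# Potential outcomes from Question 4
Y0 <- 1:5
YA <- 2:6
YB <- rep(3, 5)
N <- 5; m <- 2

# Variance and covariance with denominator N, as in the text
pop_var <- function(x) mean((x - mean(x))^2)
pop_cov <- function(x, y) mean((x - mean(x)) * (y - mean(y)))

# Equation 3.4: true standard error of the estimated ATE
se_ate <- function(y0, y1) {
  sqrt((1 / (N - 1)) * ((m / (N - m)) * pop_var(y0) +
                        ((N - m) / m) * pop_var(y1) +
                        2 * pop_cov(y0, y1)))
}

se_ate(Y0, YA)^2  # 2.083333: sampling variance under treatment A
se_ate(Y0, YB)^2  # 0.3333333: sampling variance under treatment B
```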
Last week, I introduced a data set that contains potential outcomes (\(Y_0, Y_1\)) and two covariates (\(X_1, X_2\)). \(X_1\) is highly correlated with the potential outcomes (\(\rho \geq 0.7\)), while \(X_2\) is essentially uncorrelated with them (\(\rho \approx 0\)). In this section, I show that blocking on a predictive covariate reduces sampling variability.
# Requires the tidyverse and randomizr packages
# (set a seed beforehand to reproduce the numbers below)
dat2 <- tibble(
  Y0 = rnorm(100, mean = 0, sd = 8),             # untreated potential outcome
  Y1 = Y0 + rnorm(100, mean = 5, sd = 4),        # treated potential outcome (true ATE near 5)
  X1 = as.numeric(Y0 >= 0),                      # binary covariate, highly correlated with Y0
  X2 = randomizr::complete_ra(N = 100, m = 50),  # binary covariate, unrelated to potential outcomes
  C1 = case_when(Y0 <= -1 ~ 1,                   # four-category covariate built from Y0
                 Y0 <= 0  ~ 2,
                 Y0 <= 1  ~ 3,
                 TRUE     ~ 4),
  C2 = sample(x = 1:4, size = 100, replace = TRUE,  # four-category covariate, unrelated to Y0
              prob = c(0.25, 0.25, 0.25, 0.25))
)
kable(head(dat2), caption = "Dataset from Last Week", row.names = T)
Y0 | Y1 | X1 | X2 | C1 | C2 | |
---|---|---|---|---|---|---|
1 | 5.842992 | 12.416623 | 1 | 1 | 4 | 1 |
2 | -6.271863 | -3.103697 | 0 | 0 | 1 | 2 |
3 | -4.143368 | 3.190110 | 0 | 0 | 1 | 2 |
4 | -25.630603 | -18.537212 | 0 | 1 | 1 | 3 |
5 | -10.073401 | -7.654605 | 0 | 0 | 1 | 1 |
6 | -5.134506 | -5.695581 | 0 | 1 | 1 | 2 |
dat2 %>% summarise(`True ATE` = mean(Y1 - Y0)) %>% kable()
True ATE |
---|
5.332846 |
Intuition: If subjects within a block have similar potential outcomes, randomizing within each block eliminates randomizations in which, by chance, all the subjects with high (or low) potential outcomes are assigned to the same treatment condition. This rules out extreme estimates that increase the spread of the sampling distribution.
Step 1: Let's specify two types of random assignment: complete random assignment (exactly \(m=50\) of \(N=100\) subjects are assigned to treatment) and blocked random assignment (exactly half of the subjects in each block, \(X_1=1\) and \(X_1=0\), are assigned to treatment).
Block Random Assignment: `randomizr` allows you to specify block random assignment designs in three ways. As an illustrative case, imagine there are 2 blocks: A has 40 subjects and B has 60 subjects. If you specify `m = 20`, exactly 20 subjects in each block will be assigned to the treatment condition. Note that this yields a treatment probability of \(\frac{1}{2}\) in block A and \(\frac{1}{3}\) in block B. A second option is to use `block_m`. This option requires a numeric vector of length \(J\), where \(J\) is the number of blocks (2 in this case). So you can specify `block_m = c(20, 30)`. This will assign exactly 20 of 40 subjects in block A, and 30 of 60 subjects in block B, to the treatment condition; note that in this specification the probability of being assigned to treatment is equal in both blocks. Finally, you can use `block_prob`, which requires a numeric vector of length \(J\) specifying the probability of being assigned to the treatment condition in each block. Note that when you specify a probability, `randomizr` is not conducting simple random assignment (independent trials with some probability \(p\)): it takes the probability in block \(j\), \(p_j\), multiplies it by the size of that block to get \(m_j\), and then assigns exactly \(m_j\) of the units in block \(j\) to the treatment condition. So `block_prob = c(.5, .5)` is the same design as `block_m = c(20, 30)`.
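For instance, the three specifications can be written out as follows (a sketch using the hypothetical block sizes from the example above):

```r
library(randomizr)

# Hypothetical blocks: A has 40 subjects, B has 60
blocks <- rep(c("A", "B"), times = c(40, 60))

Z1 <- block_ra(blocks = blocks, m = 20)                  # 20 treated per block (p = 1/2 in A, 1/3 in B)
Z2 <- block_ra(blocks = blocks, block_m = c(20, 30))     # half of each block treated
Z3 <- block_ra(blocks = blocks, block_prob = c(.5, .5))  # same design as Z2

table(blocks, Z2)  # 20 of 40 in A and 30 of 60 in B are treated
```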
# Declare both designs with randomizr
complete_ra_declaration <- declare_ra(N = nrow(dat2), m = 50)
block_ra_declaration    <- declare_ra(N = nrow(dat2), blocks = dat2$X1, block_prob = c(.5, .5))
Step 2: Using simulations, I recover the sampling distribution of the average treatment effect under complete and block random assignment.
library(estimatr)
# Obtain sampling distribution under complete random assignment
D_complete_ra <- obtain_permutation_matrix(complete_ra_declaration)
ests_complete_ra <- c()
for(i in 1:ncol(D_complete_ra)){
dat_temp <- dat2 %>%
mutate(
d_temp = D_complete_ra[,i],
y_temp = d_temp*Y1 + (1-d_temp)*Y0
)
estimate = tidy(difference_in_means(y_temp ~ d_temp, data = dat_temp))
ests_complete_ra = bind_rows(ests_complete_ra, estimate)
}
# Obtain sampling distribution under blocked random assignment
D_block_ra <- obtain_permutation_matrix(block_ra_declaration)
ests_block_ra <- c()
for(i in 1:ncol(D_block_ra)){
dat_temp <- dat2 %>%
mutate(
d_temp = D_block_ra[,i],
y_temp = d_temp*Y1 + (1-d_temp)*Y0
)
estimate = tidy(difference_in_means(y_temp ~ d_temp, blocks = X1, data = dat_temp))
ests_block_ra = bind_rows(ests_block_ra, estimate)
}
# Make a figure to compare the two sampling distributions
library(patchwork)
fig1 <- ggplot(data = ests_complete_ra, aes(x = estimate)) +
geom_histogram(bins = 50) +
geom_vline(xintercept = mean(ests_complete_ra$estimate), linetype ="dashed", color ="red")+
ylab("Frequency") +
xlim(c(-1,12)) +
ggtitle("Complete Random Assignment") +
theme_bw()
fig2 <- ggplot(data = ests_block_ra, aes(x = estimate)) +
geom_histogram(bins = 50) +
geom_vline(xintercept = mean(ests_block_ra$estimate), linetype ="dashed", color ="red") +
ylab("Frequency") +
xlim(c(-1,12)) +
ggtitle("Block Random Assignment") +
theme_bw()
fig1 / fig2
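The same comparison can also be run as a quick self-contained check without `randomizr` or `estimatr` (a sketch; the seed and number of replications are arbitrary):

```r
set.seed(42)
N <- 100
Y0 <- rnorm(N, mean = 0, sd = 8)
Y1 <- Y0 + rnorm(N, mean = 5, sd = 4)
X1 <- as.numeric(Y0 >= 0)  # predictive blocking covariate

# Difference-in-means estimate for an assignment vector d
est <- function(d) mean(Y1[d == 1]) - mean(Y0[d == 0])

# Complete random assignment: permute 50 ones and 50 zeros
complete_d <- function() sample(rep(0:1, each = N / 2))

# Blocked random assignment: permute within each block of X1
block_d <- function() {
  d <- integer(N)
  for (b in 0:1) {
    idx <- which(X1 == b)
    d[idx] <- sample(rep(0:1, length.out = length(idx)))
  }
  d
}

sd_complete <- sd(replicate(2000, est(complete_d())))
sd_blocked  <- sd(replicate(2000, est(block_d())))
c(complete = sd_complete, blocked = sd_blocked)  # blocking yields the smaller SE
```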
A confidence interval for \(\theta\), loosely speaking, is an estimated interval that covers the true value of \(\theta\) with at least a given probability. (Aronow and Miller 2019:124)
In other words, a 95% confidence interval is constructed by some procedure (say, one based on the Normal approximation) that contains the true value of the parameter at least 95% of the time across repeated samples. Coverage is a property of the procedure, not of any single realized interval: a particular interval estimate computed from the data either contains the true value of \(\theta\) or it does not.
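As a generic illustration of this point (hypothetical data, unrelated to our experiment), the sketch below builds a Normal-approximation 95% interval for a mean many times and checks how often the procedure covers the truth:

```r
set.seed(1)
theta <- 5  # true mean (illustrative)

# One draw: estimate a mean and build a Normal-approximation 95% CI
one_interval <- function() {
  x <- rnorm(30, mean = theta, sd = 2)
  half_width <- qnorm(.975) * sd(x) / sqrt(30)
  c(mean(x) - half_width, mean(x) + half_width)
}

# Fraction of 5000 intervals that contain the true value
covered <- replicate(5000, {
  ci <- one_interval()
  ci[1] <= theta && theta <= ci[2]
})
mean(covered)  # close to 0.95
```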
To clarify this point, I plot the confidence intervals from the above simulation. In each iteration of that simulation, we conducted complete random assignment (\(m=50\)), applied the switching equation to the potential outcomes to obtain observed outcomes, and estimated the difference in means (\(\widehat{\theta}\)) along with a 95% confidence interval. Our expectation is that at least 95% of these confidence intervals contain the true ATE (\(\theta = 5.332846\)).
true_ate <- mean(dat2$Y1 - dat2$Y0)
ests_complete_ra <- ests_complete_ra %>%
mutate(
`Contains True ATE` = case_when(
(conf.low <= true_ate) & (true_ate <= conf.high) ~ "Yes",
TRUE ~ "No"
),
Iteration = row_number()
)
fig3 <- ggplot(
data = ests_complete_ra %>% filter(Iteration <= 100),
aes(x = estimate, y = Iteration, color = `Contains True ATE`)
) +
geom_point() +
geom_linerange(aes(xmin = conf.low, xmax = conf.high)) +
geom_vline(xintercept = true_ate,
linetype = "dashed",
color = "black") +
scale_color_manual(values = c("red", "gray")) +
theme_bw() +
theme(
panel.grid = element_blank()
)
fig3
For this week’s problem set, we have to load the Bertrand and Mullainathan (2004) data set and reproduce Table 1. In this section, we work through code that produces group means and difference-in-means estimates within each block and for the full sample.
Step 1: Estimate the treatment group means for each block.
Tidyverse Guide: We will use `group_by` to split the data set into four subsets: untreated subjects in block \(X_1=0\), treated subjects in block \(X_1=0\), untreated subjects in block \(X_1=1\), and treated subjects in block \(X_1=1\). We then apply the `summarize` function to calculate the average outcome (`mean(Y)`) and the number of observations (`n()`) in each subset. Finally, we use the `kable()` command from the `knitr` package to make a table.
# Step 0: Conduct a block random assignment
dat2 <- dat2 %>%
mutate(
Z = conduct_ra(block_ra_declaration),
Y = Y1*Z + (1-Z)*Y0
)
# Step 1: Estimate the group means
means_1 <-
dat2 %>%
group_by(X1, Z) %>%
summarize(`Avg. Outcome` = mean(Y),
N = n())
kable(means_1,
      caption = "Group Means")
X1 | Z | Avg. Outcome | N |
---|---|---|---|
0 | 0 | -8.162283 | 27 |
0 | 1 | -2.333327 | 27 |
1 | 0 | 6.684450 | 23 |
1 | 1 | 11.177371 | 23 |
Step 2: Estimate the difference in means within each block.
Tidyverse Guide: We will use `group_by` to split the data set into two subsets: subjects in block \(X_1=0\) and subjects in block \(X_1=1\). We then apply the `summarize` function to run an estimation within each block. We use the `difference_in_means` command from the `estimatr` package to get the difference-in-means estimate. You will note that `data = cur_data()`; this tells `difference_in_means` to use the grouped data being piped from the previous step. Note also that we use `tidy()`: `difference_in_means` returns a regression object that we can pass to `summary()`, but that object is not a data frame, and `tidy()` converts the regression results into one. As before, we use the `kable()` command from the `knitr` package to make a table.
dif_in_means_1 <-
dat2 %>%
group_by(X1) %>%
summarize(tidy(difference_in_means(Y ~ Z, data = cur_data())))
kable(dif_in_means_1,
      digits = 3,
      caption = "Difference in Means Within Blocks")
X1 | term | estimate | std.error | statistic | p.value | conf.low | conf.high | df | outcome |
---|---|---|---|---|---|---|---|---|---|
0 | Z | 5.829 | 1.765 | 3.302 | 0.002 | 2.283 | 9.375 | 49.945 | Y |
1 | Z | 4.493 | 1.678 | 2.678 | 0.010 | 1.112 | 7.874 | 43.850 | Y |
Step 3: Add the block size-weighted overall ATE estimate to the table.
Tidyverse Guide: We estimate the block-weighted overall ATE using `difference_in_means`, specifying `blocks = X1` to get the block-weighted estimate. We use the `kable()` command from the `knitr` package to make a table. In making this table, we use `mutate` and `case_when` to add a column called `Sample` to the regression results data frame that describes whether the difference-in-means estimate is for the full sample or for a particular block. We use `bind_rows` to stack the two data frames: one containing the block-level ATE estimates, the other containing the block-weighted overall ATE estimate. (`bind_rows` fills in `NA` for columns that appear in only one data frame, so we drop `X1` from the block-level results to keep the combined table clean.) Finally, we use `select` to specify which columns from the regression results we would like to include in the table; for example, we include `estimate` and `std.error` but drop `df`.
dif_in_means_2 <-
tidy(difference_in_means(Y ~ Z, blocks = X1, data = dat2)) %>%
mutate(
Sample = "Full Sample"
)
# Combining the block-level estimates with the overall ATE
dif_in_means_1 <- dif_in_means_1 %>%
mutate(
Sample = case_when(X1 == 0 ~ "Block X1=0",
X1 == 1 ~ "Block X1=1")
) %>%
select(-X1)
dif_in_means = bind_rows(dif_in_means_2, dif_in_means_1)
# Formatting to include only relevant information
dif_in_means <- dif_in_means %>%
select(estimate, std.error, p.value, conf.low, conf.high, Sample)
kable(dif_in_means,
digits = 3,
col.names = c("ATE Estimate", "SE", "p value", "CI (low)", "CI (high)", "Sample"))
ATE Estimate | SE | p value | CI (low) | CI (high) | Sample |
---|---|---|---|---|---|
5.214 | 1.226 | 0.000 | 2.780 | 7.649 | Full Sample |
5.829 | 1.765 | 0.002 | 2.283 | 9.375 | Block X1=0 |
4.493 | 1.678 | 0.010 | 1.112 | 7.874 | Block X1=1 |
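As a sanity check, the full-sample row is just the block-level estimates weighted by block share (here computed by hand from the sizes and estimates in the tables above):

```r
# Block sizes (X1 = 0 and X1 = 1) and block-level ATE estimates from above
n_block   <- c(54, 46)
est_block <- c(5.829, 4.493)

# Block size-weighted overall ATE
weighted_ate <- sum((n_block / sum(n_block)) * est_block)
round(weighted_ate, 3)  # 5.214, matching the "Full Sample" row
```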
Tidyverse Guide: We use the data frame from the previous step to make a coefficient plot with `ggplot2`. A coefficient plot shows the ATE estimate and the associated 95% confidence interval for each block and for the full sample. `geom_point()` plots a dot at the ATE estimate, and `geom_linerange` draws the confidence interval around it, with the lower limit set by `xmin` and the upper limit set by `xmax`. Finally, since we might be interested in knowing whether the estimates are statistically distinguishable from 0 (no effect), we plot a vertical line at 0 using `geom_vline`.
ggplot(dif_in_means, aes(x = estimate, y = Sample)) +
geom_point() +
geom_linerange(aes(xmin = conf.low, xmax = conf.high)) +
geom_vline(xintercept = 0, linetype = "dashed", color = "red") +
theme_bw() +
theme(
panel.grid = element_blank()
)