Notes from Last Week

Robin Hood Treatments and Sampling Variability

The key intuition in Question 4 (Chapter III) is that when \(Y_i(0)\) and \(Y_i(1)\) covary positively, some units have high values of both potential outcomes while others have low values of both. As a result, when such a unit \(i\) (say, one with high \(Y_i(1)\) and \(Y_i(0)\)) moves between treatment conditions across hypothetical assignments, it shifts both group means considerably. When \(i\) is assigned to treatment, it pushes up the treatment group mean and, by its absence, pulls down the control group mean, producing a large ATE estimate. When \(i\) is assigned to control, it pushes up the control group mean and pulls down the treatment group mean, producing a small ATE estimate. This flip-flopping between extremes increases sampling variability. Here is an illustration of the fact:

Step 1: Recreate the data from Question 4.

Data from Question 4, Chapter III
Y0 YA YB
1 2 3
2 3 3
3 4 3
4 5 3
5 6 3
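The printed data can be recreated in a few lines; here is a base-R sketch (the chapter's original code may differ):

```r
# Potential outcomes for the five subjects from Question 4, Chapter III:
# Y0 is the untreated schedule; YA and YB are two alternative treated schedules
dat <- data.frame(
  Y0 = 1:5,        # untreated potential outcomes
  YA = 2:6,        # treatment A adds 1 to every subject
  YB = rep(3, 5)   # treatment B equalizes every subject at 3
)
dat
```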

Step 2: Obtain all possible ways in which \(m=2\) of \(N=5\) subjects are assigned to treatment.

Permutation Matrix
d1 d2 d3 d4 d5 d6 d7 d8 d9 d10
0 0 0 0 0 0 1 1 1 1
0 0 0 1 1 1 0 0 0 1
0 1 1 0 0 1 0 0 1 0
1 0 1 0 1 0 0 1 0 0
1 1 0 1 0 0 1 0 0 0
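One way to enumerate these assignments is with combn() (a base-R sketch; the column order may differ from the table above):

```r
# Each column is one assignment vector with exactly m = 2 of N = 5 units treated
N <- 5
m <- 2
perm <- combn(N, m, function(treated) replace(numeric(N), treated, 1))
dim(perm)  # 5 rows (units) by choose(5, 2) = 10 columns (assignments)
```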

Step 3: Estimate the difference in means for each assignment vector.

Sampling Distribution of ATE
Treatment Effect of A Treatment Effect of B
3.5000000 1.0000000
2.6666667 0.6666667
1.8333333 0.3333333
1.8333333 0.3333333
1.0000000 0.0000000
0.1666667 -0.3333333
1.0000000 0.0000000
0.1666667 -0.3333333
-0.6666667 -0.6666667
-1.5000000 -1.0000000
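These estimates can be reproduced by applying the difference in means to every column of the permutation matrix (a base-R sketch; the row order may differ from the table above):

```r
dat <- data.frame(Y0 = 1:5, YA = 2:6, YB = rep(3, 5))
perm <- combn(5, 2, function(treated) replace(numeric(5), treated, 1))

# Difference in means under assignment d: treated-group mean minus control-group mean
dim_est <- function(y1, d) mean(y1[d == 1]) - mean(dat$Y0[d == 0])

est_A <- apply(perm, 2, function(d) dim_est(dat$YA, d))
est_B <- apply(perm, 2, function(d) dim_est(dat$YB, d))

# Averaging over all assignments recovers the true ATEs (1 for A, 0 for B),
# illustrating that the estimator is unbiased
mean(est_A)  # 1
mean(est_B)  # 0
```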

Step 4: Calculate the standard deviation of the sampling distribution (i.e. the standard error).

True SE of ATE
Treatment A Treatment B
1.443376 0.5773503
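The standard error is the standard deviation of this ten-point distribution. A base-R sketch, using the population formula (divide by the number of assignments, not by one less):

```r
dat <- data.frame(Y0 = 1:5, YA = 2:6, YB = rep(3, 5))
perm <- combn(5, 2, function(treated) replace(numeric(5), treated, 1))
est <- function(y1) apply(perm, 2, function(d) mean(y1[d == 1]) - mean(dat$Y0[d == 0]))

# Population standard deviation: every assignment is equally likely, so we
# divide by the number of assignments rather than using sd()'s n - 1
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

pop_sd(est(dat$YA))  # 1.443376
pop_sd(est(dat$YB))  # 0.5773503
```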

Step 5: Use equation 3.4 to get the same result.

The true standard error of the average treatment effect is: \(\sqrt{\frac{1}{N-1}\{\frac{m}{N-m}Var[Y_i(0)] + \frac{N-m}{m}Var[Y_i(1)] + 2Cov[Y_i(0),Y_i(1)]\}}\).

Treatment B produces a smaller standard error because:

  1. Treatment B minimizes \(Var[Y_i(1)]\) (variance in the treated potential outcomes) by reducing the achievement gap or heterogeneity in treated potential outcomes. In fact, \(Var[Y_i(B)] = 0\) while \(Var[Y_i(A)] = 2\).

  2. Treatment B minimizes the covariance between treated and untreated potential outcomes: \(Cov[Y_i(0), Y_i(A)] = 2\) while \(Cov[Y_i(0), Y_i(B)] = 0\).

For this reason, a treatment that “bridges the gap” (i.e. one with positive effects on subjects with low untreated potential outcomes and negative effects on subjects with high untreated potential outcomes) shrinks both \(Var[Y_i(1)]\) and the covariance term, and therefore has lower sampling variability, all else equal.
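Equation 3.4 can be checked directly against the enumeration. A base-R sketch, with variances and covariances computed under the population convention (divide by \(N\)):

```r
dat <- data.frame(Y0 = 1:5, YA = 2:6, YB = rep(3, 5))
N <- 5
m <- 2

pvar <- function(x) mean((x - mean(x))^2)                   # population variance
pcov <- function(x, y) mean((x - mean(x)) * (y - mean(y)))  # population covariance

# Equation 3.4: the true standard error of the difference-in-means estimator
se_eq34 <- function(y0, y1) {
  sqrt((1 / (N - 1)) * (m / (N - m) * pvar(y0) +
                        (N - m) / m * pvar(y1) +
                        2 * pcov(y0, y1)))
}

se_eq34(dat$Y0, dat$YA)  # 1.443376 -- variance and covariance terms are both 2
se_eq34(dat$Y0, dat$YB)  # 0.5773503 -- variance and covariance terms are both 0
```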


Blocking and Sampling Variability

Last week, I introduced a data set that contains potential outcomes (\(Y_0, Y_1\)) and two covariates (\(X_1, X_2\)). \(X_1\) is highly correlated with the potential outcomes (\(\rho \geq 0.7\)), while \(X_2\) is essentially uncorrelated with them (\(\rho \approx 0\)). In this section, I show that blocking on a predictive covariate reduces sampling variability.

library(tidyverse)
library(randomizr)
library(knitr)

# No seed is set, so re-running yields different draws than the printed output
dat2 <- tibble(
  Y0 = rnorm(100, mean = 0, sd = 8),
  Y1 = Y0 + rnorm(100, mean = 5, sd = 4),
  X1 = as.numeric(Y0 >= 0),
  X2 = randomizr::complete_ra(N = 100, m = 50),
  C1 = case_when(Y0 <= -1 ~ 1, Y0 <= 0 ~ 2, Y0 <= 1 ~ 3, TRUE ~ 4),
  C2 = sample(c(1, 2, 3, 4), size = 100, replace = TRUE, prob = rep(0.25, 4))
)

kable(head(dat2), caption = "Dataset from Last Week", row.names = T)
Dataset from Last Week
Y0 Y1 X1 X2 C1 C2
1 5.842992 12.416623 1 1 4 1
2 -6.271863 -3.103697 0 0 1 2
3 -4.143368 3.190110 0 0 1 2
4 -25.630603 -18.537212 0 1 1 3
5 -10.073401 -7.654605 0 0 1 1
6 -5.134506 -5.695581 0 1 1 2
dat2 %>% summarise(`True ATE` = mean(Y1 - Y0)) %>% kable()
True ATE
5.332846

Intuition: If subjects within a block have similar potential outcomes, randomizing within each block eliminates randomizations in which, by chance, all the subjects with high (or low) potential outcomes are assigned to the same treatment condition. This rules out extreme estimates that increase the spread of the sampling distribution.
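This intuition can be previewed with a quick self-contained base-R sketch (hypothetical code, separate from the randomizr-based simulation that follows): simulate data like dat2, repeatedly assign treatment with and without blocking on \(X_1\), and compare the spread of the difference-in-means estimates.

```r
set.seed(2)  # arbitrary seed, chosen here for reproducibility
N <- 100
Y0 <- rnorm(N, mean = 0, sd = 8)
Y1 <- Y0 + rnorm(N, mean = 5, sd = 4)
X1 <- as.numeric(Y0 >= 0)  # covariate highly predictive of potential outcomes

assign_complete <- function() sample(rep(0:1, each = N / 2))
assign_blocked <- function() {
  # roughly half of each block treated (exact halves when block sizes are even)
  ave(integer(N), X1, FUN = function(b) sample(rep(0:1, length.out = length(b))))
}

# Difference in means under assignment vector d (switching equation applied)
dim_est <- function(d) mean(Y1[d == 1]) - mean(Y0[d == 0])

se_complete <- sd(replicate(5000, dim_est(assign_complete())))
se_blocked <- sd(replicate(5000, dim_est(assign_blocked())))
c(complete = se_complete, blocked = se_blocked)  # blocked is visibly smaller
```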

Complete v. Blocked Random Assignment

Step 1: Let's specify two types of random assignment: complete random assignment (exactly \(m=50\) of \(N=100\) subjects are assigned to treatment) and blocked random assignment (exactly half of the subjects in each block, \(X_1=1\) and \(X_1=0\), are assigned to treatment).

Block Random Assignment: randomizr allows you to specify block random assignment designs in three ways. As an illustrative case, imagine there are two blocks: A has 40 subjects and B has 60 subjects.

  1. m: if you specify m \(=\) 20, exactly 20 subjects in each block are assigned to the treatment condition. Note that this yields a treatment probability of \(\frac{1}{2}\) in block A and \(\frac{1}{3}\) in block B.

  2. block_m: this option requires a numeric vector of length \(J\), where \(J\) is the number of blocks (2 in this case). If you specify block_m \(=\) c(20,30), exactly 20 of 40 subjects in block A and 30 of 60 subjects in block B are assigned to the treatment condition. In this specification, the probability of being assigned to the treatment condition is equal in both blocks.

  3. block_prob: this requires a numeric vector of length \(J\) that specifies the probability of being assigned to the treatment condition in each block. Note that when you specify a probability, randomizr is not conducting simple random assignment (independent trials with some probability \(p\)). It takes the probability in block \(j\), \(p_j\), multiplies it by the size of that block to get \(m_j\), and then assigns exactly \(m_j\) of the units in block \(j\) to the treatment condition. So specifying block_prob \(=\) c(.5,.5) is the same as block_m \(=\) c(20,30).

complete_ra_declaration = declare_ra(N = nrow(dat2), m=50)

block_ra_declaration = declare_ra(N = nrow(dat2), blocks = dat2$X1, block_prob = c(.5,.5))

Step 2: Using simulations, I recover the sampling distribution of the average treatment effect under complete and block random assignment.

library(estimatr)

# Obtain sampling distribution under complete random assignment
# (obtain_permutation_matrix returns at most 10,000 assignment vectors by
# default, sampling at random when the full set is larger, as it is here)

D_complete_ra <- obtain_permutation_matrix(complete_ra_declaration)

ests_complete_ra <- c()

for(i in 1:ncol(D_complete_ra)){
  dat_temp <- dat2 %>%
    mutate(
      d_temp = D_complete_ra[,i],
      y_temp = d_temp*Y1 + (1-d_temp)*Y0
    )
  estimate = tidy(difference_in_means(y_temp ~ d_temp, data = dat_temp))
  ests_complete_ra = bind_rows(ests_complete_ra, estimate)
}

# Obtain sampling distribution under blocked random assignment

D_block_ra <- obtain_permutation_matrix(block_ra_declaration) 

ests_block_ra <- c()

for(i in 1:ncol(D_block_ra)){
  dat_temp <- dat2 %>%
    mutate(
      d_temp = D_block_ra[,i],
      y_temp = d_temp*Y1 + (1-d_temp)*Y0
    )
  estimate = tidy(difference_in_means(y_temp ~ d_temp, blocks = X1, data = dat_temp))
  ests_block_ra = bind_rows(ests_block_ra, estimate)
}

# Make a figure to compare the two sampling distributions

library(patchwork)

fig1 <- ggplot(data = ests_complete_ra, aes(x = estimate)) + 
  geom_histogram(bins = 50) +
  geom_vline(xintercept = mean(ests_complete_ra$estimate), linetype ="dashed", color ="red")+
  ylab("Frequency") +
  xlim(c(-1,12)) +
  ggtitle("Complete Random Assignment") +
  theme_bw()

fig2 <- ggplot(data = ests_block_ra, aes(x = estimate)) + 
  geom_histogram(bins = 50) +
  geom_vline(xintercept = mean(ests_block_ra$estimate), linetype = "dashed", color = "red") +
  ylab("Frequency") +
  xlim(c(-1,12)) +
  ggtitle("Block Random Assignment") +
  theme_bw()


fig1 / fig2


Confidence Intervals

A confidence interval for \(\theta\), loosely speaking, is an estimated interval that covers the true value of \(\theta\) with at least a given probability. (Aronow and Miller 2019:124)

In other words, a 95% confidence interval procedure (say, one based on the Normal approximation) produces intervals that contain the true value of the parameter at least 95% of the time. The coverage property belongs to the procedure, not to any single interval: a particular interval estimate computed from the data either contains the true value of \(\theta\) or it does not.
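The coverage property can be illustrated with a self-contained base-R sketch (hypothetical numbers, unrelated to the experiment above): simulate many samples, build an approximate interval for the mean in each, and count how often the truth is covered.

```r
set.seed(1)  # arbitrary seed
n <- 50
true_mean <- 5

covered <- replicate(2000, {
  x <- rnorm(n, mean = true_mean, sd = 8)
  half_width <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
  ci <- mean(x) + c(-1, 1) * half_width
  ci[1] <= true_mean && true_mean <= ci[2]
})

mean(covered)  # close to 0.95
```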

To clarify this point, I plot the confidence intervals from the above simulation. In each iteration of that simulation, we conducted complete random assignment (\(m=50\)), applied the switching equation to the potential outcomes to obtain observed outcomes, and estimated the difference in means (\(\widehat{\theta}\)) along with a 95% confidence interval. Our expectation is that at least 95% of these confidence intervals contain the true ATE (\(\theta = 5.332846\)).

true_ate = mean(dat2$Y1 -dat2$Y0)

ests_complete_ra <- ests_complete_ra %>%
  mutate(
    `Contains True ATE` = case_when(
      (conf.low <= true_ate) & (true_ate <= conf.high) ~ "Yes",
      TRUE ~ "No"
    ),
    Iteration = row_number()
  )

fig3 <- ggplot(
  data = ests_complete_ra %>% filter(Iteration <= 100),
  aes(x = estimate, y = Iteration, color = `Contains True ATE`)
) +
  geom_point() +
  geom_linerange(aes(xmin = conf.low, xmax = conf.high)) +
  geom_vline(xintercept = true_ate,
             linetype = "dashed",
             color = "black") +
  scale_color_manual(values = c("red", "gray")) +
  theme_bw() +
  theme(
    panel.grid = element_blank()
  )

fig3


Making Tables

For this week’s problem set, we have to load the Bertrand and Mullainathan (2004) data set and reproduce Table 1. In this section, we will work through some code that produces group means and difference in means within a block, and for the full sample.

Step 1: Estimate the treatment group means for each block.

Tidyverse Guide: We will use group_by to split the data set into four subsets: untreated subjects in block \(X_1=0\), treated subjects in block \(X_1=0\), untreated subjects in block \(X_1=1\), and treated subjects in block \(X_1=1\). We then apply the summarize function to calculate the average value of the outcome (mean(Y)) in each subset, and the number of observations in each subset (n()). Finally, we use the kable() command from the knitr package to make a table.

# Step 0: Conduct a block random assignment 
dat2 <- dat2 %>%
  mutate(
    Z = conduct_ra(block_ra_declaration),
    Y = Y1*Z + (1-Z)*Y0
  )

# Step 1: Estimate the group means
means_1 <- 
  dat2 %>%
  group_by(X1, Z)  %>%
  summarize(`Avg. Outcome` = mean(Y),
            N = n())

kable(means_1,
      caption = "Group Means")
Group Means
X1 Z Avg. Outcome N
0 0 -8.162283 27
0 1 -2.333327 27
1 0 6.684450 23
1 1 11.177371 23

Step 2: Estimate the difference in means within each block.

Tidyverse Guide: We will use group_by to split the data set into two subsets: subjects in block \(X_1=0\) and subjects in block \(X_1=1\). We then apply the summarize function to run a regression within each block. We use the difference_in_means command from the estimatr package to get the difference-in-means estimate. Note that data = cur_data(): this tells difference_in_means to use the grouped data being piped in from the previous step (in dplyr 1.1.0 and later, cur_data() is superseded by pick(everything())). Note also that we use tidy(). difference_in_means returns a regression object that we can pass to summary(), but that object is not a data frame; tidy() converts the regression results into one. As before, we use the kable() command from the knitr package to make a table.

dif_in_means_1 <-
  dat2 %>%
  group_by(X1) %>%
  summarize(tidy(difference_in_means(Y ~ Z, data = cur_data())))


kable(dif_in_means_1,
      digits = 3,
      caption = "Difference in Means Within Blocks")
Difference in Means Within Blocks
X1 term estimate std.error statistic p.value conf.low conf.high df outcome
0 Z 5.829 1.765 3.302 0.002 2.283 9.375 49.945 Y
1 Z 4.493 1.678 2.678 0.010 1.112 7.874 43.850 Y

Step 3: Adding the block size-weighted overall ATE estimate to the table.

Tidyverse Guide: We estimate the block-weighted overall ATE using difference_in_means, specifying blocks = X1. We use the kable() command from the knitr package to make a table. In making this table, we use mutate and case_when to add a column called Sample to the regression results data frame that records whether the difference-in-means estimate is for the full sample or for a particular block. We use bind_rows to stack two data frames - one containing the block-level ATE estimates, the other containing the block-weighted overall ATE estimate. Note that bind_rows matches columns by name and fills any column that appears in only one data frame with NA, so we first line up the columns and drop X1, which has no counterpart in the full-sample results. Finally, we use select to specify which columns from the regression results data frame to include in the table; for example, we keep estimate and std.error but drop df.

dif_in_means_2 <-
  tidy(difference_in_means(Y ~ Z, blocks = X1, data = dat2)) %>%
  mutate(
    Sample = "Full Sample"
  )

# Combining the block-level estimates with the overall ATE

dif_in_means_1 <- dif_in_means_1 %>%
  mutate(
    Sample = case_when(X1 == 0 ~ "Block X1=0",
                       X1 == 1 ~ "Block X1=1") 
  ) %>%
  select(-X1)

dif_in_means = bind_rows(dif_in_means_2, dif_in_means_1)

# Formatting to include only relevant information

dif_in_means <- dif_in_means %>%
  select(estimate, std.error, p.value, conf.low, conf.high, Sample)

kable(dif_in_means,
      digits = 3,
      col.names = c("ATE Estimate", "SE", "p value", "CI (low)", "CI (high)", "Sample"))
ATE Estimate SE p value CI (low) CI (high) Sample
5.214 1.226 0.000 2.780 7.649 Full Sample
5.829 1.765 0.002 2.283 9.375 Block X1=0
4.493 1.678 0.010 1.112 7.874 Block X1=1

Coefficient Plots

Tidyverse Guide: We use the data frame from the previous step to make a coefficient plot using ggplot2. A coefficient plot shows the ATE estimate and associated 95% confidence interval for each block, and for the full sample. geom_point() produces a dot, which is the ATE estimate. geom_linerange produces a confidence interval around this dot with a lower limit set by xmin and an upper limit set by xmax. Finally, since we might want to know whether the DIM estimate is statistically distinguishable from 0 (no effect), we plot a vertical line at 0 using geom_vline.

ggplot(dif_in_means, aes(x = estimate, y = Sample)) +
  geom_point() +
  geom_linerange(aes(xmin = conf.low, xmax = conf.high)) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "red") +
  theme_bw() +
  theme(
    panel.grid = element_blank()
  )