What is the difference between \(Y_i(0)|d_i=1\) and \(Y_i(0)|D_i=1\)?
\(Y_i(0)|d_i=1\) is the untreated potential outcome for subject \(i\) when they are actually treated. \(Y_i(0)|D_i=1\) is the untreated potential outcome for \(i\) if they hypothetically receive the treatment.
What is the difference between \(\mathbb{E}[Y_i(0)|d_i=1]\) and \(\mathbb{E}[Y_i(0)|D_i=1]\)?
\(\mathbb{E}[Y_i(0)|d_i=1]\) is the average value of untreated potential outcomes for subjects actually assigned to treatment. \(\mathbb{E}[Y_i(0)|D_i=1]\) is the average value of untreated potential outcomes for subjects hypothetically assigned to treatment.
Form pairs, and calculate the following quantities using last week’s data set (provided below): \(\mathbb{E}[Y_i(1)|d_i=0]\) and \(\mathbb{E}[Y_i(1)|D_i=0]\)
library(tidyverse)
library(randomizr)
library(knitr)
data1 <- tibble(
Unit = c(1:4),
Y1 = c(10,8,6,4),
Y0 = c(15,13,11,9),
d_i= c(1,1,0,0))
declaration <- declare_ra(N = 4, m = 2)
D <- obtain_permutation_matrix(declaration)
data1 <- as.data.frame(data1)
data1$d1 <- D[,1]
data1$d2 <- D[,2]
data1$d3 <- D[,3]
data1$d4 <- D[,4]
data1$d5 <- D[,5]
kable(data1, align = 'c', caption = "Full Schedule of Potential Outcomes and Treatment Assignments")
Unit | Y1 | Y0 | d_i | d1 | d2 | d3 | d4 | d5 |
---|---|---|---|---|---|---|---|---|
1 | 10 | 15 | 1 | 0 | 0 | 0 | 1 | 1 |
2 | 8 | 13 | 1 | 0 | 1 | 1 | 0 | 0 |
3 | 6 | 11 | 0 | 1 | 0 | 1 | 0 | 1 |
4 | 4 | 9 | 0 | 1 | 1 | 0 | 1 | 0 |
Note: The solutions are available in the last tab of these section notes.
The difference in means is an unbiased estimator of the average treatment effect under three assumptions: random assignment, excludability, and non-interference. In this section, I briefly describe each of these assumptions.
Formally, we have: \(Y_i(1), Y_i(0),X \perp\!\!\!\perp D_i\)
What does this mean? To explain this, I add two rows to the potential outcomes table. These rows report the correlation between potential outcomes and an assignment vector: \(\rho(Y_i(1),d)\) and \(\rho(Y_i(0),d)\), \(\forall d\).
rho1 <- c("r(Y1,d)","","",round(cor(data1$Y1,data1$d_i),2),round(cor(data1$Y1,data1$d1),2),round(cor(data1$Y1,data1$d2),2), round(cor(data1$Y1,data1$d3),2),round(cor(data1$Y1,data1$d4),2),round(cor(data1$Y1,data1$d5),2))
rho2 <- c("r(Y0,d)","","",round(cor(data1$Y0,data1$d_i),2),round(cor(data1$Y0,data1$d1),2),round(cor(data1$Y0,data1$d2),2), round(cor(data1$Y0,data1$d3),2),round(cor(data1$Y0,data1$d4),2),round(cor(data1$Y0,data1$d5),2))
data1[5,] <- rho1
data1[6,] <- rho2
kable(data1, align = 'c', caption = "Full Schedule of Potential Outcomes and Treatment Assignments")
Unit | Y1 | Y0 | d_i | d1 | d2 | d3 | d4 | d5 |
---|---|---|---|---|---|---|---|---|
1 | 10 | 15 | 1 | 0 | 0 | 0 | 1 | 1 |
2 | 8 | 13 | 1 | 0 | 1 | 1 | 0 | 0 |
3 | 6 | 11 | 0 | 1 | 0 | 1 | 0 | 1 |
4 | 4 | 9 | 0 | 1 | 1 | 0 | 1 | 0 |
r(Y1,d) | 0.89 | -0.89 | -0.45 | 0 | 0 | 0.45 | ||
r(Y0,d) | 0.89 | -0.89 | -0.45 | 0 | 0 | 0.45 |
What is the average value of \(\rho(Y_i(1),d)\) and \(\rho(Y_i(0),d)\)?
Answer: Zero! Which is the implied property of random assignment.
Can you provide an intuition for why the correlation between any given assignment vector and treated/untreated potential outcomes is the same?
Answer: The mock data that I use assumes a constant treatment effect. So, \(Y_1 = Y_0 - 5\). This implies \(Var[Y_1] = Var[Y_0 - 5] = Var[Y_0]\), and that \(Cov(Y_1,d) = Cov(Y_0-5,d) = Cov(Y_0,d)\)
Finally, does \(\mathbb{E}[Y_i(1)] = \mathbb{E}[Y_i(1) | D_i = 0] = \mathbb{E}[Y_i(1) | D_i = 1]\)?
\(\mathbb{E}[Y_i(1)] = \frac{10+8+6+4}{4} = \frac{28}{4} = 7\)
\(\mathbb{E}[Y_i(1) | D_i = 0] = 7\) (see solutions)
\(\mathbb{E}[Y_i(1) | D_i = 1] = (\frac{1}{6})(\frac{10+8}{2}) + (\frac{1}{6})(\frac{6+4}{2}) + (\frac{1}{6})(\frac{8+4}{2}) + (\frac{1}{6})(\frac{8+6}{2}) + (\frac{1}{6})(\frac{10+4}{2}) + (\frac{1}{6})(\frac{10+6}{2})\)
Which equals: \(\frac{1}{6} \times \frac{84}{2} = 7\)
This is what random assignment suggests: \(E[Y_i(1) | D_i = 0] = E[Y_i(1) | D_i = 1] = E[Y_i(1)]\)
Note: Correlations tell us whether there is a linear association between two variables. By itself, it is not evidence of statistical independence because two variables might be non-linearly associated.
Formally, excludability implies: \(Y_i(z_i,d_i) = Y_i(d_i)\) Potential outcomes respond only to the actual treatment status \(d_i\), not the treatment assignment status \(z_i\). In other words, \(Y_i(z=1,d)= Y_i(z=0,d)\)
As Gerber and Green note, “the exclusion restriction breaks down when random assignment sets in motion causes of \(Y_i\) other than the treatment \(d_i\).
Formally, non-interference implies: \(Y_i(z,d) = Y_i(z',d')\) where \(z_i = z'_i\) and \(d_i=d'_i\). In other words, \(Y_i(z,d)\) is only affected by the treatment status and assignment of \(i\), not the treatment status or assignment of any other subject \(j\neq i\)
To illustrate, say we have \(n=3\) and \(m=1\). Let one assignment vector \(z=\{0,1,0\}\) in which person 3 (\(i=3\)) is untreated. If we assume non-interference, we in effect think that the third person’s potential outcome is the same when the assignment vector is \(z'=\{1,0,0\}\). In both \(z\) and \(z'\), \(i\)’s assignment status is the same (only the first and second person’s assignment status changes). However, \(i\)’s potential outcomes are not sensitive to changes in other people’s treatment or assignment status.
How is non-interference different from excludability? The non-interference assumption focuses on how interaction between units affects potential outcomes. The excludability assumption pertains to what we randomly assign and whether treatment is the only thing that moves with assignment.
Last week we touched on the idea that the true ATE is a fixed, constant quantity, however, the difference-in-means estimate changes every time we randomly assign \(d_i\).
An unbiased estimator gives the correct answer or true value on average. Formally,\(\mathbb{E}[\widehat{\beta}] = \beta\) (where \(\widehat{\beta}\) is an estimate and \(\beta\) is the true population value). The difference-in-means is an unbiased estimator of the average treatment effect (i.e \(\mathbb{E}[\widehat{\text{DIM}}] = \text{ATE}\))
To illustrate this, I download the data from last week’s section and use a for loop to run simulations. The steps here are:
# Load the data set
dat <- read_csv("w1_practice_data.csv")
# True ATE
true_ate = dat %>% summarise(`True ATE` = mean(Y1-Y0))
kable(true_ate)
True ATE |
---|
4.914945 |
# Simulations using a for-loop
estimates <- c()
for(i in 1:10000){
dat <- dat %>%
mutate(
d = complete_ra(N = nrow(dat), m = 500),
Y = Y1*d + (1-d)*Y0
)
estimate = dat %>% summarise(DIM = mean(Y[d==1]) - mean(Y[d==0]))
estimates = bind_rows(estimates, estimate)
}
# estimates contains the difference-in-means for the observed outcome in each iteration of the simulation.
# For DIM to be an unbiased estimator, mean(estimates) = true_ate
estimates %>% summarise(`Average Estimate` = mean(DIM)) %>% kable(.)
Average Estimate |
---|
4.915983 |
# Lets visualize this using ggplot2
ggplot(estimates, aes(x = DIM)) + geom_histogram(bins = 50) + geom_vline(
xintercept = mean(estimates$DIM),
linetype = "dashed",
color = "blue"
) + xlab("Estimates") + ylab("Frequency") + ggtitle("Sampling Distribution") + theme_bw()
Under random assignment, I have shown that: \(\mathbb{E}[Y_i(1) | D_i = 0] = \mathbb{E}[Y_i(1) | D_i = 1] = \mathbb{E}[Y_i(1)]\).
This is also true for untreated potential outcomes: \(\mathbb{E}[Y_i(0) | D_i = 0] = \mathbb{E}[Y_i(0) | D_i = 1] = \mathbb{E}[Y_i(0)]\)
It follows then that \(ATT = \mathbb{E}[Y_i(1) | D_i = 1] - \mathbb{E}[Y_i(0) | D_i = 1] = \mathbb{E}[Y_i(1)] - \mathbb{E}[Y_i(0)]\). Stated another way, \(\text{ATT} = \text{ATE}\).
Can you run a simulation using a for-loop to check if the ATT = ATE? Here are the steps:
Note: The solution is available in the last tab of these section notes.
So far we have used randomizr to randomly assign units to one of two conditions using the complete_ra(N,m) command, where exactly m of N units are assigned to the treatment condition.
For a multi-arm experiment, we use the command complete_ra(N,m_each), where m_each is a vector that describes the number of subjects to be assigned to each condition. Note, \(sum(\text{m_each})=N\).
For a two-arm trial, the switching equation is: \(Y_i = d_i\cdot Y_i(1) + (1-d_i)\cdot Y_i(0)\). The generalized form of this uses a series of indicator variables, \(\mathbb{1}\). Specifically: \(Y_i = \mathbb{1}_{d_i=0} \cdot Y_i(0) + \mathbb{1}_{d_i=1} \cdot Y_i(1) + \mathbb{1}_{d_i=2} \cdot Y_i(2) + \cdots + \mathbb{1}_{d_i=k} \cdot Y_i(k)\)
We use the case_when command to operationalize this equation in R.
The code below shows how to assign subjects to four conditions. Exactly 150 are assigned to T1, 250 to T2, 400 to T3, and 200 to T4. We then use the generalized switching equation to create a new column of observed outcomes, \(Y_i\).
dat2 <- tibble(
Y1 = rnorm(1000),
Y2 = Y1 + rnorm(1000, mean = 5, sd = 1),
Y3 = Y1 - rnorm(1000, mean = 5, sd = 1),
Y4 = Y1*rnorm(1000, mean = 5, sd = 1)
)
dat2 <- dat2 %>%
mutate(
d_multi_arm = complete_ra(N = 1000, m_each = c(150,250,400,200)),
y_obs = case_when(
d_multi_arm == "T1" ~ Y1,
d_multi_arm == "T2" ~ Y2,
d_multi_arm == "T3" ~ Y3,
d_multi_arm == "T4" ~ Y4
)
)
# Checking if treatment assignment is done correctly
dat2 %>% group_by(d_multi_arm) %>% summarise(N = n()) %>% kable(.)
d_multi_arm | N |
---|---|
T1 | 150 |
T2 | 250 |
T3 | 400 |
T4 | 200 |
# Checking if the switching equation works correctly
dat2 %>% slice(1:10) %>% kable(.)
Y1 | Y2 | Y3 | Y4 | d_multi_arm | y_obs |
---|---|---|---|---|---|
0.1064488 | 5.228685 | -4.867230 | 0.4492361 | T1 | 0.1064488 |
0.8538338 | 6.323645 | -5.147288 | 4.5906120 | T1 | 0.8538338 |
-0.0705500 | 3.865580 | -3.973157 | -0.2763105 | T2 | 3.8655803 |
0.0991486 | 5.422945 | -5.526200 | 0.5037108 | T4 | 0.5037108 |
1.3344020 | 6.698548 | -2.953341 | 8.4103925 | T3 | -2.9533411 |
0.1582080 | 5.783114 | -4.788443 | 0.7666135 | T1 | 0.1582080 |
0.3402283 | 4.916483 | -5.965033 | 2.2677492 | T2 | 4.9164829 |
-2.0228642 | 2.585993 | -8.601579 | -9.0034495 | T1 | -2.0228642 |
0.8225177 | 4.111885 | -4.340038 | 2.8112675 | T2 | 4.1118855 |
0.1571970 | 5.240874 | -5.599598 | 0.8448973 | T3 | -5.5995978 |
From the potential outcomes table,
\(\mathbb{E}[Y_i(1)|d_i=0] = \frac{6+4}{2} = 5\)
Gerber and Green say: “The notation \(\mathbb{E}[Y_i(1) | D_i = 0]\) may be regarded as shorthand for \(\mathbb{E}[\mathbb{E}[Y_i(1)|d_i = 0, \textbf{d}]]\), where d refers to a vector of treatment assignments and \(d_i\) refers its \(i\)th element. Given d, we may calculate the probability distribution function for all \(\{Y(1),d\}\) pairs and the expectation given this set of assignments. Then we may take the expectation of this expected value by summing over all possible d vectors.” (P28, Footnote 3)
\(\mathbb{E}[\mathbb{E}[Y_i(1)|d_i = 0, \textbf{d}]]\) = \((\frac{1}{6})(\frac{6+4}{2}) + (\frac{1}{6})(\frac{10+8}{2}) + (\frac{1}{6})(\frac{10+6}{2}) + (\frac{1}{6})(\frac{10+4}{2}) + (\frac{1}{6})(\frac{8+6}{2}) + (\frac{1}{6})(\frac{8+4}{2})\)
Which equals: \(\frac{1}{6} \times \frac{84}{2} = 7\)
For-loop to calculate the average treatment effect on the treated (ATT), given some treatment assignment vector.
library(tidyverse)
library(knitr)
library(randomizr)
# True ATE
true_ate = dat %>% summarise(`True ATE` = mean(Y1-Y0))
kable(true_ate)
True ATE |
---|
4.914945 |
# ATT calculation using a for-loop
estimates <- c()
for(i in 1:10000){
dat <- dat %>%
mutate(
d = complete_ra(N = nrow(dat), m = 500),
)
estimate = dat %>% summarise(ATT = mean(Y1[d==1]) - mean(Y0[d==1]))
estimates = bind_rows(estimates, estimate)
}
estimates %>% summarise(`True ATT` = mean(ATT)) %>% kable(.)
True ATT |
---|
4.915115 |