From Last Week

Potential Outcomes

What is the difference between \(Y_i(0)|d_i=1\) and \(Y_i(0)|D_i=1\)?

\(Y_i(0)|d_i=1\) is the untreated potential outcome for subject \(i\) when they are actually treated. \(Y_i(0)|D_i=1\) is the untreated potential outcome for \(i\) if they hypothetically receive the treatment.

What is the difference between \(\mathbb{E}[Y_i(0)|d_i=1]\) and \(\mathbb{E}[Y_i(0)|D_i=1]\)?

\(\mathbb{E}[Y_i(0)|d_i=1]\) is the average value of untreated potential outcomes for subjects actually assigned to treatment. \(\mathbb{E}[Y_i(0)|D_i=1]\) is the average value of untreated potential outcomes for subjects hypothetically assigned to treatment.

Example

Form pairs, and calculate the following quantities using last week’s data set (provided below): \(\mathbb{E}[Y_i(1)|d_i=0]\) and \(\mathbb{E}[Y_i(1)|D_i=0]\)

library(tidyverse)
library(randomizr)
library(knitr)

data1 <- tibble(
         Unit = c(1:4),
         Y1 = c(10,8,6,4),
         Y0 = c(15,13,11,9),
         d_i= c(1,1,0,0))
declaration <- declare_ra(N = 4, m = 2)
D <- obtain_permutation_matrix(declaration) 

data1 <- as.data.frame(data1)
data1$d1 <- D[,1]
data1$d2 <- D[,2]
data1$d3 <- D[,3]
data1$d4 <- D[,4]
data1$d5 <- D[,5]

kable(data1, align = 'c', caption = "Full Schedule of Potential Outcomes and Treatment Assignments")

Full Schedule of Potential Outcomes and Treatment Assignments
Unit	Y1	Y0	d_i	d1	d2	d3	d4	d5
1	10	15	1	0	0	0	1	1
2	8	13	1	0	1	1	0	0
3	6	11	0	1	0	1	0	1
4	4	9	0	1	1	0	1	0

Note: The solutions are available in the last tab of these section notes.

Core Assumptions

The difference in means is an unbiased estimator of the average treatment effect under three assumptions: random assignment, excludability, and non-interference. In this section, I briefly describe each of these assumptions.

Random Assignment

Formally, we have: \(Y_i(1), Y_i(0),X \perp\!\!\!\perp D_i\)

What does this mean? To explain this, I add two rows to the potential outcomes table. These rows report the correlation between potential outcomes and an assignment vector: \(\rho(Y_i(1),d)\) and \(\rho(Y_i(0),d)\), \(\forall d\).

rho1 <- c("r(Y1,d)","","",round(cor(data1$Y1,data1$d_i),2),round(cor(data1$Y1,data1$d1),2),round(cor(data1$Y1,data1$d2),2), round(cor(data1$Y1,data1$d3),2),round(cor(data1$Y1,data1$d4),2),round(cor(data1$Y1,data1$d5),2))

rho2 <- c("r(Y0,d)","","",round(cor(data1$Y0,data1$d_i),2),round(cor(data1$Y0,data1$d1),2),round(cor(data1$Y0,data1$d2),2), round(cor(data1$Y0,data1$d3),2),round(cor(data1$Y0,data1$d4),2),round(cor(data1$Y0,data1$d5),2))

data1[5,] <- rho1
data1[6,] <- rho2

kable(data1, align = 'c', caption = "Full Schedule of Potential Outcomes and Treatment Assignments")

Full Schedule of Potential Outcomes and Treatment Assignments
Unit	Y1	Y0	d_i	d1	d2	d3	d4	d5
1	10	15	1	0	0	0	1	1
2	8	13	1	0	1	1	0	0
3	6	11	0	1	0	1	0	1
4	4	9	0	1	1	0	1	0
r(Y1,d)			0.89	-0.89	-0.45	0	0	0.45
r(Y0,d)			0.89	-0.89	-0.45	0	0	0.45

What is the average value of \(\rho(Y_i(1),d)\) and \(\rho(Y_i(0),d)\)?

Answer: Zero! Which is the implied property of random assignment.

Can you provide an intuition for why the correlation between any given assignment vector and treated/untreated potential outcomes is the same?

Answer: The mock data that I use assumes a constant treatment effect. So, \(Y_1 = Y_0 - 5\). This implies \(Var[Y_1] = Var[Y_0 - 5] = Var[Y_0]\), and that \(Cov(Y_1,d) = Cov(Y_0-5,d) = Cov(Y_0,d)\)

Finally, does \(\mathbb{E}[Y_i(1)] = \mathbb{E}[Y_i(1) | D_i = 0] = \mathbb{E}[Y_i(1) | D_i = 1]\)?

\(\mathbb{E}[Y_i(1)] = \frac{10+8+6+4}{4} = \frac{28}{4} = 7\)

\(\mathbb{E}[Y_i(1) | D_i = 0] = 7\) (see solutions)

\(\mathbb{E}[Y_i(1) | D_i = 1] = (\frac{1}{6})(\frac{10+8}{2}) + (\frac{1}{6})(\frac{6+4}{2}) + (\frac{1}{6})(\frac{8+4}{2}) + (\frac{1}{6})(\frac{8+6}{2}) + (\frac{1}{6})(\frac{10+4}{2}) + (\frac{1}{6})(\frac{10+6}{2})\)

Which equals: \(\frac{1}{6} \times \frac{84}{2} = 7\)

This is what random assignment suggests: \(E[Y_i(1) | D_i = 0] = E[Y_i(1) | D_i = 1] = E[Y_i(1)]\)

Note: Correlations tell us whether there is a linear association between two variables. By itself, it is not evidence of statistical independence because two variables might be non-linearly associated.

Excludability

Formally, excludability implies: \(Y_i(z_i,d_i) = Y_i(d_i)\) Potential outcomes respond only to the actual treatment status \(d_i\), not the treatment assignment status \(z_i\). In other words, \(Y_i(z=1,d)= Y_i(z=0,d)\)

As Gerber and Green note, “the exclusion restriction breaks down when random assignment sets in motion causes of \(Y_i\) other than the treatment \(d_i\).

Figure 1: Exclusion Restriction

Non-Interference

Formally, non-interference implies: \(Y_i(z,d) = Y_i(z',d')\) where \(z_i = z'_i\) and \(d_i=d'_i\). In other words, \(Y_i(z,d)\) is only affected by the treatment status and assignment of \(i\), not the treatment status or assignment of any other subject \(j\neq i\)

To illustrate, say we have \(n=3\) and \(m=1\). Let one assignment vector \(z=\{0,1,0\}\) in which person 3 (\(i=3\)) is untreated. If we assume non-interference, we in effect think that the third person’s potential outcome is the same when the assignment vector is \(z'=\{1,0,0\}\). In both \(z\) and \(z'\), \(i\)’s assignment status is the same (only the first and second person’s assignment status changes). However, \(i\)’s potential outcomes are not sensitive to changes in other people’s treatment or assignment status.

How is non-interference different from excludability? The non-interference assumption focuses on how interaction between units affects potential outcomes. The excludability assumption pertains to what we randomly assign and whether treatment is the only thing that moves with assignment.

Sampling Distributions

Last week we touched on the idea that the true ATE is a fixed, constant quantity, however, the difference-in-means estimate changes every time we randomly assign \(d_i\).

An unbiased estimator gives the correct answer or true value on average. Formally,\(\mathbb{E}[\widehat{\beta}] = \beta\) (where \(\widehat{\beta}\) is an estimate and \(\beta\) is the true population value). The difference-in-means is an unbiased estimator of the average treatment effect (i.e \(\mathbb{E}[\widehat{\text{DIM}}] = \text{ATE}\))

To illustrate this, I download the data from last week’s section and use a for loop to run simulations. The steps here are:

Use randomizr to obtain an assignment vector, \(d_i\)
Use the switching equation to obtain the observed values of the outcome variable: \(Y_i = d_i\cdot Y_i(1) + (1-d_i)\cdot Y_i(0)\).
Estimate the difference in means, i.e. the average value of the observed outcome in the treatment and control groups: \((\overline{Y_i | d_i=1}) - (\overline{Y_i | d_i=0})\).
Repeat this process a large number of times, drawing a new \(d_i\) each time.
A histogram of the resulting estimates will be centered on the true ATE.

# Load the data set
dat <- read_csv("w1_practice_data.csv")

# True ATE
true_ate = dat %>% summarise(`True ATE` = mean(Y1-Y0))

kable(true_ate)

True ATE
4.914945

# Simulations using a for-loop

estimates <- c()

for(i in 1:10000){
  dat <- dat %>%
  mutate(
    d = complete_ra(N = nrow(dat), m = 500),
    Y = Y1*d + (1-d)*Y0
  )
  estimate = dat %>% summarise(DIM = mean(Y[d==1]) - mean(Y[d==0]))
  estimates = bind_rows(estimates, estimate)
}

# estimates contains the difference-in-means for the observed outcome in each iteration of the simulation.
# For DIM to be an unbiased estimator, mean(estimates) = true_ate

estimates %>% summarise(`Average Estimate` = mean(DIM)) %>% kable(.)

Average Estimate
4.915983

# Lets visualize this using ggplot2

ggplot(estimates, aes(x = DIM)) + geom_histogram(bins = 50) + geom_vline(
  xintercept = mean(estimates$DIM),
  linetype = "dashed",
  color = "blue"
) + xlab("Estimates") + ylab("Frequency") + ggtitle("Sampling Distribution") + theme_bw()

ATT=ATC=ATE

Under random assignment, I have shown that: \(\mathbb{E}[Y_i(1) | D_i = 0] = \mathbb{E}[Y_i(1) | D_i = 1] = \mathbb{E}[Y_i(1)]\).

This is also true for untreated potential outcomes: \(\mathbb{E}[Y_i(0) | D_i = 0] = \mathbb{E}[Y_i(0) | D_i = 1] = \mathbb{E}[Y_i(0)]\)

It follows then that \(ATT = \mathbb{E}[Y_i(1) | D_i = 1] - \mathbb{E}[Y_i(0) | D_i = 1] = \mathbb{E}[Y_i(1)] - \mathbb{E}[Y_i(0)]\). Stated another way, \(\text{ATT} = \text{ATE}\).

Can you run a simulation using a for-loop to check if the ATT = ATE? Here are the steps:

Use randomizr to obtain an assignment vector, \(d_i\)
Calculate \(\text{mean}(Y_i(1) - Y_i(0))\) for all the subjects assigned to treatment, i.e. \(d_i=1\).
Repeat this process a large number of times, drawing a new \(d_i\) each time.
The average value of the resulting estimates is the true ATT. Does it equal the true ATE from above?

Note: The solution is available in the last tab of these section notes.

Multiple Treatment Conditions

How to randomly assign subjects to conditions

So far we have used randomizr to randomly assign units to one of two conditions using the complete_ra(N,m) command, where exactly m of N units are assigned to the treatment condition.

For a multi-arm experiment, we use the command complete_ra(N,m_each), where m_each is a vector that describes the number of subjects to be assigned to each condition. Note, \(sum(\text{m_each})=N\).

Generalized Switching equation

For a two-arm trial, the switching equation is: \(Y_i = d_i\cdot Y_i(1) + (1-d_i)\cdot Y_i(0)\). The generalized form of this uses a series of indicator variables, \(\mathbb{1}\). Specifically: \(Y_i = \mathbb{1}_{d_i=0} \cdot Y_i(0) + \mathbb{1}_{d_i=1} \cdot Y_i(1) + \mathbb{1}_{d_i=2} \cdot Y_i(2) + \cdots + \mathbb{1}_{d_i=k} \cdot Y_i(k)\)

We use the case_when command to operationalize this equation in R.

The code below shows how to assign subjects to four conditions. Exactly 150 are assigned to T1, 250 to T2, 400 to T3, and 200 to T4. We then use the generalized switching equation to create a new column of observed outcomes, \(Y_i\).

dat2 <- tibble(
  Y1 = rnorm(1000),
  Y2 = Y1 + rnorm(1000, mean = 5, sd = 1),
  Y3 = Y1 - rnorm(1000, mean = 5, sd = 1),
  Y4 = Y1*rnorm(1000, mean = 5, sd = 1)
)

dat2 <- dat2 %>%
  mutate(
    d_multi_arm = complete_ra(N = 1000, m_each = c(150,250,400,200)),
    y_obs = case_when(
      d_multi_arm == "T1" ~ Y1,
      d_multi_arm == "T2" ~ Y2,
      d_multi_arm == "T3" ~ Y3,
      d_multi_arm == "T4" ~ Y4
    )
  )

# Checking if treatment assignment is done correctly 

dat2 %>% group_by(d_multi_arm) %>% summarise(N = n()) %>% kable(.)

d_multi_arm	N
T1	150
T2	250
T3	400
T4	200

# Checking if the switching equation works correctly

dat2 %>% slice(1:10) %>% kable(.)

Y1	Y2	Y3	Y4	d_multi_arm	y_obs
0.1064488	5.228685	-4.867230	0.4492361	T1	0.1064488
0.8538338	6.323645	-5.147288	4.5906120	T1	0.8538338
-0.0705500	3.865580	-3.973157	-0.2763105	T2	3.8655803
0.0991486	5.422945	-5.526200	0.5037108	T4	0.5037108
1.3344020	6.698548	-2.953341	8.4103925	T3	-2.9533411
0.1582080	5.783114	-4.788443	0.7666135	T1	0.1582080
0.3402283	4.916483	-5.965033	2.2677492	T2	4.9164829
-2.0228642	2.585993	-8.601579	-9.0034495	T1	-2.0228642
0.8225177	4.111885	-4.340038	2.8112675	T2	4.1118855
0.1571970	5.240874	-5.599598	0.8448973	T3	-5.5995978

Solutions

Potential Outcomes

From the potential outcomes table,

\(\mathbb{E}[Y_i(1)|d_i=0] = \frac{6+4}{2} = 5\)

Gerber and Green say: “The notation \(\mathbb{E}[Y_i(1) | D_i = 0]\) may be regarded as shorthand for \(\mathbb{E}[\mathbb{E}[Y_i(1)|d_i = 0, \textbf{d}]]\), where d refers to a vector of treatment assignments and \(d_i\) refers its \(i\)th element. Given d, we may calculate the probability distribution function for all \(\{Y(1),d\}\) pairs and the expectation given this set of assignments. Then we may take the expectation of this expected value by summing over all possible d vectors.” (P28, Footnote 3)

\(\mathbb{E}[\mathbb{E}[Y_i(1)|d_i = 0, \textbf{d}]]\) = \((\frac{1}{6})(\frac{6+4}{2}) + (\frac{1}{6})(\frac{10+8}{2}) + (\frac{1}{6})(\frac{10+6}{2}) + (\frac{1}{6})(\frac{10+4}{2}) + (\frac{1}{6})(\frac{8+6}{2}) + (\frac{1}{6})(\frac{8+4}{2})\)

Which equals: \(\frac{1}{6} \times \frac{84}{2} = 7\)

ATT and ATE

For-loop to calculate the average treatment effect on the treated (ATT), given some treatment assignment vector.

library(tidyverse)
library(knitr)
library(randomizr)

# True ATE
true_ate = dat %>% summarise(`True ATE` = mean(Y1-Y0))
kable(true_ate)

True ATE
4.914945

# ATT calculation using a for-loop

estimates <- c()

for(i in 1:10000){
  dat <- dat %>%
  mutate(
    d = complete_ra(N = nrow(dat), m = 500),
  )
  estimate = dat %>% summarise(ATT = mean(Y1[d==1]) - mean(Y0[d==1]))
  estimates = bind_rows(estimates, estimate)
}

estimates %>% summarise(`True ATT` = mean(ATT)) %>% kable(.)

True ATT
4.915115

Section 2: Core Assumptions, Sampling Distributions, Multiple Treatments

Shikhar Singh

February 10, 2022

From Last Week

Potential Outcomes

Example

Core Assumptions

Random Assignment

Excludability

Non-Interference

Sampling Distributions

ATT=ATC=ATE

Multiple Treatment Conditions

How to randomly assign subjects to conditions

Generalized Switching equation

Solutions

Potential Outcomes

ATT and ATE