Notes from Week II

In Q.4, how do we get from the estimand \(\frac{\sum_1^N \tau_i d_i}{\sum_1^N d_i}\) to the ATT?

Intuition: The estimand changes depending on the random assignment. The first few steps in the process are:

\(\frac{\sum_{i=1}^N \tau_i d_i}{\sum_{i=1}^N d_i} = \frac{1}{m}\sum_{i: d_i = 1} \tau_i = E[\tau_i | d_i = 1]\)

Note that the expectation operator in this step signifies an average over units (here, the \(m\) units treated under the realized assignment), not an average over assignments.

The question says that this estimand, in expectation, equals the ATE. Formally then:

\(E\{ E[\tau_i | d_i = 1] \} = E_d\{ E[\tau_i | d_i = 1, d] \} = E[\tau_i | D_i = 1]\)

The proof is straightforward hereafter, but it is important to note that the outer expectation operator in the second step is an average over all possible random assignments (as described in footnote 3, p. 28).
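One way to spell out the remaining step, assuming (as random assignment guarantees) that \(D_i\) is independent of the potential outcomes and therefore of \(\tau_i\):

\(E[\tau_i | D_i = 1] = E[\tau_i] = E[Y_i(1) - Y_i(0)] = ATE\)

The first equality is where independence is used: conditioning on \(D_i = 1\) does not change the expected value of \(\tau_i\).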

How is non-interference different from excludability?

Intuition: The non-interference assumption focuses on how interaction between units affects potential outcomes. The excludability assumption pertains to what we randomly assign and whether treatment is the only thing that moves with assignment. Notationally:

Non-interference implies \(Y_i(z,d) = Y_i(z',d')\) whenever \(z_i = z'_i\) and \(d_i=d'_i\). To illustrate, say we have \(n=3\) and \(m=1\). Let one assignment vector be \(z = \{0,1,0\}\), in which unit \(i=3\) is untreated. If we assume non-interference, we in effect assume that \(i\)’s potential outcome is the same when the assignment vector is \(z'= \{1,0,0\}\). Note that in both \(z\) and \(z'\), \(i\)’s assignment status is the same (only the first and second units’ assignment statuses change).
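The same example can be sketched in R. Everything here is hypothetical: the function `y3`, the baseline of 10, the own-treatment effect of 4, and the `spillover` parameter are invented for illustration, and the sketch assumes full compliance so that assignment and treatment coincide.

```r
# Hypothetical potential-outcome function for unit 3 (illustrative numbers only).
# `spillover` is an assumed effect of unit 2's assignment on unit 3's outcome.
y3 <- function(z, spillover = 0) {
  baseline   <- 10  # unit 3's outcome when nobody is treated
  own_effect <- 4   # effect of unit 3's own assignment
  baseline + own_effect * z[3] + spillover * z[2]
}

z       <- c(0, 1, 0)  # unit 2 treated
z_prime <- c(1, 0, 0)  # unit 1 treated; unit 3's own status is unchanged

# Non-interference holds: unit 3's outcome ignores who else is treated
no_interference <- c(y3(z), y3(z_prime))

# Non-interference fails: unit 3's outcome moves with unit 2's assignment
interference <- c(y3(z, spillover = 2), y3(z_prime, spillover = 2))

print(no_interference)
print(interference)
```

With `spillover = 0` the two outcomes coincide; with a nonzero spillover they differ even though unit 3's own assignment never changes.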

Excludability implies that \(Y_i(z,d) = Y_i(d)\), or \(Y_i(z=1,d) = Y_i(z=0,d)\). What this is saying is that whether we assign \(i\) to treatment or control does not matter; the only thing that determines which of \(i\)’s potential outcomes is realized is her/his treatment status.

Example: In question 10, non-interference is compromised if student \(i\) sits next to treated student \(j\) in the cafeteria, and reads her newspaper. Now, which potential outcome \(i\) reveals depends on \(j\)’s assignment and treatment status. In contrast, excludability is violated if being assigned to treatment (\(z=1\)) entails receiving a letter and newspaper. If the letter affects outcomes, it is no longer true that \(Y_i(z=1,d=1) = Y_i(z=0,d=1)\).

Robin Hood treatments: How does \(Cov[Y_i(1),Y_i(0)]\) affect sampling variability?

The key intuition here is that when \(Y_i(0)\) and \(Y_i(1)\) covary positively, some units have high values of both potential outcomes while others have low values of both. As a result, when a unit \(i\) (say, one with high \(Y_i(1)\) and \(Y_i(0)\)) moves between treatment conditions under different hypothetical assignments, it changes the group means considerably. When \(i\) is assigned to treatment, \(i\) pushes up the treatment group mean and, by leaving the control group, pushes down the control group mean. This produces a large ATE estimate. When \(i\) is assigned to control, \(i\) pushes up the control group mean and pushes down the treatment group mean, which produces a small ATE estimate. This flip-flop between extremes increases the sampling variability. Here is an illustration:

Science Table: Cov[Y0,Y1] > 0

Unit           Y0   Y1   d1    d2   d3
1              2    5    0     0    1
2              3    6    0     1    0
3              4    7    1     0    0
ATE Estimate   -    -    4.5   3    1.5

Variance of the three ATE estimates:
## [1] 1.5

Science Table: Cov[Y0,Y1] < 0

Unit           Y0   Y1   d1    d2   d3
1              2    7    0     0    1
2              3    6    0     1    0
3              4    5    1     0    0
ATE Estimate   -    -    2.5   3    3.5

Variance of the three ATE estimates:
## [1] 0.1666667

In the above setup (\(n=3\), \(m=1\)), the main takeaway is that when \(Cov[Y_i(0),Y_i(1)] > 0\), I get three ATE estimates of 4.5, 3, and 1.5 (with \(Var[\hat{ATE}] = 1.5\)). However, when \(Cov[Y_i(0),Y_i(1)] < 0\), I get three ATE estimates that are close to each other: 2.5, 3, and 3.5 (with \(Var[\hat{ATE}] = 0.167\)). Both variances are taken over the three equally likely assignments, dividing by 3.
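The two science tables can be reproduced with a short base-R sketch. The helper names (`all_estimates`, `pop_var`, `pop_cov`, `analytic_var`) are my own, and the enumeration is written only for the \(m=1\) case used here. The last helper assumes the standard expression for the sampling variance of the difference in means under complete random assignment, \(\frac{1}{N-1}\{\frac{m Var[Y_i(0)]}{N-m} + \frac{(N-m) Var[Y_i(1)]}{m} + 2 Cov[Y_i(0),Y_i(1)]\}\), with variances and covariance computed by dividing by \(N\).

```r
# Difference-in-means estimate for every assignment that treats exactly one unit
all_estimates <- function(Y0, Y1) {
  n <- length(Y0)
  sapply(seq_len(n), function(treated) {
    Y1[treated] - mean(Y0[-treated])
  })
}

# Population variance and covariance (divide by n, not n - 1)
pop_var <- function(x) mean((x - mean(x))^2)
pop_cov <- function(x, y) mean((x - mean(x)) * (y - mean(y)))

# Analytic sampling variance of the difference in means (assumed formula, see lead-in)
analytic_var <- function(Y0, Y1, m) {
  N <- length(Y0)
  (1 / (N - 1)) * (m * pop_var(Y0) / (N - m) +
                   (N - m) * pop_var(Y1) / m +
                   2 * pop_cov(Y0, Y1))
}

Y0 <- c(2, 3, 4)
est_pos <- all_estimates(Y0, Y1 = c(5, 6, 7))  # positive covariance
est_neg <- all_estimates(Y0, Y1 = c(7, 6, 5))  # negative covariance

print(est_pos); print(pop_var(est_pos)); print(analytic_var(Y0, c(5, 6, 7), m = 1))
print(est_neg); print(pop_var(est_neg)); print(analytic_var(Y0, c(7, 6, 5), m = 1))
```

The covariance enters the formula with a positive sign, which is why flipping it from positive to negative shrinks the variance while \(Var[Y_i(0)]\) and \(Var[Y_i(1)]\) stay the same.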

Randomization Inference

In the following subsections, we will do randomization inference when there is (a) complete random assignment, (b) block random assignment, and (c) clustering. For the time being, we will use randomizr but in subsequent weeks move to ri2. The first step is to create some data with the following properties:

  • \(E[Y_i(1) - Y_i(0)] = 5\), but treatment effects are heterogeneous: it is not the case that \(\tau_i = 5\) \(\forall i\).

  • A blocking variable \(X_1\) that is highly correlated with outcomes.

  • A second blocking variable \(X_2\) that is uncorrelated with outcomes.

  • Homogeneous clusters (\(C_1\)), so that inter-cluster variation is high.

  • Random assignment of units to clusters (\(C_2\)), so that intra-cluster variation is high.

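library(dplyr)
library(randomizr)
library(knitr)

set.seed(100)

# Columns are built in order, so later columns can refer to earlier ones
data <- tibble(
    Y0 = rnorm(100, mean = 0, sd = 8),             # untreated potential outcome
    Y1 = Y0 + rnorm(100, mean = 5, sd = 4),        # treated potential outcome
    X1 = as.numeric(Y0 >= 0),                      # blocking variable tied to outcomes
    X2 = randomizr::complete_ra(N = 100, m = 50),  # blocking variable unrelated to outcomes
    C1 = ifelse(Y0 <= -1, 1, ifelse(Y0 <= 0, 2, ifelse(Y0 <= 1, 3, 4))),  # clusters binned on Y0
    C2 = sample(x = c(1, 2, 3, 4), size = 100, replace = TRUE,
        prob = c(0.25, 0.25, 0.25, 0.25))          # clusters assigned at random
)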

kable(head(data), caption = "Dataset for Randomization Inference", row.names = TRUE)

Dataset for Randomization Inference

     Y0           Y1           X1   X2   C1   C2
1    -4.0175388   -0.3492322   0    1    1    2
2     1.0522493   11.5047042   1    0    4    4
3    -0.6313367    2.4920739   0    0    2    3
4     7.0942785   15.4657810   1    0    4    3
5     0.9357702    0.1037953   1    1    3    3
6     2.5490407    5.9478170   1    0    4    3
mean(data$Y1 - data$Y0)  # True ATE in this sample
## [1] 5.044563
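
As a quick check that simulated data of this kind have the properties listed above, here is a base-R sketch. It is a stand-in rather than the code used in these notes: `sample()` replaces `complete_ra()` so the snippet runs without extra packages, which means `X2` and `C2` will differ from the table above, and `between_share` is a helper name of my own.

```r
set.seed(100)
N  <- 100
Y0 <- rnorm(N, mean = 0, sd = 8)
Y1 <- Y0 + rnorm(N, mean = 5, sd = 4)
X1 <- as.numeric(Y0 >= 0)
X2 <- sample(rep(c(0, 1), each = N / 2))                     # stand-in for complete_ra(N = 100, m = 50)
C1 <- as.integer(cut(Y0, breaks = c(-Inf, -1, 0, 1, Inf)))   # same right-closed bins as the ifelse() chain
C2 <- sample(1:4, size = N, replace = TRUE)

# Share of the variance in y that lies between clusters (0 = none, 1 = all)
between_share <- function(y, cluster) {
  cluster_means <- ave(y, cluster)  # each unit's cluster mean
  var(cluster_means) / var(y)
}

cor_X1   <- cor(X1, Y0)  # blocking variable built from Y0
cor_X2   <- cor(X2, Y0)  # blocking variable assigned at random
share_C1 <- between_share(Y0, C1)
share_C2 <- between_share(Y0, C2)

print(c(cor_X1 = cor_X1, cor_X2 = cor_X2))
print(c(share_C1 = share_C1, share_C2 = share_C2))
```

A large between-cluster share for `C1` and a small one for `C2` correspond to the "homogeneous clusters" and "random clusters" cases, respectively.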

Case 1: Complete Random Assignment

In this setup, exactly \(m\) of \(N\) units are assigned to treatment. We ignore covariate- and cluster-related information.
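
To make "exactly \(m\) of \(N\)" concrete, here is a base-R sketch of what complete random assignment does. The function `complete_ra_sketch` is a hypothetical helper written for illustration and is not how randomizr implements it.

```r
# Base-R sketch of complete random assignment: exactly m of N units are treated
complete_ra_sketch <- function(N, m) {
  Z <- rep(0, N)
  Z[sample.int(N, size = m)] <- 1  # pick m distinct units at random
  Z
}

set.seed(1)
Z_example <- complete_ra_sketch(N = 100, m = 50)
print(sum(Z_example))  # number of treated units
```

Unlike flipping an independent coin for each unit, this procedure fixes the number of treated units in advance, so the treatment group size does not vary from one assignment to the next.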

# Step 1: Create an assignment vector (i.e. the 'realized' or 'actual'
# assignment status)

declaration <- declare_ra(N = nrow(data), m = 50)
declaration
## Random assignment procedure: Complete random assignment 
## Number of units: 100 
## Number of treatment arms: 2 
## The possible treatment categories are 0 and 1.
## The probabilities of assignment are constant across units.
Z <- conduct_ra(declaration)

# Step 2: Switching equation for observed values of Y
data$Y <- data$Y1 * Z + data$Y0 * (1 - Z)

# Step 2B: ATE estimate given Z
ate_estimate <- mean(data$Y[Z == 1]) - mean(data$Y[Z == 0])
ate_estimate
## [1] 3.956781
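# Step 3: Obtain a permutation matrix. With N = 100 and m = 50 there are far
# too many possible assignments to enumerate, so by default this returns a
# random sample of 10,000 assignment vectors (one per column)
D <- obtain_permutation_matrix(declaration)

# Step 4: Obtain sampling distribution under the sharp null
ate1 <- rep(NA, ncol(D))  # one simulated estimate per column of D

for (i in seq_len(ncol(D))) {
    Z_sim <- D[, i]  # hypothetical assignment; the realized Z is left untouched
    ate1[i] <- mean(data$Y[Z_sim == 1]) - mean(data$Y[Z_sim == 0])
}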

# Step 5: For p values

mean(ate1 >= ate_estimate)  # one-sided
## [1] 0.0091
mean(abs(ate1) >= abs(ate_estimate))  # two-sided
## [1] 0.0197
# Step 6: To visualize this in terms of a histogram
hist(ate1, breaks = 100)
abline(v = ate_estimate, col = "blue")
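
The steps above can be collected into a single function. The sketch below uses base R only and is an assumption-laden stand-in rather than the randomizr workflow: `ri_sketch` is a hypothetical name, hypothetical assignments are drawn by reshuffling `Z` with `sample()` instead of calling `obtain_permutation_matrix()`, and the simulated data mimic the setup above without reproducing the same assignment.

```r
# Randomization inference under the sharp null of no effect, in base R
ri_sketch <- function(Y, Z, sims = 10000) {
  dim_est <- function(y, z) mean(y[z == 1]) - mean(y[z == 0])
  observed <- dim_est(Y, Z)
  # Under the sharp null, Y is fixed; only the assignment labels are reshuffled.
  # sample(Z) keeps the number of treated units fixed (complete random assignment).
  null_dist <- replicate(sims, dim_est(Y, sample(Z)))
  list(
    estimate    = observed,
    p_one_sided = mean(null_dist >= observed),
    p_two_sided = mean(abs(null_dist) >= abs(observed))
  )
}

set.seed(100)
Y0 <- rnorm(100, mean = 0, sd = 8)
Y1 <- Y0 + rnorm(100, mean = 5, sd = 4)
Z  <- sample(rep(c(0, 1), each = 50))  # realized assignment
Y  <- Y1 * Z + Y0 * (1 - Z)            # switching equation
result <- ri_sketch(Y, Z, sims = 2000)
print(result)
```

Because the null distribution is simulated, the p-values carry some Monte Carlo error; increasing `sims` reduces it at the cost of run time.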