**Intuition**: The estimand changes depending on the random assignment. The first few steps in the process are:

\(\frac{\sum_{i=1}^N \tau_i d_i}{\sum_{i=1}^N d_i} = \frac{\sum_{i : d_i = 1} \tau_i}{m} = E[\tau_i \mid d_i = 1]\)

Note that the expectation operator in this step signifies an *average over units* — because of the conditioning, an average over the \(m\) units that receive treatment under a given assignment.

The question says that this estimand, in expectation, equals the ATE. Formally then:

\(E\{E[\tau_i \mid d_i = 1]\} = E_d\{E[\tau_i \mid d_i = 1, d]\} = E[\tau_i \mid D_i = 1]\)

The proof is straightforward hereafter, but it is important to note that the expectation operator in the second step is an *average over all possible random assignments* (as described in footnote 3, p28).
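To see the averaging-over-assignments logic concretely, here is a minimal numeric check in R, with hypothetical \(\tau_i\) values and \(n = 3\), \(m = 1\) (the values 2, 4, 9 are invented for illustration):

```
# Hypothetical unit-level treatment effects for n = 3 units
tau <- c(2, 4, 9)
# With m = 1, the three possible assignment vectors are the columns of diag(3)
assignments <- diag(3)
# E[tau_i | d_i = 1] computed under each possible assignment d
att_by_assignment <- apply(assignments, 2, function(d) mean(tau[d == 1]))
att_by_assignment        # 2 4 9: a different conditional estimand per assignment
mean(att_by_assignment)  # 5, which equals mean(tau) = E[tau_i], the ATE
```

Each assignment yields a different value of \(E[\tau_i \mid d_i = 1]\), but averaging across all possible assignments recovers the ATE.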

**Intuition**: The non-interference assumption focuses on how interaction between units affects potential outcomes. The excludability assumption pertains to *what we randomly assign* and *whether treatment is the only thing that moves with assignment*. Notationally:

Non-interference implies \(Y_i(z,d) = Y_i(z',d')\) whenever \(z_i = z'_i\) and \(d_i=d'_i\). To illustrate, say we have \(n=3\) and \(m=1\). Let one assignment vector be \(z = \{0,1,0\}\), under which unit \(i=3\) is untreated. If we assume non-interference, we in effect assert that \(i\)’s potential outcome is the same when the assignment vector is \(z'= \{1,0,0\}\). Note that in both \(z\) and \(z'\), \(i\)’s assignment status is the same (only the first and second units’ assignment statuses change).

Excludability implies that \(Y_i(z,d) = Y_i(d)\), or equivalently \(Y_i(z=1,d) = Y_i(z=0,d)\). In words, whether we assign \(i\) to treatment or control does not matter in itself; the only thing that determines which of \(i\)’s potential outcomes is realized is her/his treatment status.

*Example:* In question 10, non-interference is compromised if student \(i\) sits next to treated student \(j\) in the cafeteria and reads \(j\)’s newspaper. Now, which potential outcome \(i\) reveals depends on \(j\)’s assignment and treatment status. In contrast, excludability is violated if being assigned to treatment (\(z=1\)) entails receiving a letter in addition to the newspaper. If the letter affects outcomes, it is no longer true that \(Y_i(z=1,d=1) = Y_i(z=0,d=1)\).
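To make the excludability violation concrete, here is a hypothetical sketch in R; the outcome function and its coefficients are invented for illustration only:

```
# Hypothetical potential-outcome function: receiving the letter (z = 1)
# shifts the outcome by 2 even when treatment status d is held fixed,
# so assignment has an effect beyond treatment itself
Y <- function(z, d) 10 + 5 * d + 2 * z

Y(z = 1, d = 1)                     # 17
Y(z = 0, d = 1)                     # 15
Y(z = 1, d = 1) == Y(z = 0, d = 1)  # FALSE: excludability is violated
```

If the `2 * z` term were absent, the two quantities would coincide and excludability would hold.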

The key intuition here is that when \(Y_i(0)\) and \(Y_i(1)\) positively covary, some units have both high treated and untreated potential outcomes, while others have low \(Y_i(0)\) and \(Y_i(1)\) values. As a result, when a unit \(i\) (say, one with high \(Y_i(1)\) and \(Y_i(0)\) values) moves between treatment conditions under different hypothetical assignments, \(i\) considerably changes the group means. That is to say, when \(i\) is assigned to treatment, \(i\) pushes up the treatment group mean *and* pushes down the control group mean, producing a large ATE estimate. Conversely, when \(i\) is assigned to control, \(i\) pushes up the control group mean *and* pushes down the treatment group mean, producing a small ATE estimate. This extreme flip-flop increases the sampling variability of the estimator. Here is an illustration:

Unit | Y0 | Y1 | d1 | d2 | d3
---|---|---|---|---|---
1 | 2 | 5 | 0 | 0 | 1
2 | 3 | 6 | 0 | 1 | 0
3 | 4 | 7 | 1 | 0 | 0
ATE Estimate | - | - | 4.5 | 3 | 1.5

\(Var[\hat{ATE}] = 1.5\)

Unit | Y0 | Y1 | d1 | d2 | d3
---|---|---|---|---|---
1 | 2 | 7 | 0 | 0 | 1
2 | 3 | 6 | 0 | 1 | 0
3 | 4 | 5 | 1 | 0 | 0
ATE Estimate | - | - | 2.5 | 3 | 3.5

\(Var[\hat{ATE}] \approx 0.167\)

In the above setup (\(n=3\), \(m=1\)), the main takeaway is that when \(Cov[Y_i(0),Y_i(1)] > 0\), I get three widely dispersed ATE estimates of 4.5, 3, and 1.5 (with \(Var[\hat{ATE}] = 1.5\)). However, when \(Cov[Y_i(0),Y_i(1)] < 0\), I get three ATE estimates that are pretty close to each other: 2.5, 3, and 3.5 (with \(Var[\hat{ATE}] \approx 0.167\)).
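The numbers in both tables can be reproduced with a few lines of R (per the assignment columns above, d1, d2, and d3 treat units 3, 2, and 1, respectively):

```
Y0 <- c(2, 3, 4)
Y1_pos <- c(5, 6, 7)  # Cov[Y0, Y1] > 0
Y1_neg <- c(7, 6, 5)  # Cov[Y0, Y1] < 0
# ATE estimate when unit i alone is treated:
# Y1[i] minus the mean Y0 of the other two units
est <- function(Y1) sapply(c(3, 2, 1), function(i) Y1[i] - mean(Y0[-i]))
# variance across the three equally likely assignments
pop_var <- function(x) mean((x - mean(x))^2)

est(Y1_pos)           # 4.5 3.0 1.5
pop_var(est(Y1_pos))  # 1.5
est(Y1_neg)           # 2.5 3.0 3.5
pop_var(est(Y1_neg))  # 0.1666667
```

Note that `pop_var()` divides by the number of assignments (not \(n-1\)), since we are averaging over the complete, known set of possible assignments.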

In the following subsections, we will do randomization inference when there is (a) complete random assignment, (b) block random assignment, and (c) cluster random assignment. For the time being, we will use `randomizr` but in subsequent weeks move to `ri2`. The first step is to create some data with the following properties:

- \(E[Y_i(1) - Y_i(0)] = 5\), but note that \(\tau_i \neq 5\) \(\forall i\).
- A blocking variable \(X_1\) that is highly correlated with outcomes.
- A second blocking variable \(X_2\) that is uncorrelated with outcomes.
- A cluster variable \(C_1\) with homogeneous clusters, so that inter-cluster variation is high.
- A cluster variable \(C_2\) that assigns units to clusters at random, so that intra-cluster variation is high.

```
library(dplyr)
library(randomizr)
library(knitr)
set.seed(100)
# tibble() (the replacement for the deprecated data_frame()) builds
# columns sequentially, so Y1 can reference Y0
data <- tibble(
  Y0 = rnorm(100, mean = 0, sd = 8),
  Y1 = Y0 + rnorm(100, mean = 5, sd = 4),
  X1 = as.numeric(Y0 >= 0),                      # correlated with outcomes
  X2 = randomizr::complete_ra(N = 100, m = 50),  # unrelated to outcomes
  # C1 buckets Y0 into four ranges, so clusters are homogeneous in Y0
  C1 = ifelse(Y0 <= -1, 1, ifelse(Y0 <= 0, 2, ifelse(Y0 <= 1, 3, 4))),
  # C2 assigns units to four clusters completely at random
  C2 = sample(x = c(1, 2, 3, 4), size = 100, replace = TRUE,
              prob = c(0.25, 0.25, 0.25, 0.25))
)
kable(head(data), caption = "Dataset for Randomization Inference", row.names = TRUE)
```

 | Y0 | Y1 | X1 | X2 | C1 | C2
---|---|---|---|---|---|---
1 | -4.0175388 | -0.3492322 | 0 | 1 | 1 | 2
2 | 1.0522493 | 11.5047042 | 1 | 0 | 4 | 4
3 | -0.6313367 | 2.4920739 | 0 | 0 | 2 | 3
4 | 7.0942785 | 15.4657810 | 1 | 0 | 4 | 3
5 | 0.9357702 | 0.1037953 | 1 | 1 | 3 | 3
6 | 2.5490407 | 5.9478170 | 1 | 0 | 4 | 3

`mean(data$Y1 - data$Y0) # True ATE in this sample`

`## [1] 5.044563`

In this setup, exactly \(m\) of \(N\) units are assigned to treatment. We ignore covariate- and cluster-related information.

```
# Step 1: Declare the assignment procedure (the 'realized' or 'actual'
# assignment vector is drawn from this declaration in the next step)
declaration <- declare_ra(N = nrow(data), m = 50)
declaration
```

```
## Random assignment procedure: Complete random assignment
## Number of units: 100
## Number of treatment arms: 2
## The possible treatment categories are 0 and 1.
## The probabilities of assignment are constant across units.
```

```
Z <- conduct_ra(declaration)
# Step 2: Switching equation for observed values of Y
data$Y <- data$Y1 * Z + data$Y0 * (1 - Z)
# Step 2B: ATE estimate given Z
ate_estimate <- mean(data$Y[Z == 1]) - mean(data$Y[Z == 0])
ate_estimate
```

`## [1] 3.956781`

```
# Step 3: Obtain a permutation matrix whose columns are possible assignment
# vectors (sampled, rather than exhaustive, when the full set is too large)
D <- obtain_permutation_matrix(declaration)
# Step 4: Obtain sampling distribution under the sharp null
ate1 <- rep(NA, ncol(D)) #ncol(D) is the number of columns in the permutation matrix
for (i in 1:ncol(D)) {
Z <- D[, i]
ate1[i] <- mean(data$Y[Z == 1]) - mean(data$Y[Z == 0])
}
# Step 5: For p values
mean(ate1 >= ate_estimate) # one-sided
```
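One caveat worth flagging: with \(N = 100\) and \(m = 50\), the full set of assignment vectors is astronomically large, so `obtain_permutation_matrix()` returns a random sample of assignment vectors rather than the complete set (if I recall the `randomizr` default correctly, 10,000 permutations), which makes these p-values simulation-based approximations:

```
# Number of distinct complete random assignments of exactly 50 of 100 units
choose(100, 50)  # about 1.01e+29, far too many columns to enumerate
```

This is why the permutation counts, and hence the p-values, can differ slightly from run to run unless a seed is set.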

`## [1] 0.0091`

`mean(abs(ate1) >= abs(ate_estimate)) # two-sided`

`## [1] 0.0197`

```
# Step 6: To visualize this in terms of a histogram
hist(ate1, breaks = 100)
abline(v = ate_estimate, col = "blue")
```