Notes from Week IX

Within-Subject Studies

Within-subject experiments track a single person or entity over a period of time, and random assignment determines when a treatment is administered (Gerber and Green 2012:273). Such designs typically make two non-interference assumptions: no anticipation ($$D_{i,t+1}$$ does not affect $$Y_{i,t}$$) and no persistence ($$D_{i,t-1}$$ does not affect $$Y_{i,t}$$). In this section, I use the “tetris” dataset to evaluate these assumptions (this corresponds to Question 10 (b) and (c) in the previous week’s problem set):

library(tidyverse)
library(randomizr)
library(estimatr)
library(knitr)

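# We need to create two variables: treatment status in the previous period
# and in the next period. `data` is assumed to hold the 26-row tetris dataset.
data$run_lag <- c(NA, data$run[1:25])     # treatment at t - 1 (persistence)
data$run_anticp <- c(data$run[2:26], NA)  # treatment at t + 1 (anticipation)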

# Estimate the effects:

effects <- data.frame(
  Bivariate = c(
    lm_robust(tetris ~ run, data = data)$coefficients[2],
    "(Coefficient from tetris ~ run )"
  ),
  Persistence = c(
    summary(lm_robust(tetris ~ run + run_lag, data = data))$fstatistic[1],
    "(F Statistic from tetris ~ run + run_lag)"
  ),
  Anticipation = c(
    lm_robust(tetris ~ run_anticp, data = data)$coefficients[2],
    "(Coefficient from tetris ~ run_anticp)"
  )
)

kable(effects, row.names = F, caption = "Estimates", digits = 2)

Estimates

| Bivariate | Persistence | Anticipation |
|---|---|---|
| 13613.1 | 4.5445923571162 | 645.621212121212 |
| (Coefficient from tetris ~ run ) | (F Statistic from tetris ~ run + run_lag) | (Coefficient from tetris ~ run_anticp) |

# Conduct randomization inference:
## Note that every time we generate a new assignment vector ('run'), the
## lagged and future values also change. So we need to write a loop for
## randomization inference.

## Step 1: Declare design, define output vectors
set.seed(343)
declaration <- declare_ra(N = 26, prob = 0.5, simple = T)
perms <- obtain_permutation_matrix(declaration)
dim(perms)
## [1] 26 10000

bivariate.out <- rep(NA, 10000)
persistence.out <- rep(NA, 10000)
anticipation.out <- rep(NA, 10000)

# Step 2: Run a loop, estimating the quantities for each assignment vector
# from 'perms'
for (i in 1:10000) {

# Define variables
data$Z <- perms[, i]
data$Z_lag <- c(NA, data$Z[1:25])
data$Z_anticp <- c(data$Z[2:26], NA)

# Store output
bivariate.out[i] <- lm_robust(tetris ~ Z, data = data)$coefficients[2]
persistence.out[i] <- summary(lm_robust(tetris ~ Z + Z_lag, data = data))$fstatistic[1]
anticipation.out[i] <- lm_robust(tetris ~ Z_anticp, data = data)$coefficients[2]
}

# For RI p value in the regression tetris ~ run
mean(abs(bivariate.out) >= 13613.1)
## [1] 0.0109
# For RI p value on the F statistic in tetris ~ run + run_lag
mean(abs(persistence.out) >= 4.545)
## [1] 0.019
# For RI p value in the regression tetris ~ run_anticp
mean(abs(anticipation.out) >= 645.621)
## [1] 0.8994
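
The three p-values above follow the same recipe: hold the outcomes fixed, redraw the assignment vector, recompute the statistic, and ask how often the simulated statistic is at least as extreme as the observed one. Below is a minimal base-R sketch of that recipe for the anticipation check, run on simulated data rather than the tetris data. The names `ri_p_value`, `lead_coef`, `z_obs`, and `y_obs` are hypothetical, and `lm` stands in for `lm_robust` (the two give the same point estimates).

```r
# Two-sided randomization-inference p-value: share of simulated statistics
# at least as large in absolute value as the observed one.
ri_p_value <- function(simulated, observed) {
  mean(abs(simulated) >= abs(observed), na.rm = TRUE)
}

# Coefficient from regressing y on next period's assignment (anticipation check).
lead_coef <- function(y, z) {
  z_lead <- c(z[-1], NA)            # assignment at t + 1; the last period has none
  unname(coef(lm(y ~ z_lead))[2])   # lm() drops the row with the NA
}

set.seed(2024)
n_periods <- 26
z_obs <- rbinom(n_periods, 1, 0.5)               # simulated simple random assignment
y_obs <- 100 + 10 * z_obs + rnorm(n_periods)     # outcome depends only on same-period treatment

observed <- lead_coef(y_obs, z_obs)

# Re-randomize under the sharp null of no effect: outcomes stay fixed,
# and the assignment vector (and hence its lead) is redrawn each time.
simulated <- replicate(2000, lead_coef(y_obs, rbinom(n_periods, 1, 0.5)))

p_anticipation <- ri_p_value(simulated, observed)
p_anticipation
```

Wrapping the statistic in a function keeps the lead (or lag) tied to whichever assignment vector is passed in, which is the reason the original analysis needs a loop.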

Heterogeneous Treatment Effects

This section focuses on the variability in treatment effects, $$Var[\tau_i]$$. I begin with how to detect heterogeneous treatment effects: we either place bounds on $$Var[\tau_i]$$, or test the null hypothesis $$Var[Y_i(1)] = Var[Y_i(0)]$$, which holds under constant treatment effects, by comparing $$\widehat{Var[Y_i(1)]}$$ with $$\widehat{Var[Y_i(0)]}$$. I then show some regression-based strategies for modeling heterogeneity: first interactions between treatment and covariates, and then interactions between treatments.

Detecting Heterogeneity

Bounds on $$Var[\tau_i]$$

Intuition: We can never point-estimate $$Var[\tau_i]$$ because we never observe both potential outcomes for the same unit, so $$Cov[Y_i(1),Y_i(0)]$$ is never known. However, we can place bounds on $$Var[\tau_i]$$ by estimating the minimum and maximum possible covariance between treated and untreated potential outcomes.

To see this, note that:

$$Var[\tau_i] = Var[Y_i(1) - Y_i(0)] = Var[Y_i(1)] + Var[Y_i(0)] - 2\cdot Cov[Y_i(1),Y_i(0)]$$
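
As a quick numerical check of this identity, here is a sketch using a hypothetical full schedule of potential outcomes. The values of `y0` and `y1` are made up for illustration; in a real experiment only one of the two is observed for each unit, which is why the covariance term cannot be estimated directly.

```r
# Hypothetical potential outcomes for six units (never jointly observed in practice).
y0 <- c(2, 4, 4, 6, 8, 9)    # untreated potential outcomes Y_i(0)
y1 <- c(5, 5, 9, 8, 9, 15)   # treated potential outcomes Y_i(1)
tau <- y1 - y0               # unit-level treatment effects

lhs <- var(tau)                                  # Var[tau_i] computed directly
rhs <- var(y1) + var(y0) - 2 * cov(y1, y0)       # right-hand side of the identity

all.equal(lhs, rhs)
```

The two sides should agree up to floating-point error; the sketch also makes clear that `cov(y1, y0)` is the only ingredient that requires seeing both potential outcomes for the same unit.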

We can use the observations in the treatment group to get $$\widehat{Var[Y_i(1)]}$$; and similarly use the control group units to get $$\widehat{Var[Y_i(0)]}$$. This leaves the covariance term:

• $$\widehat{Cov[Y_i(1),Y_i(0)]}$$ is largest if we pair the smallest $$Y_i(1)$$ values with the smallest $$Y_i(0)$$ values, and the largest $$Y_i(1)$$ values with the largest $$Y_i(0)$$ values. Assuming there is an equal number of observations in the treatment and control groups, this means estimating the covariance between the two sets of outcomes when both are sorted in ascending order. Call this $$\widehat{Cov_{max}[Y_i(1),Y_i(0)]}$$