Notes from Week IX

Within-Subject Studies

Within-subject experiments track a single person or entity over a period of time, and random assignment determines when a treatment is administered (Gerber and Green 2012:273). Such designs typically make two non-interference assumptions: no anticipation (\(D_{i,t+1}\) does not affect \(Y_{i,t}\)) and no persistence (\(D_{i,t-1}\) does not affect \(Y_{i,t}\)). In this section, I use the “tetris” dataset to evaluate these assumptions (this corresponds to Question 10 (b) and (c) in the previous week’s problem set):

library(tidyverse)
library(randomizr)
library(estimatr)
library(knitr)

# Load the dataset (a Stata .dta file in the working directory)
data <- foreign::read.dta("W8_Tetris.dta")

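# Create two shifted copies of the treatment indicator (26 periods in total):
# run_lag is the treatment status in period t - 1 (used to test persistence);
# run_anticp is the treatment status in period t + 1 (used to test anticipation).
data$run_lag <- c(NA, data$run[1:25])
data$run_anticp <- c(data$run[2:26], NA)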

# Estimate the effects:

effects <- data.frame(
    Bivariate = c(lm_robust(tetris ~ run, data = data)$coefficients[2],
        "(Coefficient from tetris ~ run )"),
    Persistence = c(summary(lm_robust(tetris ~ run + run_lag, data = data))$fstatistic[1],
        "(F Statistic from tetris ~ run + run_lag)"),
    Anticipation = c(lm_robust(tetris ~ run_anticp, data = data)$coefficients[2],
        "(Coefficient from tetris ~ run_anticp)"))

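# Each column mixes a number with a text label, so it is stored as character
# and 'digits = 2' does not round the estimates.
kable(effects, row.names = FALSE, caption = "Estimates", digits = 2)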
Estimates

Bivariate                          Persistence                                 Anticipation
13613.1                            4.5445923571162                             645.621212121212
(Coefficient from tetris ~ run )   (F Statistic from tetris ~ run + run_lag)   (Coefficient from tetris ~ run_anticp)

# Conduct randomization inference:

## Note that every time we generate a new assignment vector ('run'), the
## lagged and future values also change. So we need to write a loop for
## randomization inference.

## Step 1: Declare design, define output vectors
set.seed(343)

declaration <- declare_ra(N = 26, prob = 0.5, simple = T)
perms <- obtain_permutation_matrix(declaration)
dim(perms)
## [1]    26 10000
bivariate.out <- rep(NA, 10000)
persistence.out <- rep(NA, 10000)
anticipation.out <- rep(NA, 10000)

## Step 2: Run a loop, estimating the quantities for each assignment vector
## from 'perms'

for (i in 1:10000) {
    # Define variables
    data$Z <- perms[, i]
    data$Z_lag <- c(NA, data$Z[1:25])
    data$Z_anticp <- c(data$Z[2:26], NA)
    
    # Store output
    bivariate.out[i] <- lm_robust(tetris ~ Z, data = data)$coefficients[2]
    persistence.out[i] <- summary(lm_robust(tetris ~ Z + Z_lag, data = data))$fstatistic[1]
    anticipation.out[i] <- lm_robust(tetris ~ Z_anticp, data = data)$coefficients[2]
}

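## Step 3: Compute two-sided RI p-values as the share of simulated statistics
## at least as large in absolute value as the observed estimate

# RI p-value for the coefficient on run in tetris ~ run
mean(abs(bivariate.out) >= 13613.1)
## [1] 0.0109
# RI p-value for the F statistic in tetris ~ run + run_lag
mean(abs(persistence.out) >= 4.545)
## [1] 0.019
# RI p-value for the coefficient on run_anticp in tetris ~ run_anticp
mean(abs(anticipation.out) >= 645.621)
## [1] 0.8994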
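
The thresholds in the p-value calls above are typed in by hand from the estimates table. A minimal sketch of the same shift-and-permute logic, in base R on simulated data, computes each observed statistic in code and compares the simulated draws against it. Everything here is an assumption made for illustration: the outcome `y`, the effect size, and the helper names `shift_lag`, `shift_lead`, and `stat_*` are hypothetical and are not part of the Tetris dataset, `randomizr`, or `estimatr`. The F statistic is the classical one from `lm()`, not the robust version from `lm_robust()`.

```r
# Simulated stand-in for a 26-period within-subject study (NOT the Tetris data).
set.seed(1)
n <- 26
d <- rbinom(n, 1, 0.5)                  # treatment indicator for each period
y <- 100 + 50 * d + rnorm(n, sd = 20)   # assumed: contemporaneous effect only

# Hypothetical helpers: shift a vector by one period, padding with NA.
shift_lag  <- function(x) c(NA, x[-length(x)])  # value from period t - 1
shift_lead <- function(x) c(x[-1], NA)          # value from period t + 1

# Test statistics; lm() drops periods with a missing regressor by default.
stat_bivariate    <- function(y, z) coef(lm(y ~ z))[2]
stat_persistence  <- function(y, z) summary(lm(y ~ z + shift_lag(z)))$fstatistic[1]
stat_anticipation <- function(y, z) coef(lm(y ~ shift_lead(z)))[2]

observed <- c(bivariate    = unname(stat_bivariate(y, d)),
              persistence  = unname(stat_persistence(y, d)),
              anticipation = unname(stat_anticipation(y, d)))

# Re-draw the assignment under the same simple random assignment procedure
# and recompute all three statistics for each draw.
n_sims <- 1000
sims <- replicate(n_sims, {
  z <- rbinom(n, 1, 0.5)
  c(stat_bivariate(y, z), stat_persistence(y, z), stat_anticipation(y, z))
})
rownames(sims) <- names(observed)

# Two-sided RI p-value: share of simulated statistics at least as extreme
# in absolute value as the observed one.
ri_p <- sapply(names(observed),
               function(k) mean(abs(sims[k, ]) >= abs(observed[k])))
print(round(ri_p, 3))
```

Because the lagged and lead indicators are rebuilt inside each draw, the shifted variables always match the simulated assignment, which is the reason the original code needs an explicit loop.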

Heterogeneous Treatment Effects

This section focuses on the variability in treatment effects, \(Var[\tau_i]\). I begin with how to detect heterogeneous treatment effects: we either place bounds on \(Var[\tau_i]\), or test the null hypothesis \(Var[Y_i(1)] = Var[Y_i(0)]\), which the constant effects assumption implies, by comparing \(\widehat{Var[Y_i(1)]}\) with \(\widehat{Var[Y_i(0)]}\). After this, I show some regression-based strategies for modeling heterogeneity: first when the treatment interacts with covariates, and then when treatments interact with each other.

Detecting Heterogeneity

Bounds on \(Var[\tau_i]\)

Intuition: We can never point-estimate \(Var[\tau_i]\) because we never observe both potential outcomes for the same unit, so \(Cov[Y_i(1),Y_i(0)]\) cannot be estimated. However, we can place bounds on \(Var[\tau_i]\) by finding the smallest and largest covariance between treated and untreated potential outcomes that is consistent with the observed data.

To see this, note that:

\(Var[\tau_i] = Var[Y_i(1) - Y_i(0)] = Var[Y_i(1)] + Var[Y_i(0)] - 2\cdot Cov[Y_i(1),Y_i(0)]\)
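
A short numerical check may help make the decomposition concrete. The sketch below uses simulated potential outcomes (invented for illustration; in a real experiment only one of the two is observed for each unit) to confirm the identity. It then re-pairs the same \(Y_i(0)\) values with different units to show that the two marginal variances stay fixed while \(Var[\tau_i]\) moves with the covariance.

```r
# Simulated potential outcomes; all values are assumptions for illustration.
set.seed(2)
n  <- 1000
y0 <- rnorm(n, mean = 10, sd = 2)        # untreated potential outcomes
y1 <- y0 + rnorm(n, mean = 3, sd = 1)    # treated potential outcomes
tau <- y1 - y0                           # unit-level treatment effects

# The decomposition holds exactly for sample moments as well.
lhs <- var(tau)
rhs <- var(y1) + var(y0) - 2 * cov(y1, y0)
print(all.equal(lhs, rhs))

# Re-pair the Y(0) values at random: the marginal variances are unchanged,
# but the covariance, and with it Var[tau], is different.
y0_shuffled      <- sample(y0)
var_tau_shuffled <- var(y1 - y0_shuffled)
print(c(original = lhs, shuffled = var_tau_shuffled))
```

The data from an experiment identify only the two marginal distributions, so both pairings are equally consistent with what we observe.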

We can use the observations in the treatment group to get \(\widehat{Var[Y_i(1)]}\), and similarly use the control group units to get \(\widehat{Var[Y_i(0)]}\). This leaves the covariance term:

  • \(\widehat{Cov[Y_i(1),Y_i(0)]}\) is largest if we pair the smallest \(Y_i(1)\) values with the smallest \(Y_i(0)\) values, and the largest \(Y_i(1)\) values with the largest \(Y_i(0)\) values. Assuming there are equal numbers of observations in the treatment and control groups, this means estimating the covariance between the two sets of outcomes when both are arranged in ascending order. Call this \(\widehat{Cov_{max}[Y_i(1),Y_i(0)]}\).