Takeaway: `var_pop` and `cov_pop` are used to calculate the true standard error (i.e. Equation 3.4). This is only possible when we know the potential outcomes \(Y_i(1)\) and \(Y_i(0)\) for all subjects. When we only have observed data, that is \(D_i\) and \(Y_i\), we estimate a standard error by applying Equation 3.6, and these functions are not used.
The true standard error is:
\(\text{SE}(\widehat{ATE}) = \sqrt{\frac{1}{N-1}\{\frac{m}{N-m}Var[Y_i(0)] + \frac{N-m}{m}Var[Y_i(1)] + 2Cov[Y_i(0),Y_i(1)]\}}\)
Note that \(Var(Y_i(1)) = \frac{1}{N}\sum_{i=1}^N (Y_i(1) - \frac{\sum_1^N Y_i(1)}{N})^2\). When we call `var(Y1)`, the software assumes we are estimating the variance of a variable, and so it calculates \(\frac{1}{\color{red}{N-1}}\sum_{i=1}^N (Y_i(1) - \frac{\sum_1^N Y_i(1)}{N})^2\). To correct for this, we write a custom function `var_pop` that essentially calculates `sum((x-mean(x))^2)/length(x)`, where `length(x) = N`. The same logic applies to \(Var(Y_i(0))\).
Similarly, \(Cov(Y_i(1),Y_i(0)) = \frac{1}{N}\sum_{i=1}^N (Y_i(1) - \frac{\sum_1^N Y_i(1)}{N})\cdot(Y_i(0) - \frac{\sum_1^N Y_i(0)}{N})\). However, `cov(Y1,Y0)` calculates the following quantity: \(\frac{1}{\color{red}{N-1}}\sum_{i=1}^N (Y_i(1) - \frac{\sum_1^N Y_i(1)}{N})\cdot(Y_i(0) - \frac{\sum_1^N Y_i(0)}{N})\). This is why we use `cov_pop`, which calculates `sum((x-mean(x))*(y-mean(y)))/length(x)`, where `length(x) = N`.
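Both helpers take only a couple of lines. A minimal sketch, matching the formulas above (the function names follow the document's usage):

```r
# Population variance: divide by N, not N - 1 as var() does
var_pop <- function(x) {
  sum((x - mean(x))^2) / length(x)
}

# Population covariance: divide by N, not N - 1 as cov() does
cov_pop <- function(x, y) {
  sum((x - mean(x)) * (y - mean(y))) / length(x)
}
```

For a vector of length `N`, `var_pop(x)` equals `var(x) * (N - 1) / N`, which makes the difference in denominators easy to check directly.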
By contrast, an estimate of the standard error is:
\(\widehat{SE} = \sqrt{\frac{\widehat{Var(Y_i(0))}}{N-m} + \frac{\widehat{Var(Y_i(1))}}{m}}\)
Here, \(\widehat{Var(Y_i(1))} = \frac{1}{m-1} \sum_{i:\,d_i=1} \left(Y_i - \frac{\sum_{i:\,d_i=1} Y_i}{m}\right)^2\), the sample variance of the observed outcomes among the \(m\) treated subjects. Crucially, `var(Y[d==1])` estimates this very quantity, and `var(Y[d==0])` likewise estimates \(\widehat{Var(Y_i(0))}\). We therefore do not need `var_pop` here. Nor is there a covariance term to estimate: \(Y_i(1)\) and \(Y_i(0)\) are never observed for the same subject, so Equation 3.6 omits \(Cov(Y_i(1),Y_i(0))\). This is why we do not use `var_pop` and `cov_pop` when we are estimating the standard error (i.e. we have actual data and are applying Equation 3.6).
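A minimal sketch of this estimator, assuming `Y` is the observed outcome and `d` the treatment indicator (the helper name `se_hat` is my own, not from the text):

```r
# Estimated standard error of the ATE estimate (Equation 3.6):
# within-group *sample* variances, so the built-in var()
# (denominator m - 1 or N - m - 1) is exactly what we want
se_hat <- function(Y, d) {
  m <- sum(d)      # number of treated subjects
  N <- length(d)   # total number of subjects
  sqrt(var(Y[d == 0]) / (N - m) + var(Y[d == 1]) / m)
}
```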
An interaction effect refers to the change in treatment effect in different subgroups or covariate profiles. For example, let a covariate \(X_i\) take on two values: 0 and 1. In a given sample of subjects, there are two sub-groups: study participants with \(X_i=1\) and those with \(X_i=0\). An interaction effect refers to the fact that the treatment effect for sub-group \(X_i=1\) (or \(\widehat{ATE_{X=1}}\)) is different from the treatment effect for the sub-group \(X_i=0\) (or \(\widehat{ATE_{X=0}}\)). These ATEs for particular subsets or subgroups of subjects are referred to as conditional average treatment effects or CATEs.
Note that sub-group differences in treatment effects are descriptive findings. They are not the causal effect of a covariate \(X\) because we do not randomly assign values of \(X\). Any number of factors can be correlated with \(X\) that account for differences in the treatment effect.
I create some data in which the treatment effect interacts with a covariate, \(X_i \in \{0,1\}\).
library(tidyverse)   # tibble(), %>%, ggplot2
library(randomizr)   # complete_ra()
library(estimatr)    # lm_robust()
library(broom)       # tidy()
library(knitr)       # kable()
set.seed(04012022)
# Specify potential outcomes that are unobserved
dat_cov_interaction <- tibble(
  X = complete_ra(N = 500, m = 250),
  U = rnorm(500, mean = 2, sd = 2.5),
  Y0 = rnorm(500, mean = 0, sd = 2.5) + U * X,
  Y1 = Y0 + 0.5 + 0.25 * X
)
# Conduct a random assignment, apply the switching equation to get observed outcomes
dat_cov_interaction <- dat_cov_interaction %>%
  mutate(Z = complete_ra(N = 500, m = 250), # 250 of 500 subjects assigned to treatment
         Y = Y0 * (1 - Z) + Z * Y1)
# Select the observed variables
actual_dat <- dat_cov_interaction %>%
  select(X, Z, Y)
write_csv(actual_dat, file = "covariate_treatment_interaction.csv")
# Make a table of the head of this data set
kable(head(actual_dat),
      caption = "Glimpse of Dataset",
      caption.above = TRUE,
      digits = 3)
X | Z | Y |
---|---|---|
1 | 1 | 1.224 |
1 | 0 | 0.390 |
1 | 0 | 1.348 |
0 | 0 | 0.346 |
1 | 1 | 1.454 |
0 | 0 | 4.183 |
Here is the average treatment effect for the two sub-groups, \(\widehat{ATE_{X=1}}\) and \(\widehat{ATE_{X=0}}\). These are called conditional average treatment effects:
cates <- actual_dat %>%
  group_by(X) %>%
  summarise(
    tidy(lm_robust(Y ~ Z, data = cur_data()))
  ) %>%
  select(-df, -outcome)
kable(cates,
      caption = "Conditional Average Treatment Effects",
      caption.above = TRUE,
      digits = 2)
X | term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|---|
0 | (Intercept) | -0.31 | 0.24 | -1.27 | 0.20 | -0.79 | 0.17 |
0 | Z | 0.58 | 0.32 | 1.79 | 0.08 | -0.06 | 1.21 |
1 | (Intercept) | 1.53 | 0.32 | 4.83 | 0.00 | 0.91 | 2.16 |
1 | Z | 1.31 | 0.43 | 3.05 | 0.00 | 0.47 | 2.16 |
And here is a coefficient plot made with `ggplot2` that visualizes the same information:
fig_cates <- ggplot(data = cates %>% filter(term != "(Intercept)"),
                    aes(x = estimate,
                        y = as.factor(X))) +
  geom_point() +
  geom_linerange(aes(xmin = conf.low, xmax = conf.high)) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  xlim(-3, 3) +
  xlab("CATE") +
  ylab("Sub-Group") +
  theme_bw()
fig_cates
Load the data set `covariate_treatment_interaction.csv` and evaluate the hypothesis that \(\widehat{ATE_{X=1}} - \widehat{ATE_{X=0}} \neq 0\). In other words, the treatment effect in sub-group \(X_i=1\) is not equal to the treatment effect in sub-group \(X_i=0\).
Hint: You should specify a regression of the following type:
\(\text{Outcome} = \beta_0 + \beta_1\cdot(Treatment) + \beta_2\cdot(Covariate) + \beta_3\cdot(Treatment \times Covariate)\).
We are interested in \(\beta_3\) or the difference in conditional average treatment effects.
Extra: Can you use the information in the table titled "Conditional Average Treatment Effects" to calculate \(\beta_0\), \(\beta_1\), \(\beta_2\), and \(\beta_3\)? Confirm your calculations with the regression output.
A treatment-by-treatment interaction refers to a design in which subjects are randomly assigned to two treatments, \(Z_1\) and \(Z_2\). In effect, subjects can be assigned to one of four possible conditions: both \(Z_1\) and \(Z_2\) equal to 0, both equal to 1, only \(Z_1\) equal to 1, or only \(Z_2\) equal to 1. A factorial design is a generalization of this in which \(k\) treatments are randomly assigned, producing \(2^k\) experimental conditions (assuming every treatment has 2 levels).
To estimate treatment-by-treatment interaction effects, we specify the following regression:
\(\text{Outcome} = \beta_0 + \beta_1Z_1 + \beta_2Z_2 + \beta_3(Z_1 \times Z_2)\)
Where \(Z_1\) is an indicator variable that takes a value of 1 if the subject receives treatment 1, otherwise 0; and \(Z_2\) is an indicator variable that takes a value of 1 if the subject receives treatment 2, otherwise 0. As before, \(\beta_3\) captures a difference in treatment effects: the effect of \(Z_1\) when \(Z_2 = 1\) minus the effect of \(Z_1\) when \(Z_2 = 0\) (equivalently, the change in the effect of \(Z_2\) when \(Z_1 = 1\)).
Note that this interaction effect is causal because we randomly assign subjects to both treatment conditions, namely \(Z_1\) and \(Z_2\).
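As a numeric illustration, \(\beta_3\) is the difference-in-differences of the four cell means. The cell means below are made up for this sketch:

```r
# Hypothetical mean outcomes in the four cells of a 2 x 2 design
m00 <- 1.0   # Z1 = 0, Z2 = 0 (reference cell: beta_0)
m10 <- 1.5   # Z1 = 1, Z2 = 0
m01 <- 2.0   # Z1 = 0, Z2 = 1
m11 <- 3.2   # Z1 = 1, Z2 = 1

b1 <- m10 - m00                   # effect of Z1 when Z2 = 0: 0.5
b2 <- m01 - m00                   # effect of Z2 when Z1 = 0: 1.0
b3 <- (m11 - m01) - (m10 - m00)   # change in Z1's effect when Z2 = 1: 0.7
```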
We go back to Bertrand and Mullainathan (2004)’s study that sends resumes to firms for a job opening, randomly varying the applicant’s name (perceptions of race) and the “quality” of the resume. They conduct the study in two cities, Boston and Chicago. The outcome of interest is whether the resume gets a call back. Here is a table that reproduces the results in the paper, i.e. the call back rates by treatment conditions (`race` and `quality`) and the covariate `city`.
Propose a regression model that assesses the effects of the treatments (`race` and `quality`), the interaction between them, and the interactions between the treatments and the covariate `city`. Let `race = 1` if the resume uses a name that cues Black identity, `quality = 1` if the resume is of low quality, and `city = 1` for Boston.
Use the table above to estimate the parameters of that regression model.
Extra: Can you confirm your calculations by estimating the regression using the Bertrand and Mullainathan (2004) data set we used in a prior week’s problem set?
library(estimatr)
library(texreg)
library(readr)   # read_csv()
dat <- read_csv("covariate_treatment_interaction.csv")
fit <- lm_robust(Y ~ Z + X + Z:X, data = dat)
htmlreg(list(fit),
        include.ci = FALSE,
        digits = 2,
        caption = "CATE Estimates",
        booktabs = TRUE)
Model 1 | |
---|---|
(Intercept) | -0.31 |
(0.24) | |
Z | 0.58 |
(0.32) | |
X | 1.84*** |
(0.40) | |
Z:X | 0.74 |
(0.54) | |
R2 | 0.14 |
Adj. R2 | 0.13 |
Num. obs. | 500 |
RMSE | 3.01 |
***p < 0.001; **p < 0.01; *p < 0.05 |
Takeaway: The regression table reports \(\beta_3\) as `Z:X`, the interaction effect. This term captures the difference in CATEs, or \(\widehat{ATE_{X=1}} - \widehat{ATE_{X=0}}\). In the table we see that this difference is positive, but we cannot reject the null hypothesis \(H_0: \beta_3 = 0\). In other words, we cannot rule out the possibility that the treatment effect is the same for both sub-groups. This means there is no empirical support for the claim that the treatment “works” for sub-group \(X=1\) and does not for sub-group \(X=0\), or for a version of this claim that treatment effects are “driven” by the sub-group \(X=1\).
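For the "Extra" question, the regression coefficients can be recovered by hand from the rounded values in the "Conditional Average Treatment Effects" table (small rounding gaps relative to the regression output are expected):

```r
# Intercepts and Z slopes for each sub-group, copied from the CATE table
int_x0 <- -0.31; z_x0 <- 0.58   # X = 0 rows
int_x1 <-  1.53; z_x1 <- 1.31   # X = 1 rows

b0 <- int_x0            # -0.31: control-group mean when X = 0
b1 <- z_x0              #  0.58: CATE for sub-group X = 0
b2 <- int_x1 - int_x0   #  1.84: difference in control-group means
b3 <- z_x1 - z_x0       #  0.73: difference in CATEs (0.74 in the regression, rounding)
```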
\[\begin{equation} \text{Call Back} \sim \beta_0 + \beta_1(\text{Black}_i) + \beta_2(\text{Low}_i) + \beta_3(\text{Boston}_i) \\ + \beta_4(\text{Black}_i \times \text{Low}_i) + \beta_5(\text{Black}_i \times \text{Boston}_i) \\ + \beta_6(\text{Low}_i \times \text{Boston}_i) \\ + \beta_7(\text{Black}_i \times \text{Low}_i \times \text{Boston}_i) \end{equation}\]
\(\beta_0 = 8.94\), because the reference group is White, “high” quality resume in Chicago.
\(\beta_1 = -3.66\), because the racial difference in call back rates among high quality resumes in Chicago is \(5.28 - 8.94 = -3.66\)
\(\beta_2 = -1.78\), because the resume quality-based difference in call back rates among White applicants in Chicago is \(7.16 - 8.94 = -1.78\)
\(\beta_3 = 4.18\), because the difference in call back rates between Boston and Chicago among high quality resumes of White applicants is \(13.12 - 8.94 = 4.18\)
\(\beta_4 = 2.02\), because \((5.52 - 7.16) - (5.28 - 8.94) = 2.02\)
\(\beta_5 = -0.96\), because \((8.50 - 13.12) - (5.28 - 8.94) = -0.96\)
\(\beta_6 = -1.19\), because \((10.15 - 13.12) - (7.16 - 8.94) = -1.19\)
\(\beta_7 = -0.54\), because \([(7.01 - 10.15) - (8.50 - 13.12)] - [(5.52 - 7.16) - (5.28 - 8.94)] = -0.54\)
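The arithmetic above can be checked in a few lines. The cell call back rates (in percent) are copied from the calculations, with names giving the Black, Low, and Boston indicators in order:

```r
# Call back rates by cell, named "<Black><Low><Boston>"
r <- c("000" = 8.94, "100" = 5.28, "010" = 7.16, "001" = 13.12,
       "110" = 5.52, "101" = 8.50, "011" = 10.15, "111" = 7.01)

b0 <- r[["000"]]                                               #  8.94
b1 <- r[["100"]] - r[["000"]]                                  # -3.66
b2 <- r[["010"]] - r[["000"]]                                  # -1.78
b3 <- r[["001"]] - r[["000"]]                                  #  4.18
b4 <- (r[["110"]] - r[["010"]]) - (r[["100"]] - r[["000"]])    #  2.02
b5 <- (r[["101"]] - r[["001"]]) - (r[["100"]] - r[["000"]])    # -0.96
b6 <- (r[["011"]] - r[["001"]]) - (r[["010"]] - r[["000"]])    # -1.19
b7 <- ((r[["111"]] - r[["011"]]) - (r[["101"]] - r[["001"]])) -
  ((r[["110"]] - r[["010"]]) - (r[["100"]] - r[["000"]]))      # -0.54
```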