Notes from Week III

Randomization inference with \(H_0: \tau_i=7\)

In Q.7, the sharp null is \(\tau_i = 7\) \(\forall i\). How do we test this? One approach is to rescale the treated potential outcomes in such a way that the null hypothesis becomes \(\tau_i=0\). For example, if we construct \(Y_i(1)^* = Y_i(1) - 7\), then \(E[Y_i(1)^* - Y_i(0)] = 0\)

Lets work through the logic of this by creating the data:

D Y Y0 Y1 Y0.null7 Y1.null7 Y1.null0 Y_obs
0 1 1 NA 1 8 1 1
0 0 0 NA 0 7 0 0
0 0 0 NA 0 7 0 0
0 4 4 NA 4 11 4 4
0 3 3 NA 3 10 3 3
1 2 NA 2 -5 2 -5 -5
1 11 NA 11 4 11 4 4
1 14 NA 14 7 14 7 7
1 0 NA 0 -7 0 -7 -7
1 3 NA 3 -4 3 -4 -4

After rescaling \(Y_i(1)\), the observed difference in means is:

out1 <- dat1 %>% summarise(DIM = mean(Y_obs[D == 1]) - mean(Y_obs[D == 0]))
out1$DIM
## [1] -2.6

It is not the same as the difference means prior to rescaling:

out2 <- dat1 %>% summarise(DIM = mean(Y[D == 1]) - mean(Y[D == 0]))
out2$DIM
## [1] 4.4

Now onto the hypothesis test using ri2

# Specify the randomization procedure
declaration <- declare_ra(N = nrow(dat1), m = 5)

# To conduct randomization inference use the 'conduct_ri' command:

q7_ri <- conduct_ri(Y_obs ~ D, declaration = declaration, assignment = "D", 
    sharp_hypothesis = 0, data = dat1)
summary(q7_ri, p = "lower")
##   coefficient estimate lower_p_value null_ci_lower null_ci_upper
## 1           D     -2.6     0.2063492          -5.4           5.4
plot(q7_ri)

Alternatively, we conduct randomization inference with \(H_0: \tau_i=7\) and do not rescale \(Y_i(1)\):

# Note that in the first line the outcome variable is Y (not Y_obs), and the
# sharp_hypothesis = 7 (not 0)

q7_ri_2 <- conduct_ri(Y ~ D, declaration = declaration, assignment = "D", sharp_hypothesis = 7, 
    data = dat1)
summary(q7_ri_2)
##   coefficient estimate two_tailed_p_value null_ci_lower null_ci_upper
## 1           D      4.4          0.8373016           1.6          12.4
plot(q7_ri_2)

Note that a two-tailed p value is uninformative in this case. So we compute a one-tail p value: (which will be identical to the one-tail p value from the \(H_0: \tau_i=0\) test).

summary(q7_ri_2, p = "lower")
##   coefficient estimate lower_p_value null_ci_lower null_ci_upper
## 1           D      4.4     0.2063492           1.6          12.4

Interference in Camerer’s study

In Camerer’s study, the outcome (betting odds) is defined in such a way that there can be interdependence: horse \(i\)’s betting odds are a proportion of the total bets, which includes bets placed on treated horse \(j\). In potential outcome terms, let horse \(i\)’s outcomes be a function of two things: \(i\)’s own treatment assignment, and \(j\)’s treatment assignment:

\(Y_i(d_i,d_j)\)

If the estimand is \(Y(1,0) - Y(0,0)\) there is interference because we observe \(Y(0,1)\) and must assume \(Y(0,1) = Y(0,0)\), which is not the case because bets on horse \(j\) affect \(i\)’s odds and outcomes. This problem is resolved if the estimand is redefined to be: \(Y(1,0) - Y(0,1)\).

Note that we compute within-pair difference in outcomes for both estimands, so the actual estimate does not change. We skirt non-interference violations by simply redefining our estimand.

Random assignment to clusters

If units are randomly assigned to clusters (which are of equal size), does the clustered standard error approximate the regular standard error?

Answer: Yes. Here is an intuition for why this is the case. To start, write down the regular standard error (Equation 3.4), and compare it to the clustered standard error (Equation 3.22):

Equation 3.4: \(\sqrt{\frac{1}{N-1}\{\frac{m}{N-m}Var[Y_i(0)] + \frac{N-m}{m}Var[Y_i(1)] + 2Cov[Y_i(0),Y_i(1)]\}}\)

Equation 3.22: \(\sqrt{\frac{1}{k-1}\{\frac{m}{N-m}Var[\bar{Y_i(0)}] + \frac{N-m}{m}Var[\bar{Y_i(1)}] + 2Cov[\bar{Y_i(0)},\bar{Y_i(1)}]\}}\)

Note that the only difference between these equations is that \(\frac{1}{N-1}\) is replaced by \(\frac{1}{k-1}\); and unit-level potential outcomes (\(Y_i(1)\) and \(Y_i(0)\)) are replaced by cluster-level average potential outcomes (\(\bar{Y_i(1)}\) and \(\bar{Y_i(0)}\)). To show equality, we must establish some relationship between these terms.

Claim:

\(Var[\bar{Y_{cluster}}] \approx \frac{Var[Y]}{\frac{N-1}{k-1}}\)

\(Cov[\bar{Y_{cluster}},\bar{X_{cluster}}] \approx \frac{Cov[Y,X]}{\frac{N-1}{k-1}}\)

To explain why this is the case, consider the limit case in which \(N\) units are randomly assigned to \(k=N\) clusters. In this case, we would expect (for example) variance in unit-level potential outcomes to equal the variance in cluster averages. This is indeed the case:

\(Var[\bar{Y_{cluster}}] = \frac{Var[Y]}{\frac{N-1}{k-1}} = \frac{Var[Y]}{\frac{N-1}{\color{red}{N}-1}} = Var[Y]\)

This equality turns into an approximation (\(\approx\)) as the number of clusters becomes smaller. Here is a simulation to confirm this:

Cluster.Number Var.ClusterAvg Adjustment Var.ClusterAvg:Adjustment Var_Pop
5 0.0107596 2499.75000 26.89624 16.21497
10 0.0171489 1111.00000 19.05242 15.98230
20 0.0397098 526.26316 20.89781 16.49134
50 0.0936146 204.06122 19.10311 16.09046
100 0.1504140 101.00000 15.19181 15.89472
500 0.7514603 20.03808 15.05782 15.56854
1000 1.5903855 10.00901 15.91818 16.00773

Note: Adjustment refers to \(\frac{N-1}{k-1}\).

Power calculations

The “power” of an experiment refers to the probability a researcher will be able to reject the null hypothesis of no treatment effect (Gerber and Green:93). It is: \(1 - Pr\)(Committing Type 2 Error), where a Type 2 error is the probability of retaining a false null hypothesis.

In this exercise, we calculate the power of an experiment (\(n=126\)) to detect an average treatment effect of \(\approx\) 0.5 SD units. Power calculations are typically part of a pre-analysis plan, and involve the following steps:

For each iteration of a loop:

  1. Generate some data according to an assumed data generating process.

  2. Estimate the average treatment effect, given some observed assignment vector Z.

  3. Compute the uncertainty around your estimate using randomization inference

  4. Set a success criterion (say \(p \leq 0.05\)) and record whether the experiment rejected the null (\(success = 1\)), or failed to reject it (\(p > 0.05\) and \(success =0\)).

Task: Here is a data generating process that yields an average treatment effect of 0.5 SD units. Assume we are planning an experiment with \(n=126\). What is the power of such an experiment to detect this effect?

# Basic information

sims <- 100  # Number of simulations
N <- 126  # Size of the experimental sample
m <- N/2  # Number of units to be assigned to treatment
ate_sims <- rep(NA, sims)  # ATE estimate from each experiment
pvalue = rep(NA, sims)  # p-value from each experiment
success = rep(NA, sims)  # report the success of each experiment here

# Data generating process
set.seed(64)

dat2 <- data_frame(Y0 = rnorm(N, mean = 0, sd = 1), Y1 = Y0 + rnorm(N, mean = 0.5, 
    sd = 1), Z = complete_ra(N = N, m = m), Y = Y1 * Z + Y0 * (1 - Z))

# To make sure the data generating process fits our description:

out3 <- dat2 %>% summarise(DIM = mean(Y1 - Y0), SD_Y0 = sd(Y[Z == 0]))
kable(out3)
DIM SD_Y0
0.5556688 1.000367

Can you setup a loop, complete steps 2-4, and estimate the power of this experiment?

Below, I produce the results from my simulation:

for (i in 1:sims) {
    # Create Data
    dat2 <- data_frame(Y0 = rnorm(N, mean = 0, sd = 1), Y1 = Y0 + rnorm(N, mean = 0.5, 
        sd = 1), Z = complete_ra(N = N, m = m), Y = Y1 * Z + Y0 * (1 - Z))
    # Estimate the ATE
    model <- estimatr::lm_robust(Y ~ Z, data = dat2)
    ate_sims[i] <- model$coefficients[2]
    # Randomization inference to get p value
    declaration <- declare_ra(N = N, m = m)
    ri_temp <- ri2::conduct_ri(Y ~ Z, declaration = declaration, assignment = "Z", 
        sharp_hypothesis = 0, data = dat2)
    # Report Outcomes
    pvalue[i] = summary(ri_temp)[1, 3]  # 1st row, 3rd colum reports p-value
    success[i] = as.numeric(summary(ri_temp)[1, 3] <= 0.05)
}

mean(success)  # power of the experiment
## [1] 0.61
hist(pvalue, breaks = 100, main = "Distribution of P Values")

Next, I write a function that computes statistical power for a given sample size. For computational efficiency, I use p values generated from lm_robust rather than randomization inference. Here is a plot that shows how statistical power changes with sample size:

Restricted Randomization

A restricted randomization constrains the randomization procedure by discarding certain assignment vectors that do not satisfy a condition. For example, a researcher may discard an assignment vector because it fails the balance test, and keeps randomizing till the balance criterion is satisfied.

In this section, I first show how to conduct such a randomization, and then how to estimate the average treatment effect from such a procedure.

Data

Lets begin by creating some data: \(Y_i(0)\), \(Y_i(1)\), and two covariates \(X_1\) and \(X_2\). By construction the true ATE or \(E[\tau_i] \approx 5\).

set.seed(100)
dat3 <- data_frame(Y0 = rnorm(1000, mean = 0, sd = 20), Y1 = Y0 + rnorm(1000, 
    mean = 5, sd = 20), X1 = as.numeric(Y0 >= 0) + runif(1000, min = -1, max = 1), 
    X2 = ifelse(Y0 <= -1, 1, ifelse((Y0 > -1) & (Y0 <= 0), 2, ifelse((Y0 > 0) & 
        (Y0 <= 1), 3, 4))))
true_ate <- dat3 %>% summarise(value = mean(Y1) - mean(Y0))
true_ate$value
## [1] 5.082935

Conducting the procedure

Say we are interested in randomly assigning \(m=500\) units to treatment, but want an assignment vector that is orthogonal to \(X_1\) and \(X_2\). The qualifying criterion is that the covariates are not jointly significant in a regression \(Z_i \sim X_{1,i} + X_{2,i}\):

Let \(model_{null}\) be \(Z_i \sim c\), and \(model_{alt}\) be \(Z_i \sim X_{1,i} + X_{2,i}\) Then an F-statistic is defined as: \(\frac{\frac{SSR_{null} - SSR_{alt}}{q}}{\frac{SSR_{alt}}{N-q-1}}\)

Where \(q=2\) or the number of covariates, which under the null hypothesis predict treatment assignment no better than would be expected by chance, and under the alternative hypothesis systematically predict treatment assignment (Gerber and Green:107).

We will now use this test statistic to restrict randomization:

N <- nrow(dat3)

# Step 1: Write a function that computes the joint significance of
# covariates as predictors of some assignment vector

custom_ra_function <- function() {
    continue <- TRUE
    while (continue) {
        Z <- complete_ra(N)
        model <- lm_robust(Z ~ dat3$X1 + dat3$X2)
        # Extract F statistic
        f_stat <- summary(model)$fstatistic
        # Calculate p-value of F statistic
        p.val <- 1 - pf(q = f_stat[1], df1 = f_stat[2], df2 = f_stat[3])
        if (p.val > 0.05) {
            return(Z)
        }
    }
}

# Step 2: Generate restricted random assignments
sims <- 1000

permutation_matrix <- replicate(sims, custom_ra_function())
dim(permutation_matrix)
## [1] 1000 1000

Columns of this permutation matrix contain assignment vectors that satisfy the restriction condition. We now randomly select one of these columns as the realized treatment assignment:

set.seed(23)
number <- sample(1:1000, 1, T)
dat3$Z <- permutation_matrix[, number]

Finally, we feed this information into randomizr so we can calculate weights, and do randomization inference:

# Step 3: Declare the procedure to randomizr

declaration <- declare_ra(permutation_matrix = permutation_matrix)
declaration
## Number of units: 1000 
## Number of treatment arms: 2 
## The possible treatment categories are 0 and 1.
## The probabilities of assignment are NOT constant across units. Your analysis strategy must account for differential probabilities of assignment, typically by employing inverse probability weights.
# Step 4: Probability of assignment to conditions
probs <- obtain_condition_probabilities(declaration, dat3$Z)
hist(probs, breaks = 50, main = "Distribution of Assignment Condition Probabilities")

Inverse Probability Weighting

# Step 5: Calculate Weights, Y_obs
weights <- 1/probs

# Step 6: Switching equation
dat3 <- dat3 %>% mutate(Y_obsv = Y1 * Z + Y0 * (1 - Z))

Note that imposing a restriction on the randomization procedure produces unequal probabilities of assignment to conditions. In such cases, the difference-in-means estimator (which equally weights all observations) is biased. So we use an inverse probability weighed estimator:

# Naive model without weights
model1 <- tidy(lm_robust(Y_obsv ~ Z, data = dat3))
kable(model1, digits = 4)
coefficient_name coefficients se p ci_lower ci_upper df outcome
(Intercept) 0.6449 0.9049 0.4762 -1.1307 2.4206 998 Y_obsv
Z 3.8784 1.5612 0.0131 0.8148 6.9420 998 Y_obsv
# IPW estimator
model2 <- tidy(lm_robust(Y_obsv ~ Z, weights = weights, data = dat3))
kable(model2, digits = 4)
coefficient_name coefficients se p ci_lower ci_upper df outcome
(Intercept) 0.6898 0.9058 0.4466 -1.0878 2.4673 998 Y_obsv
Z 3.7688 1.5618 0.0160 0.7041 6.8336 998 Y_obsv

Randomization Inference

restricted_ri <- conduct_ri(Y_obsv ~ Z, declaration = declaration, assignment = "Z", 
    sharp_hypothesis = 0, sims = sims, data = dat3)
summary(restricted_ri)
##   coefficient estimate two_tailed_p_value null_ci_lower null_ci_upper
## 1           Z 3.768826              0.012     -2.889869      2.992759
plot(restricted_ri)

Using blockTools

Say we have a set of covariates (\(X_1\) and \(X_2\)) and want to block on them. blockTools helps construct blocks by matching units on pre-treatment covariates. In this section, I show how we can use the r package to do this. For more on this, click here, and for the package documentation, here

# Download/load package in R
library(blockTools)

# Step 1: Specify to blockTools which pre-treatment covariates are to be
# used for blocking. Note that blockTools does not take categorical or
# non-numeric variables, so we must convert all covariates into numeric
# form. To do this, we use 'model.matrix' - a command typically used
# under-the-hood in regression computation to create (X'X)^(-1)X'Y :

covariate_matrix <- model.matrix(~X1 + X2, data = dat3)

head(covariate_matrix)  # Notice that the first column is the intercept (as it usually is in a regression). So we exclude it 
##   (Intercept)          X1 X2
## 1           1  0.82261898  1
## 2           1  0.70317902  4
## 3           1 -0.18170867  1
## 4           1  1.73475568  4
## 5           1  0.08318728  4
## 6           1  0.24251473  4
covariate_matrix <- model.matrix(~X1 + X2, data = dat3)[, -1]

# Step 2: We need to create an 'id' variable (essentially row numbers as a
# separate column) for blockTools

dat4 <- data.frame(id_var = 1:nrow(covariate_matrix), covariate_matrix)

head(dat4)
##   id_var          X1 X2
## 1      1  0.82261898  1
## 2      2  0.70317902  4
## 3      3 -0.18170867  1
## 4      4  1.73475568  4
## 5      5  0.08318728  4
## 6      6  0.24251473  4
# Step 3: 'dat4' is a dataframe that can be read by blockTools. Now we use
# it to create blocks, say quartets (whether you decide on pairs, trios, or
# some other block size is an arbitrary decision):


out <- block(dat4, n.tr = 4, id.vars = "id_var", block.vars = colnames(dat4)[-1])

# This 'block' command is computing distance between units in covariate
# space. It defaults to Mahalanobis distance, but you can use 'distance ='
# and select from 'mcd','mve','eclidean'. We use this information in the
# next step to create homogenous blocks:

# Step 3B: Use 'out' to create actual blocks. Note that this is a stochastic
# process - it changes each time we run the command, so set.seed() for
# replicability

dat3$block <- createBlockIDs(out, dat4, id.var = "id_var")

table(dat3$block)
## 
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18 
##   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4 
##  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36 
##   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4 
##  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54 
##   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4 
##  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72 
##   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4 
##  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90 
##   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4 
##  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108 
##   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4 
## 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 
##   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4 
## 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 
##   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4 
## 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 
##   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4 
## 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 
##   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4 
## 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 
##   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4 
## 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 
##   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4 
## 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 
##   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4 
## 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 
##   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4   4
# Step 4: Back to randomizr to conduct blocked assignment:

declaration <- declare_ra(N = nrow(dat3), blocks = dat3$block)
dat3$Z_block <- conduct_ra(declaration)

head(table(dat3$block, dat3$Z_block))
##    
##     0 1
##   1 2 2
##   2 2 2
##   3 2 2
##   4 2 2
##   5 2 2
##   6 2 2

Appendix I: Randomization Checks on ri2

Last week we used the loop and function approach for randomization checks. ri2 allows us to do this with custom functions. For more on this, see the package manual.

For the purposes of section, I use dat3 (with covariates \(X_1\) and \(X_2\)) to illustrate how the code for this works:

declaration <- declare_ra(N = nrow(dat3), m = nrow(dat3)/2)
declaration
## Random assignment procedure: Complete random assignment 
## Number of units: 1000 
## Number of treatment arms: 2 
## The possible treatment categories are 0 and 1.
## The probabilities of assignment are constant across units.
# Lets create some assignment vector, and specify Y_obsv

set.seed(15)

dat3 <- dat3 %>% mutate(Z_obsv = conduct_ra(declaration), Y_obsv = Y1 * Z_obsv + 
    Y0 * (1 - Z_obsv))


# Step 1: Specify a 'test' in the form of a function that ri2 will perform:

balance.test <- function(data) {
    model <- lm_robust(Z_obsv ~ X1 + X2, data = data)
    f_stat <- summary(model)$fstatistic[1]
    names(f_stat) <- NULL  # removes the column name 'value'
    return(f_stat)
}


# Step 2: Input this function into ri2

ri_balance <- conduct_ri(test_function = balance.test, declaration = declaration, 
    assignment = "Z_obsv", sharp_hypothesis = 0, data = dat3, sims = 1000)
plot(ri_balance)

summary(ri_balance, "upper")
##             coefficient  estimate upper_p_value null_ci_lower
## 1 Custom Test Statistic 0.7897766          0.48    0.02341592
##   null_ci_upper
## 1       3.68646

Task: Can you write a function (or test) that performs a randomization check when \(Z_i \sim X_{1,i} + X_{2,i}\) is a logistic regression; and when there are multi-arm experiments (i.e. \(Z_i\) is a nominal variable with three or more levels)?

(Hint: You want to use the log-likelihood ratio as a test of joint significance).