Alert: You can download this document from http://github.com/shikhar46/quantmethods or Canvas’ Files page. This survey of RDDs draws on Cattaneo, Idrobo and Titiunik, Volume I (2019), Peter Aronow’s PLSC 503 Lecture Slides, and Alex Coppock’s lecture slides and DeclareDesign code.


Essential Ingredients

A regression discontinuity design leverages qualitative knowledge about the assignment of a treatment, specifically the existence of an arbitrary cut-point at which the probability of being treated, \(P(Z_i=1)\), “jumps” or “discontinuously changes”. There are three main ingredients of an RD:

  1. A forcing variable (\(X_i\)) that affects outcomes (\(Y_i\)) and treatment assignment (\(Z_i\)). It can also moderate the effect of the treatment.

  2. A treatment (\(Z_i\)) of interest

  3. A cut-point at which the probability of being treated, \(P(Z_i = 1)\), discontinuously changes. For example, in a sharp regression discontinuity, \(Z_i = \mathbb{1}(X_i > c)\), where \(\mathbb{1}(\cdot)\) is an indicator function that changes discontinuously at the cutpoint \(c\). Equivalently:

\[\begin{equation} Z_i = \begin{cases} 1 & \text{when} & X_i > c \\ 0 & \text{when} & X_i \leq c \end{cases} \end{equation}\]

As Cattaneo, Idrobo and Titiunik (2019) summarize:

In the RD design, all units have a score, and a treatment is assigned to those units whose value of the score exceeds a known cutoff or threshold, and not assigned to units whose value of the score is below the cutoff. The key feature of the design is that the probability of receiving the treatment changes abruptly at the known threshold. If units are unable to perfectly “sort” around this threshold, the discontinuous change in this probability can be used to learn about the local causal effect of the treatment on an outcome of interest. (CIT 2019:3)


The defining feature of a regression discontinuity, namely the “jump” in the probability of treatment assignment, is empirically verifiable. For example, we can plot treatment assignment at different values of the forcing (or running) variable. I demonstrate this using an RD design specified in DeclareDesign.

library(tidyverse)
library(DeclareDesign)
library(ggthemes)

# Write a function to make untreated potential outcomes
make_Y0 <- function(X) {
  2*X + 1.5*X^2 - 0.6*X^3
}

# Write a function to make treated potential outcomes
make_Y1 <- function(X) {
  1.5 + 0.2*X - 1.4*X^2
}

rd_design <-
  declare_population(
    N = 1000,
    # Specify a forcing variable
    X = runif(N, -1, 1),
    # Pr(Treatment=1) discontinuously changes at X=0
    Z = 1*(X > 0),
    noise = rnorm(N, 0.1, 0.4)
  ) +
  declare_potential_outcomes(Y ~ Z * make_Y1(X) + (1 - Z) * make_Y0(X) + noise) +
  # Estimand is the ATE exactly at the cutpoint
  declare_estimand(LATE = make_Y1(0) - make_Y0(0)) +
  declare_reveal(Y, Z)

dat <- draw_data(rd_design)

# Visualizing the discontinuity in Z

ggplot(dat, aes(x = X, y = Z)) +
  geom_vline(xintercept = 0, size = .5, color = "darkblue") +
  geom_point() +
  ylab("Treatment (Z)") +
  xlab("Forcing Variable (X)") +
  ggtitle("Forcing Variable and Treatment Status",
          subtitle = "(Deterministic Regression Discontinuity Design") +
  theme_economist_white()


When such a discontinuity exists, the observed outcome is:

\[\begin{equation} Y_i = Y_i(1) \cdot Z_i + Y_i(0) \cdot (1-Z_i) = \begin{cases} Y_i(1) & \text{when} & X_i > c \\ Y_i(0) & \text{when} & X_i \leq c \end{cases} \end{equation}\]

This means we only observe the untreated potential outcome, \(Y_i(0)\), for units at or below the cut-point (\(X_i \leq c\)), and the treated potential outcome, \(Y_i(1)\), for units above it. The observed average outcome then is:

\[\begin{equation} \mathbb{E}[Y_i | X_i] = \begin{cases} \mathbb{E}[Y_i(1)|X_i] & \text{when} & X_i > c \\ \mathbb{E}[Y_i(0)|X_i] & \text{when} & X_i \leq c \end{cases} \end{equation}\]

This presents two problems for causal inference:

  1. Unobserved Counterfactual: For every unit \(i\), we either observe their treated potential outcome \(Y_i(1)\) or their untreated potential outcome \(Y_i(0)\) but not both.

  2. No overlap or common support: Units in the control and treatment groups do not have the same value of the forcing variable \(X_i\). At the cut-point \(X_i = c\), we only have units in control (or treatment if \(Z_i = \mathbb{1}(X_i \geq c)\)). As a result, standard regression, matching, or stratified estimation procedures break down because there is no comparison group within a stratum.

A regression discontinuity design “fundamentally relies on extrapolation towards the cutoff point” (Cattaneo, Idrobo, and Titiunik 2019:10). To help visualize this, I plot data from the RD design specified earlier.

# Visualizing the RD data
fig2 <- 
  ggplot(dat, aes(x = X, y = Y, color = as.factor(Z))) +
  geom_point(data = dat, alpha = .2) +
  geom_vline(xintercept = 0, linetype = "dotted") +
  # Solid segments: observed regression functions; dashed segments: extrapolation across the cut-point
  stat_function(fun = make_Y0, xlim = c(-1, 0), color = "red") + 
  stat_function(fun = make_Y0, xlim = c(0.01, 1), color = "red", linetype = "dashed") + 
  stat_function(fun = make_Y1, xlim = c(-1, 0), color = "darkblue", linetype = "dashed") + 
  stat_function(fun = make_Y1, xlim = c(0.01, 1), color = "darkblue") + 
  geom_segment(aes(x = 0, xend = 0, y = make_Y0(0), yend = make_Y1(0)), color = "black") +
  xlab("Forcing Variable (X)") +
  ylab("Outcome (Y)") +
  ggtitle("Regression Discontinuity Plot", subtitle = "(Outcomes at Different Levels of the Forcing Variable)")+
  theme_economist_white()

fig2

Estimand

The average treatment effect at a given value of the score, \(\mathbb{E}[Y_i(1)|X_i=x]- \mathbb{E}[Y_i(0)|X_i=x]\), is the vertical distance between the two regression curves at that value. The distance cannot be directly estimated because we never observe both curves for the same value of \(x\). However, a special situation occurs at the cutoff \(c\): this is the only point at which we “almost” observe both curves. To see this, we imagine having units with score exactly equal to \(c\), and units with score barely [above] \(c\) (that is, with score [\(c + \epsilon\)] for a small and positive \(\epsilon\)). The former units would receive [control], and the latter would receive [treatment]. Yet if the values of the average potential outcomes at \(c\) are not abruptly different from their values at points near \(c\), the units with \(X_i = c\) and [\(X_i = c + \epsilon\)] would be very similar except for their treatment status, and we could approximately calculate the vertical distance at \(c\) using observed outcomes. (CIT 2019:11, adapted for this example)

This is exactly what the regression discontinuity design seeks to estimate. The quantity of interest is:

\(\mathbb{E}[Y_i(1)|X_i=c]- \mathbb{E}[Y_i(0)|X_i=c]\)

Note: This treatment effect is “local” and may be different from the treatment effect at other levels of the forcing variable.
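In the simulated design above, this estimand is simply the gap between the two potential-outcome functions at the cut-point:

# True estimand in the simulated design: the jump at the cut-point X = 0
make_Y1(0) - make_Y0(0)
## [1] 1.5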


Assumptions

In order to estimate this local effect, we make three assumptions: non-interference, continuity at the cut-point, and positivity.

Non-Interference

Individual \(i\)’s potential outcomes are only a function of \(i\)’s treatment status, and not the treatment status of any other individual \(j \neq i\). Formally, \(Y_i (Z_i,Z_{-i})= Y_i (Z_i, Z_{-i}^*)\).

This allows us to stipulate only two potential outcomes for each individual, \(Y_i (Z_i=1)\) and \(Y_i (Z_i=0)\).

Continuity at the Cut Point

Continuity of the conditional expectation functions of potential outcomes near the cut point.

Formally: \[\begin{equation} \lim_{\epsilon \rightarrow 0^+} \mathbb{E}[Y_i (1) | X_i = c+ \epsilon]= \lim_{\epsilon \rightarrow 0^+} \mathbb{E}[Y_i (1) | X_i = c - \epsilon] = \mathbb{E}[Y_i (1) | X_i = c] \end{equation}\] And \[\begin{equation} \lim_{\epsilon \rightarrow 0^+} \mathbb{E}[Y_i (0) | X_i = c+ \epsilon]= \lim_{\epsilon \rightarrow 0^+} \mathbb{E}[Y_i(0) | X_i = c - \epsilon] = \mathbb{E}[Y_i (0) | X_i = c] \end{equation}\]

In words, the average potential outcomes of those just below, just above, and at the cut point are very similar, and in fact become identical as the immediate neighborhood, \([c - \epsilon, c+\epsilon]\), collapses onto the cut point (\(\epsilon \rightarrow 0\)). Continuity does the work of conditional independence of \(Z_i\) in that neighborhood: in expectation, the groups on either side of the cut point are similar on observed and unobserved characteristics.

This assumption is essential for extrapolating \(\mathbb{E}[Y_i (1) | X_i = c]\) and \(\mathbb{E}[Y_i (0) | X_i = c]\). Let’s think about this visually:

Figure 1: Continuity at the Cutpoint
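One rough way to see what continuity buys us is to compare average outcomes in ever-smaller windows around the cut-point in the simulated data: as the window shrinks, the naive difference in means approaches the vertical gap at \(c\). Below is a minimal sketch (the window widths are arbitrary, and very small windows leave few observations):

# Naive difference in means just above vs. just below the cut-point,
# for progressively smaller windows around X = 0
sapply(c(0.50, 0.25, 0.10, 0.05), function(eps) {
  window <- subset(dat, abs(X) <= eps)
  mean(window$Y[window$Z == 1]) - mean(window$Y[window$Z == 0])
})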

Positivity

Positivity near the cut point. Formally: \[\begin{equation} \lim_{\epsilon \rightarrow 0^+} f[X_i = c + \epsilon] = \lim_{\epsilon \rightarrow 0^+} f[X_i = c - \epsilon] > 0 \end{equation}\]

Where \(f[\cdot]\) is the probability density function of \(X_i\). This requirement implies that there must be some units in the immediate neighborhood (i.e. just above and below the cut point). Furthermore, the density of units on either side of the cut-point should be similar. That is, there should be continuity of densities at the cut-point. The McCrary density test evaluates this by checking if the difference in densities at the cut-point is statistically significant. If there is a statistically significant difference, we might think there is “sorting around the cut-point” (i.e. units have precise control over their position vis-a-vis the cut-point).

One way to check for positivity is to plot the density of the forcing variable \(X_i\), and look for discontinuities at the cut-point. Using the example earlier, I produce such a plot:

ggplot(dat, aes(x = X)) + 
  geom_histogram(bins = 50, alpha = 0.3) +
  geom_vline(xintercept = 0, color = "darkblue", linetype = "dashed") +
  ylab("Count") +
  ggtitle("Density of the Forcing Variable X") +
  theme_economist_white()
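A formal version of this check is a manipulation (density) test at the cut-point. Below is a minimal sketch using the rddensity package (assuming it is installed), which implements a local polynomial density test in the spirit of the McCrary test:

# Density / manipulation test at the cut-point
library(rddensity)
density_test <- rddensity(X = dat$X, c = 0)
summary(density_test)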


RD Game: Each student takes a ballot paper, randomly picks a value between -2 and +2, and writes it down on their ballot as \(\text{noise}_i\). Then each student gets a seat number, \(\{1,2, \cdots, N\}\), and writes down \(Y_i(0) = 1.2\times \text{seat number}_i + \text{noise}_i\) and \(Y_i(1) = Y_i(0) + (0.5 \times \text{seat number}_i)\). This way, each student’s ballot has three quantities: \(\text{noise}_i\), \(Y_i(1)\), and \(Y_i(0)\). The instructor then arbitrarily divides the \(N\) students into a treatment and a control group, such that \(Z_i = \mathbb{1}(\text{seat number}_i \geq c)\) (where \(c\) is the cut-point). We then discuss the essential ingredients, estimand, and assumptions of a regression discontinuity design.

Figure 2: RD Game For Different Seating Plans
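For reference, here is a minimal sketch that simulates one round of the game in R; the class size and cut-point below are hypothetical choices:

# Simulate the RD game for a hypothetical class of 30 students with the cut-point at seat 16
n_students <- 30
cutpoint <- 16
game <- tibble(
  seat  = 1:n_students,
  noise = runif(n_students, -2, 2),
  Y0    = 1.2 * seat + noise,
  Y1    = Y0 + 0.5 * seat,
  Z     = as.integer(seat >= cutpoint),
  Y     = ifelse(Z == 1, Y1, Y0)
)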

Questions:

  1. What’s the forcing variable?

  2. Does \(Pr(Z_i = 1)\) discontinuously change at the cut-point?

  3. What’s the estimand?

  4. What is the continuity assumption, and why do we need it?

  5. What is the positivity assumption?

  6. What is the non-interference assumption?

  7. What happens if there is sorting around the cut-point? Are potential outcomes still continuous, and is the density of the forcing variable on either side of the cut-point the same?

  8. If the goal is to predict the potential outcomes of a hypothetical person sitting between the \((c-1)^{th}\) person and \(c^{th}\) person, how does Kernel weighting and bandwidth selection help with estimation?

Estimation: Point v. Interval

In order to estimate a local effect using the regression discontinuity design, start by installing the package rdrobust.
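If the package is not yet installed:

# Install (once) and load rdrobust
install.packages("rdrobust")
library(rdrobust)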

Let’s start by discussing two separate components of estimation - point estimation of an effect, and statistical inference about the effect (such as building confidence intervals, or calculating \(t\) statistics). These are related exercises but have different objectives:

  • Point estimation seeks to minimize mean squared error, \(\mathbb{E}[(\mu_{LATE} - \widehat{\mu_{LATE}})^2]\)

  • Interval estimation seeks to minimize coverage error. Coverage refers to how often a confidence interval brackets the truth. The coverage error refers to the difference between a confidence interval’s stated/nominal coverage and actual coverage.

As we shall see, optimizing over one criterion does not necessarily imply optimality over the other criterion.


Point Estimation

The main objective of a regression discontinuity design is to accurately and precisely estimate regression functions at the cut-point. This involves the following steps:

  1. Pick a polynomial order \(p\) and weighting function \(K(\cdot)\)

  2. Choose a bandwidth \(h\)

  3. Estimate two regressions: for units above the cut-point (\(X_i \geq c\)), regress \(Y_i\) on \((X_i - c)\), \((X_i - c)^2\), \(\cdots\), \((X_i - c)^p\), with weights from \(K(\cdot)\); and similarly for units below the cut-point (\(X_i < c\)).

  4. The intercept from the first regression, \(\widehat{\mu_+}\), is a point estimate of \(\mathbb{E}[Y_i(1) | X_i = c]\). Similarly, the intercept from the second regression, \(\widehat{\mu_-}\), is a point estimate of \(\mathbb{E}[Y_i(0) | X_i = c]\).

  5. The point estimate is \(\widehat{\mu_{LATE}} = \widehat{\mu_+} - \widehat{\mu_-}\)

In this estimation process, the researcher makes three decisions: the polynomial order \(p\), the Kernel function \(K(\cdot)\) used for weighting, and the bandwidth \(h\). Each decision involves a bias-variance trade-off.
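To make steps 1-5 concrete, here is a by-hand sketch for the simulated data: a local linear regression (\(p = 1\)) on each side of the cut-point with triangular kernel weights and an arbitrarily chosen bandwidth (bandwidth selection is discussed below, and rdrobust automates all of these steps):

# By-hand local linear RD estimate at c = 0 (p = 1, triangular kernel, arbitrary h = 0.3)
h <- 0.3
near <- subset(dat, abs(X) <= h)
near$w <- 1 - abs(near$X) / h   # triangular kernel weights

# Weighted regressions of Y on (X - c) on each side of the cut-point (here c = 0)
fit_above <- lm(Y ~ X, data = subset(near, X > 0), weights = w)
fit_below <- lm(Y ~ X, data = subset(near, X <= 0), weights = w)

# Intercepts estimate E[Y(1)|X=c] and E[Y(0)|X=c]; their difference is the RD estimate
unname(coef(fit_above)[1] - coef(fit_below)[1])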

Polynomial Choice

If we pick a higher-order polynomial, we can more accurately approximate the conditional expectation function, but this also increases the variance of the estimator. In other words, more flexible regression functions reduce bias but have greater variance. Selecting a higher \(p\) also comes at another cost: overfitting (or “undersmoothing”). I illustrate this using the earlier example:

# Polynomials 
fig3 <- 
  ggplot(dat, aes(x = X, y = Y, color = as.factor(Z))) +
  geom_point(data = dat, alpha = .1) +
  geom_smooth(method = "lm_robust") +
  geom_smooth(method = "lm_robust", formula = y ~ poly(x,3)) +
  geom_vline(xintercept = 0, linetype = "dotted") +
  geom_segment(aes(x = 0, xend = 0, y = make_Y0(0), yend = make_Y1(0)), color = "black") +
  xlab("Forcing Variable (X)") +
  ylab("Outcome (Y)") +
  ggtitle("Fitting a Linear and Polynomial Function", subtitle = "(Polynomial Selection and the Bias Variance Tradeoff)")+
  theme_economist_white()

fig3

Takeaway: Although it may seem at first that a linear polynomial is not flexible enough, an appropriately chosen bandwidth will adjust to the chosen polynomial order so that the linear approximation [\(p=1\)] to the unknown regression functions is reliable. (CIT 2019:44)

Kernel

A Kernel function, \(K(\cdot)\), assigns weights to each transformed observation \(\frac{X_i - c}{h}\), based on its distance from the cut-point. There are three common weighting schemes:

  1. Triangular Kernel: \(K(u) = (1 - |u|) \cdot \mathbb{1}(|u| \leq 1)\), where \(u = \frac{X_i - c}{h}\). This function assigns a weight of 1 to units exactly at the cut-point, and a weight of 0 to units outside the bandwidth \(h\) (since \(|u| > 1\) for all units with \(X_i\) outside \([c - h, c+h]\)). Further, the weight linearly decreases with distance from the cut-point.

  2. Uniform Kernel: \(K(u) = \mathbb{1}(|u| \leq 1)\), which equally weights all observations within the bandwidth, and a weight of 0 to units outside.

  3. Epanechnikov Kernel: \(K(u) = (1 - u^2) \cdot \mathbb{1}(|u| \leq 1)\), which gives a quadratic decaying weight to all observations within the bandwidth, and zero weight to observations outside the bandwidth.

Below is a plot of the weights at different levels of the forcing variable:

# Make weights

# Find the MSE-optimal bandwidth
bws_out <- with(dat, rdrobust::rdbwselect(y = Y, x = X))
h <- bws_out$bws[1]

# Define the three kernel weighting schemes (cut-point c = 0)
dat <-
  dat %>%
  mutate(triangle_weights     = if_else(abs(X) < h, 1 - abs(X / h), 0),
         epanechnikov_weights = if_else(abs(X) < h, 1 - (X / h)^2, 0),
         uniform_weights      = if_else(abs(X) < h, 1, 0))

# Reshape dataframe for ggplot
weight_dat <- dat %>% 
  select(X, triangle_weights, epanechnikov_weights, uniform_weights) %>%
  gather(key = "Weight_Type", value = "Weights", -X)


# Plot
fig4 <- 
  ggplot(weight_dat, aes(x = X, y = Weights, color = Weight_Type)) +
  geom_point(alpha = .3) +
  geom_vline(xintercept = 0, linetype = "dotted") +
  xlab("Forcing Variable (X)") +
  ylab("Weights") +
  ggtitle("Kernel Function Choices")+
  theme_economist_white()

fig4

Takeaway: Cattaneo, Idrobo and Titiunik recommend the Triangular Kernel function. When this is used in conjunction with a “bandwidth that optimizes mean squared error (MSE), it leads to a point estimator with optimal properties” (CIT 2019:47).

Bandwidth

Fixing a polynomial order \(p\) and kernel \(K(\cdot)\), we need to find the MSE-optimal bandwidth. Again, there is a bias-variance trade-off: a smaller bandwidth reduces bias but increases variance.

Choosing a smaller \(h\) will reduce misspecification error (also known as “smoothing bias”) of the local polynomial approximation, but will simultaneously tend to increase the variance of the estimated coefficients because fewer observations will be available for estimation. (CIT 2019:49)

To see this, focus on the bias and variance of the estimator \((\widehat{\mu_+} - \widehat{\mu_-})\)

\[\begin{equation} \mathfrak{B} = h^{p+1} \cdot \mathbb{B} \end{equation}\]

Where \(\mathbb{B} = \mathbb{B_+} - \mathbb{B_-}\), \(\mathbb{B_+} \approx \mu_+^{(p+1)}\cdot B_+\) and \(\mathbb{B_-} \approx \mu_-^{(p+1)}\cdot B_-\). Note that \(B_+\) and \(B_-\) are known constants related to the Kernel function and order of polynomial used. Also, \(\mu_+^{(p+1)}\) and \(\mu_-^{(p+1)}\) are related to the “curvature” of the unknown conditional expectation functions (see CIT 2019:50).

\[\begin{equation} \mu_+^{(p+1)} = \lim_{x \downarrow c} \frac{d^{p+1} \mathbb{E}[Y_i(1) | X_i = x]}{dx^{p+1}} \end{equation}\] \[\begin{equation} \mu_-^{(p+1)} = \lim_{x \uparrow c} \frac{d^{p+1} \mathbb{E}[Y_i(0) | X_i = x]}{dx^{p+1}} \end{equation}\]

Crucially, \(\mathfrak{B} \propto h^{p+1}\) (i.e. bias increases with the bandwidth)

Now, looking at the variance of the estimator

\[\begin{equation} \mathscr{V} = \frac{1}{nh}\mathbb{V} \end{equation}\]

Where \(\mathbb{V} = \mathbb{V_+} + \mathbb{V_-}\), \(\mathbb{V_+} \approx \frac{\sigma_+^2}{f} V_+\), and \(\mathbb{V_-} \approx \frac{\sigma_-^2}{f} V_-\). Here \(f\) denotes the density of \(X_i\) at the cutpoint. \(V_+\) and \(V_-\) are known constants related to \(K(\cdot)\) and \(p\). And \(\sigma_+^2\) and \(\sigma_-^2\) capture the conditional variance of the potential outcomes at the cutpoint:

\[\begin{equation} \sigma_+^2 = \lim_{x \downarrow c} Var[Y_i(1) | X_i = x] \end{equation}\] \[\begin{equation} \sigma_-^2 = \lim_{x \uparrow c} Var[Y_i(0) | X_i = x] \end{equation}\]

Again, note that \(\mathscr{V} \propto \frac{1}{h}\) (i.e. variance decreases as \(h\) increases)
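Putting the two pieces together, the leading-term approximation to the mean squared error and its first-order condition (a sketch of the derivation, keeping only the leading terms) are:

\[\begin{equation} MSE(h) \approx h^{2(p+1)}\mathbb{B}^2 + \frac{\mathbb{V}}{nh}, \qquad \frac{d \, MSE(h)}{dh} = 2(p+1)h^{2p+1}\mathbb{B}^2 - \frac{\mathbb{V}}{nh^2} = 0 \implies h^{2p+3} = \frac{\mathbb{V}}{2(p+1)\mathbb{B}^2 n} \end{equation}\]

Solving for \(h\) gives the MSE-optimal bandwidth reported in the takeaway below.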


Takeaway: Cattaneo, Idrobo and Titiunik derive the following expression for the MSE-optimal bandwidth choice: \[\begin{equation} h_{MSE} = \left(\frac{\mathbb{V}}{2(p+1)\mathbb{B}^2}\right)^{\frac{1}{2p+3}} \cdot n^{-\frac{1}{2p+3}} \end{equation}\]

In other words, the optimal bandwidth \(h_{MSE}\) increases with the variance term \(\mathbb{V}\), decreases with the (squared) bias term \(\mathbb{B}^2\), and shrinks as the sample size \(n\) grows.
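In practice, the rdrobust package computes this bandwidth for us via rdbwselect. A minimal sketch for the simulated data (the exact value depends on the particular simulated draw):

# MSE-optimal bandwidth for the simulated data (local linear, triangular kernel)
library(rdrobust)
summary(rdbwselect(y = dat$Y, x = dat$X, kernel = "triangular", p = 1, bwselect = "mserd"))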


Interval Estimation

The Problem: We know that the RD estimator (\(\widehat{\tau_{RD}}\)) is biased. This means that any t-statistic that uses \(\widehat{\tau_{RD}}\) includes a bias term \(\mathfrak{B}\), and any confidence interval constructed using the standard error ignores variability in the bias term.

To see this, say \(\widehat{\tau_{RD}} = \widehat{\tau_{RD, Unbiased}} + \mathfrak{B}\).

Then we know that \(\frac{\widehat{\tau_{RD}} - \tau_{RD}}{\sqrt{\mathscr{V}}}\) does not converge in distribution to \(N(0,1)\)

And \(\mathscr{V} = Var(\widehat{\tau_{RD, Unbiased}} + \mathfrak{B}) \neq Var(\widehat{\tau_{RD, Unbiased}})\) unless \(\mathfrak{B}\) is a constant.

There are two ways around this:

Option 1: Use \(h_{MSE}\) but correct for the bias, and variability in the bias

Step 1: Estimate the bias, \(\widehat{\mathfrak{B}}\) and remove it from the RD point estimator. The resulting estimate is \(\widehat{\tau_{RD}} - \widehat{\mathfrak{B}} = (\widehat{\mu_+} - \widehat{\mu_-}) - \widehat{\mathfrak{B}}\)

Step 2: Adjust the variance estimate so it accounts for variability in the bias term. Call this \(\mathscr{V}_{bc}\).

Step 3: Compute the t-statistic and confidence intervals using the bias-corrected point estimate and the adjusted standard error:

\[\begin{equation} t = \frac{(\widehat{\tau_{RD}} -\widehat{\mathfrak{B}}) - \tau_{H0}}{\sqrt{\mathscr{V}_{bc}}} \end{equation}\] \[\begin{equation} CI_{rbc} = (\widehat{\tau_{RD}} -\widehat{\mathfrak{B}}) \pm 1.96 \cdot \sqrt{\mathscr{V}_{bc}} \end{equation}\]

Option 2: Use a different bandwidth that minimizes coverage error

An alternative approach to RD inference is to decouple the goal of point estimation from the goal of inference, using a different bandwidth for each case. In particular, this strategy involves estimating the RD effect with \(h_{MSE}\), and constructing confidence intervals using a different bandwidth, where the latter is specifically chosen to minimize an approximation to the coverage error rate (CER) of the confidence interval \(CI_{rbc}\), leading to the choice \(h = h_{CER}\). Just like \(h_{MSE}\) minimizes the asymptotic MSE of the point estimator \(\widehat{\tau_{RD}}\), the CER-optimal bandwidth \(h_{CER}\) minimizes the asymptotic coverage error rate. (CIT 2019:75)


rdrobust

rdrobust puts all these parts together, and is the best way to estimate a regression discontinuity design. We can specify the model in the following way:

rdrobust(y, x, c, kernel, p, bwselect)

Where \(y\) is the outcome of interest, \(x\) is the forcing variable, \(c\) is the cutoff (default \(c=0\)), kernel specifies the Kernel function (“triangular”, “uniform”, or “epanechnikov”), and \(p\) specifies the order of the polynomial. As for the bandwidth, it can be set manually by supplying h, or determined automatically using bwselect. We have the following options for bwselect:

  • mserd gives one common MSE-optimal bandwidth

  • msetwo gives two different MSE-optimal bandwidths (one right of the cut-point, the other left of the cutpoint)

  • cerrd gives one common CER-optimal bandwidth

  • certwo gives two different CER-optimal bandwidths (one right of the cut-point, the other left of the cutpoint)

# RD with local linear regression, triangular kernel weighting, and MSE optimal bandwidths
library(rdrobust)

out <- rdrobust(dat$Y, dat$X, kernel = "triangular", p = 1, bwselect = "mserd")

summary(out)
## Call: rdrobust
## 
## Number of Obs.                 1000
## BW type                       mserd
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                 503         497
## Eff. Number of Obs.            136         146
## Order est. (p)                   1           1
## Order bias  (p)                  2           2
## BW est. (h)                  0.268       0.268
## BW bias (b)                  0.438       0.438
## rho (h/b)                    0.612       0.612
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     1.586     0.090    17.544     0.000     [1.408 , 1.763]     
##         Robust         -         -    14.945     0.000     [1.349 , 1.756]     
## =============================================================================
# Above, but with bias correction and robust standard errors (Option 1)

out2 <- rdrobust(dat$Y, dat$X, kernel = "triangular", p = 1, bwselect = "mserd", all = TRUE)

summary(out2)
## Call: rdrobust
## 
## Number of Obs.                 1000
## BW type                       mserd
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                 503         497
## Eff. Number of Obs.            136         146
## Order est. (p)                   1           1
## Order bias  (p)                  2           2
## BW est. (h)                  0.268       0.268
## BW bias (b)                  0.438       0.438
## rho (h/b)                    0.612       0.612
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     1.586     0.090    17.544     0.000     [1.408 , 1.763]     
## Bias-Corrected     1.553     0.090    17.180     0.000     [1.375 , 1.730]     
##         Robust     1.553     0.104    14.945     0.000     [1.349 , 1.756]     
## =============================================================================
# Above, but with CI's chosen from the coverage error optimal bandwidth (Option 2)

out3 <- rdrobust(dat$Y, dat$X, kernel = "triangular", p = 1, bwselect = "cerrd")

summary(out3)
## Call: rdrobust
## 
## Number of Obs.                 1000
## BW type                       cerrd
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                 503         497
## Eff. Number of Obs.             92          92
## Order est. (p)                   1           1
## Order bias  (p)                  2           2
## BW est. (h)                  0.190       0.190
## BW bias (b)                  0.438       0.438
## rho (h/b)                    0.433       0.433
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     1.509     0.105    14.420     0.000     [1.304 , 1.714]     
##         Robust         -         -    13.427     0.000     [1.276 , 1.713]     
## =============================================================================

Exercise

Please download the .csv file I have uploaded to Canvas and to my GitHub page. This is the data analyzed by Meyersson (2014) in their paper about the effect of Islamic political representation on women’s education. This is a “close-election” RD design: the forcing variable \(X_i\) is the Islamic party’s vote margin in a locality, and \(Y_i\) captures young women’s educational attainment in that locality (the percentage of women aged 15-20 who had completed high school). This is a “sharp RD” because the treatment, \(Z_i\), is the electoral victory of the Islamic party, \(Z_i = \mathbb{1}(X_i \geq 0)\). The probability of having a mayor from the Islamic party changes discontinuously at the cut-point \(X_i = 0\).

With this in mind, can you estimate RDs with the following specifications:

  1. Manually set the bandwidth to 18 for localities below the cut-point, and 22 for those above the cut-point.

  2. Instead of a local linear regression, fit a quadratic regression function on either side of the cut-point.

  3. Instead of kernel = “triangular”, use the Epanechnikov weighting procedure.

  4. What is the bias corrected point estimate?

  5. What is the confidence interval when we select a bandwidth that minimizes coverage error? Let rdrobust pick separate bandwidths for each side of the cut-point.

Answers:

turkey_mayors <- read_csv("CIT2019_TurkeyMayor.csv")

#1
with(turkey_mayors, rdrobust(Y, X, kernel = "triangular", p = 1, h = c(18,22))) %>% summary()
## Call: rdrobust
## 
## Number of Obs.                 2629
## BW type                      Manual
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                2314         315
## Eff. Number of Obs.            548         288
## Order est. (p)                   1           1
## Order bias  (p)                  2           2
## BW est. (h)                 18.000      22.000
## BW bias (b)                 18.000      22.000
## rho (h/b)                    1.000       1.000
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     2.990     1.341     2.230     0.026     [0.362 , 5.618]     
##         Robust         -         -     1.154     0.248    [-1.550 , 5.988]     
## =============================================================================
#2
with(turkey_mayors, rdrobust(Y, X, kernel = "triangular", p = 2, bwselect = "mserd")) %>% summary()
## Call: rdrobust
## 
## Number of Obs.                 2629
## BW type                       mserd
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                2314         315
## Eff. Number of Obs.            702         291
## Order est. (p)                   2           2
## Order bias  (p)                  3           3
## BW est. (h)                 23.120      23.120
## BW bias (b)                 35.190      35.190
## rho (h/b)                    0.657       0.657
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     2.772     1.808     1.533     0.125    [-0.772 , 6.315]     
##         Robust         -         -     1.325     0.185    [-1.276 , 6.601]     
## =============================================================================
#3
with(turkey_mayors, rdrobust(Y, X, kernel = "epanechnikov", p = 1, bwselect = "mserd")) %>% summary()
## Call: rdrobust
## 
## Number of Obs.                 2629
## BW type                       mserd
## Kernel                   Epanechnikov
## VCE method                       NN
## 
## Number of Obs.                2314         315
## Eff. Number of Obs.            502         256
## Order est. (p)                   1           1
## Order bias  (p)                  2           2
## BW est. (h)                 16.276      16.276
## BW bias (b)                 27.923      27.923
## rho (h/b)                    0.583       0.583
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     3.217     1.419     2.268     0.023     [0.437 , 5.998]     
##         Robust         -         -     1.912     0.056    [-0.080 , 6.458]     
## =============================================================================
#4
with(turkey_mayors, rdrobust(Y, X, kernel = "epanechnikov", p = 1, bwselect = "mserd", all = TRUE)) %>% summary()
## Call: rdrobust
## 
## Number of Obs.                 2629
## BW type                       mserd
## Kernel                   Epanechnikov
## VCE method                       NN
## 
## Number of Obs.                2314         315
## Eff. Number of Obs.            502         256
## Order est. (p)                   1           1
## Order bias  (p)                  2           2
## BW est. (h)                 16.276      16.276
## BW bias (b)                 27.923      27.923
## rho (h/b)                    0.583       0.583
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     3.217     1.419     2.268     0.023     [0.437 , 5.998]     
## Bias-Corrected     3.189     1.419     2.248     0.025     [0.409 , 5.969]     
##         Robust     3.189     1.668     1.912     0.056    [-0.080 , 6.458]     
## =============================================================================
#5
with(turkey_mayors, rdrobust(Y, X, kernel = "epanechnikov", p = 1, bwselect = "certwo")) %>% summary()
## Call: rdrobust
## 
## Number of Obs.                 2629
## BW type                      certwo
## Kernel                   Epanechnikov
## VCE method                       NN
## 
## Number of Obs.                2314         315
## Eff. Number of Obs.            383         208
## Order est. (p)                   1           1
## Order bias  (p)                  2           2
## BW est. (h)                 12.518      11.088
## BW bias (b)                 31.455      29.112
## rho (h/b)                    0.398       0.381
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     2.861     1.625     1.761     0.078    [-0.324 , 6.046]     
##         Robust         -         -     1.629     0.103    [-0.579 , 6.284]     
## =============================================================================