Plotting RDD with binary outcome variable

I am trying to create a Regression Discontinuity graph in ggplot, and I am unable to do so after many tries.

I have a dataset that looks like this:

   distance affiliation treatment
      <dbl>       <dbl>     <dbl>
 1      -14           1         0
 2      -13           0         0
 3      -16           0         0
 4       19           0         0
 5       16           0         0
 6       14           0         0
 7       0            0         1
 8      -27           0         0
 9      -14           0         0
10       12           0         0
11      -14           1         0
12      -13           0         0
13      -16           0         0
14       19           0         0
15       0            1         0
16       14           0         0
17       0            1         1
18      -27           0         0
19      -14           0         0
20       0            0         0

Distance tells me how many years before or after the cutoff (zero) an observation is. Affiliation is my outcome variable; it indicates whether an individual is a member of a political party (1) or not (0). Treatment is a binary variable indicating whether someone received treatment, which happens at the cutoff (0 in the distance variable).

I am trying to make a regression discontinuity graph. Essentially, I need it to look like this:

[Image: What I want]

with affiliation on the y-axis and distance on the x-axis.

I have been trying many combinations, and I have only gotten this far:

library(ggplot2)

ggplot(sample, aes(distance, affiliation, color = factor(treatment))) +
  geom_point() +
  stat_smooth(method = "lm") +  # stat_smooth() takes `method`, not `type`
  geom_vline(xintercept = 0, linetype = "longdash") +
  xlab("Running variable") +
  ylab("Outcome variable") +
  scale_colour_discrete()

[Image: What I have]

I have tried many other options, but nothing gives me the two separate fits I need before and after the cutoff. Any ideas?

Thank you!



Solution 1

To get that graph, I would create a column that groups the observations by which side of the cutoff their distance falls on:

library(tidyverse)

df <- structure(list(distance = c(-14, -13, -16, 19, 16, 14, 0, -27, 
                                  -14, 12, -14, -13, -16, 19, 0, 14,
                                  0, -27, -14, 0),
                     affiliation = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
                                     0, 0, 0, 1, 0, 1, 0, 0, 0),
                     treatment = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
                                   0, 0, 0, 0, 1, 0, 0, 0)),
                class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L))

df %>% 
  mutate(cutoff = if_else(distance <= 0, "pre", "post")) %>% 
  ggplot(aes(distance, affiliation, color = cutoff)) +
  geom_point() +
  geom_smooth(method = "lm")
#> `geom_smooth()` using formula 'y ~ x'
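A minimal sketch building on the answer above (my own combination, not part of the original answer): it keeps the question's colouring of points by treatment, but passes `group = cutoff` only to `geom_smooth()`, so ggplot2 still fits one line on each side of the cutoff. It also restores the question's dashed line at zero and axis labels. As in the answer, rows with a distance of exactly 0 are assumed to belong to the "pre" side.

```r
library(dplyr)
library(ggplot2)

# Same 20 rows as the sample data in the question
df <- data.frame(
  distance    = c(-14, -13, -16, 19, 16, 14, 0, -27, -14, 12,
                  -14, -13, -16, 19, 0, 14, 0, -27, -14, 0),
  affiliation = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                  1, 0, 0, 0, 1, 0, 1, 0, 0, 0),
  treatment   = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
                  0, 0, 0, 0, 0, 0, 1, 0, 0, 0)
)

p <- df %>%
  # Label each row by its side of the cutoff (assumption: 0 counts as "pre")
  mutate(cutoff = if_else(distance <= 0, "pre", "post")) %>%
  ggplot(aes(distance, affiliation)) +
  # Colour only the points by treatment status, as in the question
  geom_point(aes(color = factor(treatment))) +
  # group = cutoff makes geom_smooth() fit a separate line per side
  geom_smooth(aes(group = cutoff), method = "lm", formula = y ~ x) +
  geom_vline(xintercept = 0, linetype = "longdash") +
  labs(x = "Running variable", y = "Outcome variable", color = "treatment")
p
```

The design choice here is that the smoother's grouping comes from the cutoff column rather than from the colour mapping, so the point colours and the fitted segments can be controlled independently.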

I feel obligated to point out that, because affiliation is a binary (0/1) outcome, logistic regression seems like a much better way of doing what you're doing than a linear fit.
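One way to follow that advice in the plot itself is to swap the linear smoother for a logistic one. This is a sketch under the same assumptions as the answer (same sample data, a distance of 0 counted as "pre"): `geom_smooth()` accepts `method = "glm"` and forwards `method.args` to `glm()`, so each side of the cutoff gets a logistic curve whose fitted values stay between 0 and 1.

```r
library(dplyr)
library(ggplot2)

# Same 20 rows as the sample data in the question
df <- data.frame(
  distance    = c(-14, -13, -16, 19, 16, 14, 0, -27, -14, 12,
                  -14, -13, -16, 19, 0, 14, 0, -27, -14, 0),
  affiliation = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                  1, 0, 0, 0, 1, 0, 1, 0, 0, 0)
)

p_logit <- df %>%
  # Assumption carried over from the answer: 0 counts as "pre"
  mutate(cutoff = if_else(distance <= 0, "pre", "post")) %>%
  ggplot(aes(distance, affiliation, color = cutoff)) +
  geom_point() +
  # Logistic regression per side; predictions are probabilities in [0, 1]
  geom_smooth(method = "glm", formula = y ~ x,
              method.args = list(family = binomial)) +
  geom_vline(xintercept = 0, linetype = "longdash")
p_logit
```

With only these 20 rows, every "post" observation has an affiliation of 0, so R may warn that fitted probabilities are numerically 0 or 1 on that side; that reflects the tiny sample rather than the plotting code.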

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 philiptomk