Plotting RDD with Binary outcome variable
I am trying to create a Regression Discontinuity graph in ggplot, and I am unable to do so after many tries.
I have a dataset that looks like this:
   distance affiliation treatment
      <dbl>       <dbl>     <dbl>
 1      -14           1         0
 2      -13           0         0
 3      -16           0         0
 4       19           0         0
 5       16           0         0
 6       14           0         0
 7        0           0         1
 8      -27           0         0
 9      -14           0         0
10       12           0         0
11      -14           1         0
12      -13           0         0
13      -16           0         0
14       19           0         0
15        0           1         0
16       14           0         0
17        0           1         1
18      -27           0         0
19      -14           0         0
20        0           0         0
Distance is the running variable: it tells me how many years before or after the cutoff (zero) an observation falls. Affiliation is my outcome variable, which tells me whether an individual is a member of a political party or not. Treatment is a binary variable that tells me whether someone received treatment, which corresponds to 0 in the distance variable.
I am trying to make a regression discontinuity graph, where my y-axis is affiliation and my x-axis is distance.
I have been trying many combinations, and I have only gotten this far:
ggplot(sample, aes(distance, affiliation, color = factor(treatment))) +
geom_point() + stat_smooth(type="lm") +
geom_vline(xintercept=0, linetype="longdash") +
xlab("Running variable") +
ylab("Outcome variable") + scale_colour_discrete()
I have tried many other options, but nothing gives me the two fits I need before and after the cutoff. Any ideas?
Thank you!
Solution 1:[1]
To get that graph, I would create a column that groups the observations by which side of the cutoff their distance falls on:
library(tidyverse)
df <- structure(list(distance = c(-14, -13, -16, 19, 16, 14, 0, -27,
-14, 12, -14, -13, -16, 19, 0, 14,
0, -27, -14, 0),
affiliation = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 1, 0, 1, 0, 0, 0),
treatment = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L))
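df %>%
  # Label each observation by its side of the cutoff (distance = 0)
  mutate(cutoff = if_else(distance <= 0, "pre", "post")) %>%
  ggplot(aes(distance, affiliation, color = cutoff)) +
  geom_point() +
  # Mapping color to cutoff makes geom_smooth fit one line per side
  geom_smooth(method = "lm")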
#> `geom_smooth()` using formula 'y ~ x'

I feel obligated to point out that logistic regression seems like a much better way of doing what you're doing.
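As a rough sketch of that logistic alternative (not part of the original answer), the same plot can be drawn with a binomial `glm` fit on each side of the cutoff through `geom_smooth()`'s `method` and `method.args` arguments. The sketch keeps the grouping used above, so `distance <= 0` counts as "pre"; `se = FALSE`, the point transparency, and the axis labels are my own choices. Note that in this 20-row sample every "post" observation has `affiliation == 0`, so the "post" fit is degenerate and `glm` is likely to emit warnings; it should behave better on the full dataset.

```r
library(ggplot2)

# Sample data copied from the answer above (treatment column omitted)
df <- data.frame(
  distance = c(-14, -13, -16, 19, 16, 14, 0, -27, -14, 12,
               -14, -13, -16, 19, 0, 14, 0, -27, -14, 0),
  affiliation = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                  1, 0, 0, 0, 1, 0, 1, 0, 0, 0)
)

# Same grouping as the answer: the cutoff itself (0) is counted as "pre"
df$cutoff <- ifelse(df$distance <= 0, "pre", "post")

p <- ggplot(df, aes(distance, affiliation, color = cutoff)) +
  geom_point(alpha = 0.5) +
  # One logistic fit per side of the cutoff; the curves stay within [0, 1]
  geom_smooth(method = "glm",
              method.args = list(family = binomial),
              formula = y ~ x,
              se = FALSE) +
  geom_vline(xintercept = 0, linetype = "longdash") +
  labs(x = "Running variable", y = "Outcome variable")

print(p)
```

Unlike the straight `lm` lines, the logistic curves cannot leave the 0 to 1 range, which is why they tend to suit a binary outcome such as affiliation.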
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | philiptomk |


