Need help plotting two count variables in a scatterplot and fitting a regression line in R

I need help with all of these tasks, but especially with plotting the scatterplot and fitting the linear regression model.

  • Filter out any zip code where the number of emergency visits was less than 20
  • Plot the Count of influenza-like illness and/or pneumonia visits against Count of all emergency department visits
  • Plot the line of best fit (linear regression) and the R-squared
  • From the some.zips data set, aggregate the mean of ED visits by zip code.

Here is my code, but it is not working. I keep getting "Warning in abline(m) : only using the first two of 135 regression coefficients". Can someone help? The dataset is loaded from https://data.cityofnewyork.us/resource/2nwg-uqyg.json in the code below:

library(jsonlite)
library(tidyverse)
library(ALSM)
data(package="ALSM")

er <- fromJSON("https://data.cityofnewyork.us/resource/2nwg-uqyg.json")

filtered_data = filter(er, total_ed_visits > 20)

plot(ili_pne_visits~total_ed_visits,data=filtered_data,xlab="Total ER Visits",ylab="Influenza Visits")

m <-lm(ili_pne_visits~total_ed_visits,data=filtered_data)

abline(m)
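The warning can be reproduced on a tiny made-up data frame (the `toy`, `m_chr` and `m_num` names and all values are hypothetical, not taken from the NYC dataset). If the predictor is stored as character text, `lm()` turns it into a factor and fits one coefficient per distinct value beyond the first, so `abline()` has more than an intercept and a slope to work with:

```r
# Hypothetical toy data: the predictor is stored as character,
# as the counts are when they come straight from fromJSON()
toy <- data.frame(
  total_ed_visits = c("25", "40", "60", "80", "120"),
  ili_pne_visits  = c(2, 3, 5, 6, 10),
  stringsAsFactors = FALSE
)

# Character predictor: lm() treats it as a factor (one dummy per extra level)
m_chr <- lm(ili_pne_visits ~ total_ed_visits, data = toy)
length(coef(m_chr))  # 5: intercept + 4 dummy coefficients

# Numeric predictor: just an intercept and a slope, which abline() can draw
toy$total_ed_visits <- as.numeric(toy$total_ed_visits)
m_num <- lm(ili_pne_visits ~ total_ed_visits, data = toy)
length(coef(m_num))  # 2
```

With the real data, the 135 coefficients in the warning correspond to the intercept plus one dummy for every other distinct value of `total_ed_visits`.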


Solution 1:[1]

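The warning appears because fromJSON() returns every column of this dataset as character, so lm() treats total_ed_visits as a factor with one coefficient per distinct value, while abline() can only use an intercept and a slope. Converting the count columns to numbers first fixes that. Code-wise, this will do the job: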

library(jsonlite)
library(tidyverse)

df <- fromJSON("https://data.cityofnewyork.us/resource/2nwg-uqyg.json")

df %>%
    ## convert variables from character to numeric where appropriate:
    mutate(across(mod_zcta:ili_pne_admissions, ~ as.integer(.x))) %>%
    ## drop rows with fewer than 20 ED visits:
    filter(total_ed_visits >= 20) %>%
    ggplot(aes(x = total_ed_visits, y = ili_pne_visits)) +
    geom_point() +
    ## add regression line and confidence band
    geom_smooth(method = 'lm')
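The question also asks for the R-squared, which the ggplot above does not display. Here is a minimal base-R sketch that stays close to the question's own plot()/abline() approach. It assumes the columns are already numeric; the small `filtered_data` frame below is made up for illustration, and with the real data you would use the converted and filtered data instead:

```r
# Hypothetical stand-in for the converted and filtered NYC data
filtered_data <- data.frame(
  total_ed_visits = c(25, 40, 60, 80, 120, 150),
  ili_pne_visits  = c(2, 3, 5, 6, 10, 13)
)

# Simple linear regression: intercept + slope only
m <- lm(ili_pne_visits ~ total_ed_visits, data = filtered_data)

# R-squared reported by summary.lm
r2 <- summary(m)$r.squared

plot(ili_pne_visits ~ total_ed_visits, data = filtered_data,
     xlab = "Total ER Visits", ylab = "Influenza Visits")
abline(m)  # two coefficients, so no warning
# Write the R-squared into the upper-left corner of the plot
legend("topleft", legend = sprintf("R-squared = %.3f", r2), bty = "n")
```

For a regression with a single predictor and an intercept, this R-squared equals the squared Pearson correlation of the two columns. In the ggplot version, the same value could be added with `annotate("text", ...)`.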

However, pouring all the data indiscriminately into one scatterplot/linear model hides interesting patterns, e.g. seasonality. Plot the share of influenza-like illness/pneumonia visits among total ED visits against time, and voilà:

library(lubridate) ## for easy date-time-manipulation

df %>%
    ## convert variables from character to numeric where appropriate:
    mutate(
        across(mod_zcta:ili_pne_admissions, ~ as.integer(.x)),
        date = lubridate::as_datetime(date),
        ili_pne_share = ili_pne_visits / total_ed_visits
        ) %>% 
    filter(total_ed_visits > 20) %>%
    arrange(date) %>%
    ggplot(aes(x = date, y = ili_pne_share)) + 
    geom_line() +
    geom_smooth(span = .1)
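The last task in the question (mean ED visits by zip code) is not covered by the code above. A minimal base-R sketch, assuming `some.zips` is a data frame with numeric `mod_zcta` (zip code) and `total_ed_visits` columns like the converted `df`; the rows below are made up for illustration:

```r
# Hypothetical stand-in for `some.zips`: several rows per zip code
some.zips <- data.frame(
  mod_zcta        = c(10001, 10001, 10002, 10002, 10002),
  total_ed_visits = c(30, 50, 20, 40, 90)
)

# Mean number of ED visits per zip code, via the formula interface of aggregate()
mean_by_zip <- aggregate(total_ed_visits ~ mod_zcta, data = some.zips, FUN = mean)
mean_by_zip
```

In a tidyverse pipeline, `group_by(mod_zcta) %>% summarise(mean_visits = mean(total_ed_visits))` gives the same per-zip means.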

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 I_O