Trying to make a ggplot with two lines

The data can be found at: https://www.kaggle.com/tovarischsukhov/southparklines

library(dplyr)  # %>%, select(), arrange(), filter(), group_by()
library(tidyr)  # pivot_wider()

SP = read.csv("/Users/michael/Desktop/stat 479 proj data/All-seasons.csv")
SP$Season = as.numeric(SP$Season)
SP$Episode = as.numeric(SP$Episode)

Clean.Boys = SP  %>% select(Season, Episode, Character) %>% 
  arrange(Season, Episode, Character) %>% 
  filter(Character == "Kenny"   | Character == "Cartman") %>% 
  group_by(Season, Episode) 

count = table(Clean.Boys)
count = as.data.frame(count)
Clean = count %>% pivot_wider(names_from = Character, values_from = Freq) %>% group_by(Episode)

  Season Episode Cartman Kenny
  <fct>  <fct>     <int> <int>
1 1      1            85     5
2 2      1             1     0
3 3      1            43    19
4 4      1            83     6
5 5      1            37     3
6 6      1            67     0

I am trying to use ggplot to make a single plot with two lines on it, one for the Cartman variable and one for the Kenny variable. My two questions are:

  1. Is my data formatted correctly to make a plot with geom_line(), or would I have to pivot it longer?

  2. I want to plot the x-axis as a continuous variable, similar to a date, except that it is season and episode. For example, the first point would be Season 1 Episode 1, then Season 1 Episode 2, and so on. I am stuck on how to do that with Season and Episode in separate columns, and even if I combined them, I'm not sure what the proper format would be.



Solution 1:[1]

The trick is to gather (pivot longer) the columns you want to map to an aesthetic, so that the character names sit in one column and the counts in another. Since I don't know exactly what you want on the x- and y-axes, I made a placeholder plot. For the continuous-variable part, convert your values to numbers with as.integer() or as.numeric(); ggplot will then treat them as a continuous scale. You can check the structure of your data by calling str(df), which shows the class of each variable. If a column is a factor or character, convert it to numeric. Note that calling as.numeric() directly on a factor returns its internal level codes, so use as.numeric(as.character(x)) to get the labels as numbers.
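As a small illustration of the conversion step, here is a sketch using a made-up factor (the name `episode` and its values are hypothetical, not taken from the dataset). It shows why going through as.character() matters when the factor labels are numbers.

```r
# Hypothetical factor, similar in kind to the Episode column that table() produces
episode <- factor(c("10", "2", "1"))

str(episode)  # confirm the class before converting

# Direct conversion returns the internal level codes (levels sort as "1", "10", "2")
codes <- as.numeric(episode)
print(codes)   # 2 3 1

# Converting via character returns the labels themselves as numbers
values <- as.numeric(as.character(episode))
print(values)  # 10 2 1
```

In the question's data the two results may happen to coincide, because table() orders numeric levels numerically, but the as.character() route does not depend on that.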

#libraries
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.0.5
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
#> Warning: package 'tidyr' was built under R version 4.0.3

#your code
SP <- read.csv("C:/Users/saura/Desktop/All-seasons.csv")
SP$Season = as.numeric(SP$Season)
#> Warning: NAs introduced by coercion
SP$Episode = as.numeric(SP$Episode)
#> Warning: NAs introduced by coercion

Clean.Boys = SP  %>% select(Season, Episode, Character) %>% 
  arrange(Season, Episode, Character) %>% 
  filter(Character == "Kenny"   | Character == "Cartman") %>% 
  group_by(Season, Episode) 
count = table(Clean.Boys)
count = as.data.frame(count)
Clean = count %>% pivot_wider(names_from = Character, values_from = Freq) %>% group_by(Episode)

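# reshape to long format: one row per Season/Episode/character
# ("Views" and "numbers" are placeholder names, as I don't know what you want on your axes)
new_df <- Clean %>%
  pivot_longer(c(Cartman, Kenny), names_to = "Views", values_to = "numbers")

ggplot(data = new_df, aes(
  x = as.numeric(as.character(Episode)),  # factor labels -> numbers
  y = numbers,
  color = Views,
  group = Views
)) +
  geom_path()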

Created on 2022-02-19 by the reprex package (v2.0.1)
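For the second part of the question (a single continuous x-axis running Season 1 Episode 1, Season 1 Episode 2, and so on), one possible approach is to sort by season and episode and then number the rows. The sketch below assumes a wide table shaped like `Clean`; the data frame `clean_toy`, its counts, and the column names `episode_index`, `Character` and `Lines` are all made up for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up counts in the same shape as `Clean` (NOT the real South Park data)
clean_toy <- data.frame(
  Season  = factor(c(1, 1, 1, 2, 2)),
  Episode = factor(c(1, 2, 3, 1, 2)),
  Cartman = c(85L, 40L, 62L, 30L, 51L),
  Kenny   = c(5L, 12L, 0L, 8L, 3L)
)

long_toy <- clean_toy %>%
  # factor -> number via character, so the labels (not the level codes) are used
  mutate(Season  = as.numeric(as.character(Season)),
         Episode = as.numeric(as.character(Episode))) %>%
  arrange(Season, Episode) %>%
  # running episode number across all seasons: S1E1 = 1, S1E2 = 2, ...
  mutate(episode_index = row_number()) %>%
  # long format: one row per episode and character
  pivot_longer(c(Cartman, Kenny), names_to = "Character", values_to = "Lines")

# One line per character, with the running index on a continuous x-axis
p <- ggplot(long_toy, aes(episode_index, Lines, color = Character)) +
  geom_line() +
  labs(x = "Episode (running index across seasons)", y = "Number of lines")
p
```

This also covers the first question: geom_line() can draw one line per character once the data is in long format and the character column is mapped to color. With the real data, note that table() creates a row for every Season/Episode combination, including ones that never aired, so it may be worth dropping those rows (or counting with dplyr's count() instead) before numbering the episodes.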

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Saurav Das