'How to add percentile (/quantile) values to a column in dataframe
My data set has flow rate measurements of a river for every day of the year from 2009 to 2021. This is split up into seasons: Winter (December, Jan, Feb), Spring (March, April, May), Summer (June, July, August) and Autumn (September, October, November).
This is a sample of my data set:
> (chitt_brook_wylye_2)
# A tibble: 4,437 x 7
river year season month date flow_rate quality
<chr> <dbl> <chr> <chr> <dttm> <dbl> <chr>
1 chittern_brook 2009 Winter December 2009-12-01 00:00:00 0.059 Good
2 chittern_brook 2009 Winter December 2009-12-02 00:00:00 0.061 Good
3 chittern_brook 2009 Winter December 2009-12-03 00:00:00 0.064 Good
4 chittern_brook 2009 Winter December 2009-12-04 00:00:00 0.068 Good
5 chittern_brook 2009 Winter December 2009-12-05 00:00:00 0.076 Good
6 chittern_brook 2009 Winter December 2009-12-06 00:00:00 0.138 Good
7 chittern_brook 2009 Winter December 2009-12-07 00:00:00 0.592 Good
8 chittern_brook 2009 Winter December 2009-12-08 00:00:00 1.04 Good
9 chittern_brook 2009 Winter December 2009-12-09 00:00:00 1.46 Good
10 chittern_brook 2009 Winter December 2009-12-10 00:00:00 1.7 Good
# ... with 4,427 more rows
I want to find the 95th percentile, 5th percentile, median and the mean of each season of every year and have the values for 95th 5th, median and mean in separate columns in a new dataframe.
For example:
> (df)
# A tibble: 49 x 2
season_label flow_rate_mean Q95 Q5 flow_rate_median
<chr> <dbl>
1 Winter 2009 0.453 3 2 4
2 Spring 2010 0.519 6 3 4
3 Summer 2010 0.0627 4 3 6
4 Autumn 2010 0.0415 6 2 6
5 Winter 2010 0.0622 8 3 3
6 Spring 2011 0.188 10 3 2
7 Summer 2011 0.0499 2 3 2
8 Autumn 2011 0.0383 2 2 1
9 Winter 2011 0.0461 5 2 7
10 Spring 2012 0.0925 3 2 8
# ... with 39 more rows
I currently have this code which creates the above dataframe with just the first two columns but I would like it to also include 95th percentile, 5th percentile and median. Is this feasible or will I need to do it separately and then combine it into one dataframe?
df <- chitt_brook_wylye_2 %>%
dplyr::mutate(month = as.numeric(format(date,"%m")),
year = as.numeric(format(date,"%Y")),
season_id = (12*year + month) %/% 3) %>%
dplyr::group_by(season_id) %>%
dplyr::mutate(season_label = paste(season, min(year))) %>%
dplyr::group_by(season_id,season_label) %>%
dplyr::summarise(flow_rate = mean(flow_rate))
Reproducible example and code:
date <- as.Date(c("2009-12-01","2010-01-01","2010-02-01","2010-03-01","2010-04-01","2010-05-01","2010-06-01","2010-07-01","2010-08-01","2010-09-01","2010-10-01","2010-11-01","2010-12-01"))
season <- c("Winter","Winter","Winter","Spring","Spring","Spring","Summer","Summer","Summer","Autumn","Autumn","Autumn","Winter")
var <- c(1,2,3,5,5,5,7,7,7,9,9,9,10)
df <- data.frame(date,season,var) %>% # creating the dataframe
dplyr::mutate(month = as.numeric(format(date,"%m")),
year = as.numeric(format(date,"%Y")),
season_id = (12*year + month) %/% 3) %>% #generating an identifiant for every season that exists in the data
dplyr::group_by(season_id) %>% # Grouping by the id
dplyr::mutate(season_label = paste(min(year),season)) %>%
dplyr::group_by(season_id,season_label) %>% ## season_label to keep the newly created label after the arriving summarise
dplyr::summarise(var = mean(var)) # Computing the mean
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
