Adding a column of totals using dplyr in a dataframe
Solution 1:[1]
Use add_count(), which appends the group count to every row:
library(dplyr)
starwars %>%
add_count(species, name = "Total_species")
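add_count() is roughly equivalent to the following (note the mutate(); this version leaves the result grouped by species, so add ungroup() if you do not want that):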
starwars %>%
group_by(species) %>%
mutate(n = n())
In contrast, count() does the following (note the summarise()), collapsing the data to one row per species:
starwars %>%
group_by(species) %>%
summarise(n = n())
Output of the add_count() call at the top of this solution (first 10 rows):
    name      height  mass   hair_color  skin_color  eye_color  birth_year  sex    gender  homeworld  species  films  vehicles  starships  Total_species
    <chr>     <int>   <dbl>  <chr>       <chr>       <chr>      <dbl>       <chr>  <chr>   <chr>      <chr>    <lis>  <list>    <list>     <int>
 1  Luke Sk~  172     77     blond       fair        blue       19          male   mascu~  Tatooine   Human    <chr>  <chr>     <chr [2]>  35
 2  C-3PO     167     75     NA          gold        yellow     112         none   mascu~  Tatooine   Droid    <chr>  <chr>     <chr [0]>  6
 3  R2-D2     96      32     NA          white, bl~  red        33          none   mascu~  Naboo      Droid    <chr>  <chr>     <chr [0]>  6
 4  Darth V~  202     136    none        white       yellow     41.9        male   mascu~  Tatooine   Human    <chr>  <chr>     <chr [1]>  35
 5  Leia Or~  150     49     brown       light       brown      19          fema~  femin~  Alderaan   Human    <chr>  <chr>     <chr [0]>  35
 6  Owen La~  178     120    brown, gr~  light       blue       52          male   mascu~  Tatooine   Human    <chr>  <chr>     <chr [0]>  35
 7  Beru Wh~  165     75     brown       light       blue       47          fema~  femin~  Tatooine   Human    <chr>  <chr>     <chr [0]>  35
 8  R5-D4     97      32     NA          white, red  red        NA          none   mascu~  Tatooine   Droid    <chr>  <chr>     <chr [0]>  6
 9  Biggs D~  183     84     black       light       brown      24          male   mascu~  Tatooine   Human    <chr>  <chr>     <chr [1]>  35
10  Obi-Wan~  182     77     auburn, w~  fair        blue-gray  57          male   mascu~  Stewjon    Human    <chr>  <chr>     <chr [5]>  35
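To see the difference between the two functions concretely, here is a minimal sketch, assuming dplyr's bundled starwars data. The object names with_total, by_hand and per_species are purely illustrative. It checks that add_count() keeps every row, that the hand-written group_by() + mutate() version produces the same column, and that count() collapses to one row per species.

```r
library(dplyr)

# add_count(): keeps every row and appends the group size as a column
with_total <- starwars %>%
  add_count(species, name = "Total_species")

# The same thing by hand: group, mutate, then drop the grouping again
by_hand <- starwars %>%
  group_by(species) %>%
  mutate(Total_species = n()) %>%
  ungroup()

# count(): collapses to one row per species (NA is treated as its own group)
per_species <- starwars %>%
  count(species, name = "Total_species")

# Each of these comparisons should evaluate to TRUE
nrow(with_total) == nrow(starwars)
identical(with_total$Total_species, by_hand$Total_species)
nrow(per_species) == n_distinct(starwars$species)
```

The explicit ungroup() matters only if later steps should not operate per species.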
Solution 2:[2]
If all you need is the number of records in each group, you can use dplyr::count(). Note that sort = TRUE sorts the counts in decreasing order, unlike base R's sort(), which sorts in increasing order by default.
starwars %>%
count(species, sort = TRUE)
This is shorthand for the code below, which summarizes the number of rows in each group and then uses arrange() with desc() to sort by n in decreasing order.
library(dplyr)
starwars %>%
group_by(species) %>%
summarize(n = n()) %>%
arrange(desc(n))
More generally, you can use summarize() with n() in place of count(), for instance when you need to do a further calculation with the counts or summarize other columns at the same time.
Here, summarize() divides each count by the number of rows in the original dataset to give a proportion, stored in a column named foo:
library(dplyr)
starwars %>%
group_by(species) %>%
summarize(foo = n() / nrow(starwars)) %>%
arrange(desc(foo))
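If the proportion should sit next to the original rows rather than in a summary table, the same calculation can be combined with add_count(). This is a minimal sketch, not part of the original answer; the column names n_species and prop_species and the object names are illustrative.

```r
library(dplyr)

# Summary table: one row per species with its share of all rows
species_share <- starwars %>%
  count(species, name = "n_species") %>%
  mutate(prop_species = n_species / sum(n_species)) %>%
  arrange(desc(prop_species))

# Same share, but attached to every row of the original data;
# in an ungrouped mutate(), n() is the total number of rows
starwars_share <- starwars %>%
  add_count(species, name = "n_species") %>%
  mutate(prop_species = n_species / n())

head(species_share)
```

Because the counts add up to the total number of rows, dividing by sum(n_species) gives the same result as dividing by nrow(starwars).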
Solution 3:[3]
Load the dataset:
library(dplyr)
head(starwars)
First, remove the rows that contain NAs. This step is optional, and note that na.omit() drops every row with an NA in any column, which can remove a large share of the data. Then create a tibble containing the count of each species.
starwars_clean <- starwars %>% na.omit()
# name = sets the count column's name directly, so no duplicate n column is created
species_counts <- starwars_clean %>%
count(species, name = "species_count")
Afterwards, merge the two tibbles on the species column with base R's merge(). Setting all.x = TRUE keeps every row of x, which makes this a left join, as in a relational database.
joined_tibble <- merge(
x = starwars_clean,
y = species_counts,
by = "species",
all.x = TRUE
)
head(joined_tibble)
Each row of starwars_clean now has a species_count column.
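The same join can also be written with dplyr's own left_join() instead of base R's merge(). This is a minimal sketch under the same setup as above, not the answer author's code; joined_dplyr and one_step are illustrative names, and the count column is named directly through count()'s name argument.

```r
library(dplyr)

# Drop incomplete rows (optional, as above)
starwars_clean <- starwars %>% na.omit()

# One row per species with its count
species_counts <- starwars_clean %>%
  count(species, name = "species_count")

# Left join: every row of starwars_clean is kept and gains species_count
joined_dplyr <- starwars_clean %>%
  left_join(species_counts, by = "species")

# add_count() produces the same column in a single step
one_step <- starwars_clean %>%
  add_count(species, name = "species_count")

# Should evaluate to TRUE
identical(joined_dplyr$species_count, one_step$species_count)
```

left_join() keeps the rows of the left table in their original order, which is why the two columns can be compared directly.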
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | TarJae |
| Solution 2 | |
| Solution 3 | vincentleoooo |

