RStudio: create an ID column whose integers increment (from 1 to n) using the mutate function from tidyverse
I have a dataframe with many observations of different taxa. I need to add a column containing an ID numbered from 1 to the number of taxa. To illustrate, here is an example of my dataframe:
taxa; station_nom; x; y; density_m²;
Anax; station_1; x1; y1; 26;
Anax; station_2; x2; y2; 38;
Anopheles; station_1; x1; y1; 3;
Anopheles; station_2; x2; y2; 12;
Atrichopogon; station_3; x3; y3; 89;
[...]
I would like to add a new column named "CODE" that holds a fictional ID number for each taxon, from 1 to the number of taxa:
taxa; station_nom; x; y; density_m²; CODE;
Anax; station_1; x1; y1; 26; 1;
Anax; station_2; x2; y2; 38; 1;
Anopheles; station_1; x1; y1; 3; 2;
Anopheles; station_2; x2; y2; 12; 2;
Atrichopogon; station_3; x3; y3; 89; 3;
I need all the "Anax" rows to have the same CODE (here 1), all the "Anopheles" rows to have the Anax CODE + 1, and so on.
I tried different things, and the closest fit is probably the "mutate" function from tidyverse. Here is one of my attempts, which works fine on other dataframes (where I have one observation per taxon). In my actual case, I have several observations for the same taxon.
Obs_emb<- BDD %>%
group_by(embranchement_phylum_2, station_nom, x, y) %>%
summarise(densite_m2 = round(mean(densite_par_m2)))
Obs_emb<- dplyr::mutate(Obs_emb, CODE = row_number())
This code adds a new column named "CODE", but the values do not increment.
I think it could be interesting to try a loop based on the differences between taxon names, but my knowledge stops here.
Can anyone help me?
Solution 1:[1]
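With dplyr >= 1.0.0, you can use cur_group_id():
library(dplyr)
df <- tibble(val = c(1, 2, 3, 4),
             group = c("a", "a", "b", "b"))
# Group by the key column, then give every row its group's index
gf <- group_by(df, group)
mutate(gf, ID = cur_group_id())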
#> # A tibble: 4 x 3
#> # Groups:   group [2]
#>     val group    ID
#>   <dbl> <chr> <int>
#> 1     1 a         1
#> 2     2 a         1
#> 3     3 b         2
#> 4     4 b         2
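To connect this to the question, here is a minimal sketch of the same idea on a toy table shaped like the asker's data. The object names `obs` and `coded` and the column name `density_m2` are assumptions made for illustration; the values are copied from the example rows in the question.

```r
library(dplyr)

# Toy stand-in for the question's table (names obs/coded are hypothetical)
obs <- tibble(
  taxa        = c("Anax", "Anax", "Anopheles", "Anopheles", "Atrichopogon"),
  station_nom = c("station_1", "station_2", "station_1", "station_2", "station_3"),
  density_m2  = c(26, 38, 3, 12, 89)
)

coded <- obs %>%
  group_by(taxa) %>%                 # one group per taxon
  mutate(CODE = cur_group_id()) %>%  # every row receives its group's index
  ungroup()                          # drop the grouping for later steps

coded$CODE
#> [1] 1 1 2 2 3
```

The final `ungroup()` is optional, but it avoids surprises if later steps should not run per taxon.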
Solution 2:[2]
Try this ...
library(tidyverse)
tibble(taxa = c("a", "a", "b", "c"), value = 1:4) |>
nest(data = -taxa) |>
mutate(code = row_number()) |>
unnest(cols = c(data))
#> # A tibble: 4 × 3
#>   taxa  value  code
#>   <chr> <int> <int>
#> 1 a         1     1
#> 2 a         2     1
#> 3 b         3     2
#> 4 c         4     3
Created on 2022-04-27 by the reprex package (v2.0.1)
library(tidyverse)
coded <- tibble(taxa = c(rep("a", 100), rep("b", 10), rep("c", 10)), value = 1:120) |>
nest(data = -taxa) |>
mutate(code = row_number()) |>
unnest(cols = c(data))
coded |> count(code)
#> # A tibble: 3 × 2
#>    code     n
#>   <int> <int>
#> 1     1   100
#> 2     2    10
#> 3     3    10
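The nest/unnest approach amounts to numbering one row per taxon and spreading that number back over the observations. The same logic can be sketched with an explicit lookup table and a join, which also leaves a taxon-to-code table you can reuse elsewhere. The names `obs`, `lookup` and `coded` are assumptions for illustration, not part of the answer above.

```r
library(dplyr)

obs <- tibble(taxa = c("a", "a", "b", "c"), value = 1:4)

# One row per taxon, numbered in order of first appearance
lookup <- obs %>%
  distinct(taxa) %>%
  mutate(code = row_number())

# Attach the code to every observation of that taxon
coded <- left_join(obs, lookup, by = "taxa")

coded$code
#> [1] 1 1 2 3
```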
Solution 3:[3]
# Packages
library("tibble")
library("dplyr")
# Data
aa <- tibble(val = c(1,2,3,4), group = c("a", "a", "b", "b"))
# Use either Base R Pipe or Magrittr
aa |> mutate(x = match(group, unique(group)))
# A tibble: 4 x 3
#    val group     x
#  <dbl> <chr> <int>
#1     1 a         1
#2     2 a         1
#3     3 b         2
#4     4 b         2
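One difference worth knowing: match(group, unique(group)) numbers groups in order of first appearance, while group_by() with cur_group_id() numbers them by sorted group key. On data that is already sorted, both give the same result. The sketch below uses a deliberately unsorted toy table (the names `df`, `res`, `by_first_seen` and `by_sorted_key` are made up for illustration) to show where they diverge.

```r
library(dplyr)

# Deliberately unsorted: "b" appears before "a"
df <- tibble(val = 1:4, group = c("b", "a", "b", "a"))

res <- df %>%
  mutate(by_first_seen = match(group, unique(group))) %>%  # order of appearance
  group_by(group) %>%
  mutate(by_sorted_key = cur_group_id()) %>%               # sorted group keys
  ungroup()

res$by_first_seen
#> [1] 1 2 1 2
res$by_sorted_key
#> [1] 2 1 2 1
```

If the codes must follow alphabetical order of the taxa, either sort the table first or use the grouped version.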
Solution 4:[4]
I finally reached my objective by creating a tibble in which I added my ID column, and then pasting this new column into my original observation dataframe.
After several tweaks, I got my results using the following code:
# creation of a table containing taxon name / station name / station coordinates / average density per m² on the stations
Obs_emb<- BDD %>%
group_by(embranchement_phylum_2, station_nom, x, y) %>%
summarise(densite_m2 = round(mean(densite_par_m2)))
# creation of CODE column in another tibble allowing the identification of taxa in the atlas parameters of QGIS
Tib_emb <- tibble(
  val = 1:636,
  embranchement_phylum_2 = c(
    rep("Annelida", 122), rep("Arthropoda", 132), rep("Cnidaria", 40),
    rep("Mollusca", 124), rep("Nematoda", 30), rep("Nemertea", 77),
    rep("Platyhelminthes", 78), rep("Porifera", 1),
    rep("Xenacoelomorpha", 31), rep("NA", 1)
  )
)
Tib_emb <- mutate(Tib_emb, CODE = match(embranchement_phylum_2, unique(embranchement_phylum_2)))
# Paste columns from Tib_emb to Obs_emb
Obs_emb <- bind_cols(Obs_emb,Tib_emb)
# rearrange the resulting table by deleting unnecessary columns (duplicate taxa and val column)
Obs_emb <- select(Obs_emb,-c(embranchement_phylum_2...7,val))
# Rename the taxon column for compatibility with the database
names(Obs_emb)[names(Obs_emb) == 'embranchement_phylum_2...1'] <- 'embranchement_phylum_2'
I know this may not be the most optimized way to get what I need, but it works and that is fine for me.
I sincerely thank all of you for your help and wish you all the best!
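For readers who want to avoid hard-coding the row counts, here is a hedged sketch of a shorter route. The likely reason the original attempt showed no incrementation is that summarise() only drops the last grouping variable, so the result stays grouped and row_number() restarts at 1 inside each remaining group. Dropping the grouping first lets match() from Solution 3 work directly on the summarised table. The `BDD` table below is a made-up stand-in, since the real data is not shown.

```r
library(dplyr)

# Hypothetical stand-in for BDD (the real table is not shown in the thread)
BDD <- tibble(
  embranchement_phylum_2 = c("Annelida", "Annelida", "Annelida", "Arthropoda", "Cnidaria"),
  station_nom    = c("station_1", "station_1", "station_2", "station_1", "station_3"),
  x              = c(1, 1, 2, 1, 3),
  y              = c(10, 10, 20, 10, 30),
  densite_par_m2 = c(20, 30, 5, 12, 89)
)

Obs_emb <- BDD %>%
  group_by(embranchement_phylum_2, station_nom, x, y) %>%
  # .groups = "drop" returns an ungrouped table (dplyr >= 1.0.0)
  summarise(densite_m2 = round(mean(densite_par_m2)), .groups = "drop") %>%
  # Same taxon, same code; numbering follows order of first appearance
  mutate(CODE = match(embranchement_phylum_2, unique(embranchement_phylum_2)))

Obs_emb$CODE
#> [1] 1 1 2 3
```

This keeps everything in one pipeline, so there is no second tibble to build, bind and clean up afterwards.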
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Michał Kamiński |
| Solution 2 | |
| Solution 3 | Shivam7898 |
| Solution 4 | Falcxn |
