R studio: Create an ID column whose integer values increment from 1 to n (one value per taxon) using the mutate function from tidyverse

I have a dataframe containing many observations of different taxa. I need to add a column holding an ID made of integers from 1 to n(taxa). To illustrate, here is an example of my dataframe:


taxa;          station_nom;     x;        y;        density_m²;

Anax;          station_1;       x1;       y1;          26;
Anax;          station_2;       x2;       y2;          38; 
Anopheles;     station_1;       x1;       y1;          3; 
Anopheles;     station_2;       x2;       y2;          12;
Atrichopogon;  station_3;       x3;       y3;          89;
[...]

I would like to add a new column named "CODE" that holds a fictional ID number for each taxon, from 1 to the number of taxa:

taxa;          station_nom;     x;        y;        density_m²;       CODE;

Anax;          station_1;       x1;       y1;          26;             1;
Anax;          station_2;       x2;       y2;          38;             1;
Anopheles;     station_1;       x1;       y1;          3;              2;
Anopheles;     station_2;       x2;       y2;          12;             2;  
Atrichopogon;  station_3;       x3;       y3;          89;             3;

All the "Anax" rows need the same CODE (here 1), all the "Anopheles" rows need [Anax CODE] + 1, and so on.

I tried different things, and the most promising is probably the mutate function from tidyverse. Here is one of my attempts, which works fine on other dataframes (where I have one observation per taxon). In my actual case, I have several observations for the same taxon.

Obs_emb<- BDD %>%
  group_by(embranchement_phylum_2, station_nom, x, y) %>%
  summarise(densite_m2 = round(mean(densite_par_m2))) 
Obs_emb<- dplyr::mutate(Obs_emb, CODE = row_number())

This code adds a new column named "CODE", but the values do not increment.

I think it could be interesting to try a loop based on the differences between the taxon names, but my knowledge stops here.

Can anyone help me?



Solution 1:[1]

Or this with dplyr >= 1.0.0

library(dplyr)

df <- tibble(val = c(1,2,3,4),
             group = c("a", "a", "b", "b")
             )

gf <- group_by(df, group)

mutate(gf, ID = cur_group_id())

#> # A tibble: 4 x 3
#> # Groups:   group [2]
#>     val group    ID
#>   <dbl> <chr> <int>
#> 1     1 a         1
#> 2     2 a         1
#> 3     3 b         2
#> 4     4 b         2
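
Applied to the table from the question, the same idea could look like the sketch below. The tibble `obs` is a hand-typed copy of the example rows (the coordinate columns are left out, and the density column is renamed `density_m2` for convenience), so the names are assumptions rather than the asker's real data.

```r
library(dplyr)

# Illustrative copy of the example rows from the question (not the real data)
obs <- tibble(
  taxa        = c("Anax", "Anax", "Anopheles", "Anopheles", "Atrichopogon"),
  station_nom = c("station_1", "station_2", "station_1", "station_2", "station_3"),
  density_m2  = c(26, 38, 3, 12, 89)
)

# Group by taxon only, so every row of a taxon receives the same group id
coded <- obs %>%
  group_by(taxa) %>%
  mutate(CODE = cur_group_id()) %>%
  ungroup()  # drop the grouping so later steps are not affected by it

coded$CODE
```

Grouping by `taxa` alone is the important part: adding `station_nom` or the coordinates to `group_by()` would give one id per taxon/station combination instead of one id per taxon.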

Solution 2:[2]

Try this ...

library(tidyverse)

tibble(taxa = c("a", "a", "b", "c"), value = 1:4) |> 
  nest(data = -taxa) |> 
  mutate(code = row_number()) |> 
  unnest(cols = c(data))
#> # A tibble: 4 × 3
#>   taxa  value  code
#>   <chr> <int> <int>
#> 1 a         1     1
#> 2 a         2     1
#> 3 b         3     2
#> 4 c         4     3

Created on 2022-04-27 by the reprex package (v2.0.1)

library(tidyverse)

coded <- tibble(taxa = c(rep("a", 100), rep("b", 10), rep("c", 10)), value = 1:120) |> 
  nest(data = -taxa) |> 
  mutate(code = row_number()) |> 
  unnest(cols = c(data))

coded |> count(code)
#> # A tibble: 3 × 2
#>    code     n
#>   <int> <int>
#> 1     1   100
#> 2     2    10
#> 3     3    10


Solution 3:[3]

# Packages 
library("tibble")
library("dplyr")
# Data
aa <- tibble(val = c(1,2,3,4), group = c("a", "a", "b", "b"))

# Use either Base R Pipe or Magrittr 
aa |> mutate(x = match(group, unique(group)))
# A tibble: 4 x 3
#    val group     x
#  <dbl> <chr> <int>
#1     1 a         1
#2     2 a         1
#3     3 b         2
#4     4 b         2
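
One difference from the `cur_group_id()` approach is worth knowing: `match(group, unique(group))` numbers the groups in order of first appearance, whereas `group_by()` orders the groups by key, so the two can disagree when the data are not sorted. A small sketch with a made-up, unsorted `group` column:

```r
library(dplyr)

# Made-up data where the groups do not appear in alphabetical order
df <- tibble(group = c("b", "b", "a", "c"))

# IDs follow the order in which each group first appears: b = 1, a = 2, c = 3
by_appearance <- df %>%
  mutate(ID = match(group, unique(group)))

# IDs follow the sorted group keys: a = 1, b = 2, c = 3
by_sorted <- df %>%
  group_by(group) %>%
  mutate(ID = cur_group_id()) %>%
  ungroup()

by_appearance$ID
by_sorted$ID
```

If the taxa are already sorted alphabetically, as in the question's example, both approaches should give the same codes.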

Solution 4:[4]

I finally reached my objective by creating a tibble containing my ID column and then pasting this new column into my original observation dataframe.

After several tweaks, I got my results using the following code:

      # creation of a table containing taxon name / station name / station coordinates / average density per m² on the stations

Obs_emb<- BDD %>%
  group_by(embranchement_phylum_2, station_nom, x, y) %>%
  summarise(densite_m2 = round(mean(densite_par_m2))) 

      # creation of CODE column in another tibble allowing the identification of taxa in the atlas parameters of QGIS

Tib_emb <- tibble(
  val = 1:636,
  embranchement_phylum_2 = c(
    rep("Annelida", 122), rep("Arthropoda", 132), rep("Cnidaria", 40),
    rep("Mollusca", 124), rep("Nematoda", 30), rep("Nemertea", 77),
    rep("Platyhelminthes", 78), rep("Porifera", 1),
    rep("Xenacoelomorpha", 31), rep("NA", 1)
  )
)
Tib_emb <- mutate(Tib_emb, CODE = match(embranchement_phylum_2, unique(embranchement_phylum_2)))

      # Paste columns from Tib_emb to Obs_emb

Obs_emb <- bind_cols(Obs_emb,Tib_emb)

      # rearrange the resulting table by deleting unnecessary columns (duplicate taxa and val column)

Obs_emb <- select(Obs_emb,-c(embranchement_phylum_2...7,val))

      # Rename the taxon column for compatibility with the database

names(Obs_emb)[names(Obs_emb) == 'embranchement_phylum_2...1'] <- 'embranchement_phylum_2'

I know this may not be the most optimized way to get what I need, but it works and that is fine for me.
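
A shorter route that avoids hard-coding the row counts might look like the sketch below. It relies on the fact that `summarise()` by default leaves the result grouped by all but the last grouping variable, so a later `row_number()` restarts inside each remaining group, which is probably why the attempt in the question showed no incrementation. The `BDD` tibble here is a tiny hypothetical stand-in with the same column names, not the real database.

```r
library(dplyr)

# Hypothetical stand-in for BDD (column names taken from the code above)
BDD <- tibble(
  embranchement_phylum_2 = c("Annelida", "Annelida", "Annelida", "Mollusca", "Mollusca"),
  station_nom            = c("station_1", "station_1", "station_2", "station_1", "station_3"),
  x                      = c(1, 1, 2, 1, 3),
  y                      = c(10, 10, 20, 10, 30),
  densite_par_m2         = c(20, 30, 38, 3, 89)
)

Obs_emb <- BDD %>%
  group_by(embranchement_phylum_2, station_nom, x, y) %>%
  # .groups = "drop" returns an ungrouped table after summarising
  summarise(densite_m2 = round(mean(densite_par_m2)), .groups = "drop") %>%
  # one code per phylum, numbered in order of first appearance
  mutate(CODE = match(embranchement_phylum_2, unique(embranchement_phylum_2)))

Obs_emb$CODE
```

Because the codes are computed directly on the summarised table, no helper tibble, `bind_cols()`, or column renaming is needed, and the result does not depend on the number of rows per phylum staying fixed.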

I sincerely thank all of you for your help and wish you all the best!

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Michał Kamiński
Solution 2
Solution 3 Shivam7898
Solution 4 Falcxn