'How to correct misspelling in column and collapse values into correct row in R

I'm rather new to R and struggling through data tidying. I have a problem, where I don't find an answer to, but maybe I'm searching with the wrong terms.

I have a table (df_samples) in the following format:

species gender group sample1 sample2 sample n
penguin m i. 20 21 n
penguin f i. NA 18 n
lion m ii. 5 4 n
lion f ii. 2 9 n
penguin f ii. 22 NA n
tiger m ii. 7 6 n
tiger f ii. 6 8 n

Now, the problem here is the penguin with group ii. which is wrong and should be i. In my table there are several hundred different species and samples. I have this problem with several rows, where species have the wrong group.

I was able to find the specific rows with the problems using the following code:

n_occur <- data.frame(table(df_samples$species))
df_samples_2 <- df_samples[df_samples$species %in% n_occur$Var1[n_occur$Freq > 2],]

This gives me the problematic rows and I can view them in an own dataframe. There I am able identify the rows with the mistakes and could correct them. But I have two problems where I'm stuck.

First I don't know how to index the problematic value to change it directly in my original data frame.

Second I have no idea how to bring the data stored in the row with the mistake to the "correct" row.

I am sure, there are answers on the web, but I am really struggling to express my problem in a way, which allows me to find them.

I would be grateful if somebody is able to help, either by pointing out how to search or by solving the problem.



Solution 1:[1]

Using your process you can try the following steps.

Add unique ID to the rows so that it can be filtered later.

df_samples<-df_samples %>% 
  rowid_to_column()

Remove problem rows from df_samples based on the rowid in df_samples_2

df_samples<-df_samples[-df_samples_2$rowid,]

Update df_samples_2 as per your requirements, row by row mutates based on rowid.

Merge corrected rows back to df_samples

df_samples<-bind_rows(df_samples,df_samples_2)

Also if your end goal & data is as mentioned above you could also try this on your original df_samples

df_samples <-df_samples %>% 
  group_by(species) %>%  #this will create internal groups
  arrange(species,group) %>%  # Will ensure i. will be before ii.
  mutate(group=lag(group,default=first(group))) # lag() will copy earlier row values to current row per group.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Vinay