How to correct misspelling in column and collapse values into correct row in R
I'm rather new to R and struggling with data tidying. I have a problem that I can't find an answer to, but maybe I'm searching with the wrong terms.
I have a table (df_samples) in the following format:
| species | gender | group | sample1 | sample2 | sample n |
|---|---|---|---|---|---|
| penguin | m | i. | 20 | 21 | n |
| penguin | f | i. | NA | 18 | n |
| lion | m | ii. | 5 | 4 | n |
| lion | f | ii. | 2 | 9 | n |
| penguin | f | ii. | 22 | NA | n |
| tiger | m | ii. | 7 | 6 | n |
| tiger | f | ii. | 6 | 8 | n |
The problem here is the penguin row with group ii., which is wrong and should be i. My table contains several hundred different species and samples, and several rows have this problem, where a species is assigned to the wrong group.
I was able to find the specific rows with the problems using the following code:
n_occur <- data.frame(table(df_samples$species))
df_samples_2 <- df_samples[df_samples$species %in% n_occur$Var1[n_occur$Freq > 2],]
This gives me the problematic rows, and I can view them in a separate data frame. There I am able to identify the rows with the mistakes and could correct them. But I am stuck on two problems.
First, I don't know how to index the problematic value so that I can change it directly in my original data frame.
Second, I have no idea how to move the data stored in the row with the mistake into the "correct" row.
I am sure there are answers on the web, but I am really struggling to express my problem in a way that allows me to find them.
I would be grateful if somebody could help, either by pointing out how to search or by solving the problem.
Solution 1:[1]
Using your process you can try the following steps.
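Add a unique ID to the rows so that they can be identified later. Do this before creating df_samples_2 (or re-create df_samples_2 afterwards) so that it also contains the rowid column.
library(dplyr)
library(tibble) # provides rowid_to_column()
df_samples <- df_samples %>%
  rowid_to_column()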
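Remove the problem rows from df_samples based on the rowid values in df_samples_2. Filtering with %in% is safer than negative indexing, which would drop every row if df_samples_2 were empty.
df_samples <- df_samples[!df_samples$rowid %in% df_samples_2$rowid, ]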
Update df_samples_2 as required, correcting one row at a time and identifying each row by its rowid.
Merge the corrected rows back into df_samples.
df_samples <- bind_rows(df_samples, df_samples_2)
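The steps above can be sketched end to end as follows. This is only a minimal illustration, not the exact code from the answer: it uses base R so that it runs without extra packages (`seq_len()` stands in for `rowid_to_column()`, `rbind()` for `bind_rows()`), the toy data is modelled on the table in the question with just two sample columns, and row 5 is assumed to be the mislabelled one.

```r
# Toy data modelled on the table in the question (two sample columns only).
df_samples <- data.frame(
  species = c("penguin", "penguin", "lion", "lion", "penguin", "tiger", "tiger"),
  gender  = c("m", "f", "m", "f", "f", "m", "f"),
  group   = c("i.", "i.", "ii.", "ii.", "ii.", "ii.", "ii."),
  sample1 = c(20, NA, 5, 2, 22, 7, 6),
  sample2 = c(21, 18, 4, 9, NA, 6, 8),
  stringsAsFactors = FALSE
)

# Step 1: add a row ID (base-R stand-in for rowid_to_column()).
df_samples$rowid <- seq_len(nrow(df_samples))

# Build df_samples_2 after the ID exists, so it carries the rowid column.
n_occur <- data.frame(table(df_samples$species))
df_samples_2 <- df_samples[df_samples$species %in% n_occur$Var1[n_occur$Freq > 2], ]

# Step 2: remove the problem rows from the main data frame.
df_samples <- df_samples[!df_samples$rowid %in% df_samples_2$rowid, ]

# Step 3: correct one value, addressing the row by its rowid.
# Row 5 is the penguin wrongly labelled "ii." in this toy data.
df_samples_2$group[df_samples_2$rowid == 5] <- "i."

# Step 4: put the corrected rows back and restore the original order.
df_samples <- rbind(df_samples, df_samples_2)
df_samples <- df_samples[order(df_samples$rowid), ]
```

The same kind of indexed assignment, for example `df_samples$group[df_samples$rowid == 5] <- "i."`, also works on the original data frame directly, in which case steps 2 and 4 are not needed.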
Alternatively, if your data and end goal are as described above, you could try this on your original df_samples. Note that it only works when at most one row per species is mislabelled and the correct group sorts first within that species.
library(dplyr)
df_samples <- df_samples %>%
  group_by(species) %>% # create one internal group per species
  arrange(species, group) %>% # ensures i. sorts before ii.
  mutate(group = lag(group, default = first(group))) %>% # lag() copies the previous row's value within each species
  ungroup()
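Once the group labels are corrected, the second part of the question (moving the values from the mislabelled row into the "correct" row) could be handled by collapsing rows that share species, gender and group. The following is a minimal sketch in base R rather than dplyr, under stated assumptions: the toy data mirrors the question's table with two sample columns, all sample columns are assumed to start with "sample", and `first_non_na()` is a hypothetical helper introduced here for illustration.

```r
# Toy data modelled on the table in the question (two sample columns only).
df_samples <- data.frame(
  species = c("penguin", "penguin", "lion", "lion", "penguin", "tiger", "tiger"),
  gender  = c("m", "f", "m", "f", "f", "m", "f"),
  group   = c("i.", "i.", "ii.", "ii.", "ii.", "ii.", "ii."),
  sample1 = c(20, NA, 5, 2, 22, 7, 6),
  sample2 = c(21, 18, 4, 9, NA, 6, 8),
  stringsAsFactors = FALSE
)

# Correct the label directly in the original data frame via logical indexing.
df_samples$group[df_samples$species == "penguin"] <- "i."

# Hypothetical helper: first non-missing value, or NA if all are missing.
first_non_na <- function(x) x[!is.na(x)][1]

# Assumption: every sample column name starts with "sample".
sample_cols <- grep("^sample", names(df_samples), value = TRUE)

# Collapse rows that share species, gender and group into a single row,
# keeping the first non-missing value of each sample column.
df_collapsed <- aggregate(
  df_samples[sample_cols],
  by  = df_samples[c("species", "gender", "group")],
  FUN = first_non_na
)
```

In this toy data the two female penguin rows merge into one row holding sample1 = 22 and sample2 = 18. If two rows being merged both had a non-missing value for the same sample, this sketch would silently keep the first one, so it may be worth checking for such conflicts before collapsing.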
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Vinay |
