Subset a dataframe based on column value in R [closed]
I have what I consider to be a big dataframe: 1,178,366 rows by 36 columns. I would like to subset it by selecting all the rows where one column has a specific value.
Let's say my dataframe (df) looks something like this:
OTU Sample_site Abundance Family Genus
otu1 Water 124 Comamonadaceae Rhodoferax
otu1 Soil 85 Comamonadaceae Rhodoferax
otu2 Water 0 Spirosomaceae Pseudarcicella
otu2 Soil 248 Spirosomaceae Pseudarcicella
otu3 Water 47 Comamonadaceae Leptothrix
.
.
.
I would like to select the rows where the Family column is Comamonadaceae, so that the new dataframe (df2) looks like this:
OTU Sample_site Abundance Family Genus
otu1 Water 124 Comamonadaceae Rhodoferax
otu1 Soil 85 Comamonadaceae Rhodoferax
otu3 Water 47 Comamonadaceae Leptothrix
.
.
.
I tried 2 options:
df2 <- df %>% dplyr::filter(Family == "Comamonadaceae")
df2 <- df[df$Family=="Comamonadaceae",]
Neither worked: both return an empty dataframe with only the column names. In this example:
OTU Sample_site Abundance Family Genus
I don't even know where the error is coming from. I checked several times for typos, but that doesn't seem to be the cause. Could it be the size of the dataframe, or the fact that the Family column is of type character?
I checked quite a few similar questions but didn't find anything that matched my problem.
Any help would be appreciated,
Sophie
Solution 1:[1]
Thank you, Dan Adams. After rechecking, I noticed there was a space before each taxonomic name.
So I just added a space in my code like so:
df2 <- df %>% dplyr::filter(Family == " Comamonadaceae")
Now it works just fine and I have the dataframe of my dreams. :)
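Matching on " Comamonadaceae" works, but it relies on remembering the stray space every time. A possibly more robust alternative is to spot the whitespace and trim it once with base R's trimws() before filtering. The snippet below is only a sketch: the toy df is made up to mirror the example table, and it assumes the padding is a plain leading space in the Family column.

```r
# Toy data (hypothetical) reproducing the problem:
# every Family value carries a leading space.
df <- data.frame(
  OTU = c("otu1", "otu1", "otu2", "otu2", "otu3"),
  Sample_site = c("Water", "Soil", "Water", "Soil", "Water"),
  Abundance = c(124, 85, 0, 248, 47),
  Family = c(" Comamonadaceae", " Comamonadaceae", " Spirosomaceae",
             " Spirosomaceae", " Comamonadaceae"),
  Genus = c("Rhodoferax", "Rhodoferax", "Pseudarcicella",
            "Pseudarcicella", "Leptothrix"),
  stringsAsFactors = FALSE
)

# Diagnose: printing the distinct values shows them quoted,
# which makes leading or trailing whitespace visible.
print(unique(df$Family))

# Count the values that change when surrounding whitespace is trimmed.
n_padded <- sum(df$Family != trimws(df$Family))

# Fix: trim the column once, then filter on the clean name.
df$Family <- trimws(df$Family)
df2 <- df[df$Family == "Comamonadaceae", ]
```

If other columns (Genus, for instance) have the same padding, the same trimws() call can be applied to each of them before filtering.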
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | sophiethomas |
