Subset a dataframe based on column value in R [closed]

I have what I consider to be a big dataframe: 1,178,366 rows and 36 columns. I would like to subset it by selecting all the rows that contain a specific value in one column.

Let's say my dataframe (df) looks something like this:

OTU   Sample_site  Abundance  Family          Genus

otu1  Water        124        Comamonadaceae  Rhodoferax
otu1  Soil         85         Comamonadaceae  Rhodoferax
otu2  Water        0          Spirosomaceae   Pseudarcicella
otu2  Soil         248        Spirosomaceae   Pseudarcicella
otu3  Water        47         Comamonadaceae  Leptothrix
.
.
.

I would like to select the rows where the value of the Family column is Comamonadaceae, so that the new dataframe (df2) looks like this:

OTU   Sample_site  Abundance  Family          Genus

otu1  Water        124        Comamonadaceae  Rhodoferax
otu1  Soil         85         Comamonadaceae  Rhodoferax
otu3  Water        47         Comamonadaceae  Leptothrix
.
.
.

I tried 2 options:

df2 <- df %>% dplyr::filter(Family == "Comamonadaceae")
df2 <- df[df$Family=="Comamonadaceae",]

But neither worked: both give me an empty dataframe with only the column names. So in our example:

OTU   Sample_site  Abundance  Family          Genus

I don't even know where the error is coming from. I checked multiple times for typos, but that doesn't seem to be the cause. Could it be the size of the dataframe? Or the fact that the Family column is of type character?

I checked quite a few similar questions but didn't find anything that matched my problem.

Any help would be appreciated,

Sophie

r dataframe subset

Solution 1:^[1]

Thank you, Dan Adams. After rechecking, I noticed there was a space before each taxonomic name.

So I just added a space in my code like so:

df2 <- df %>% dplyr::filter(Family == " Comamonadaceae")

Now it works just fine and I have the dataframe of my dreams. :)
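For anyone hitting the same thing, here is a minimal sketch of how the stray space could be spotted and removed up front instead of being typed into every filter. The small `df` below is a hypothetical stand-in for the real data (I am assuming the leading space is present in both Family and Genus, since it was in front of each taxonomic name), and only base R functions are used:

```r
# Hypothetical toy data mimicking the problem: a leading space in the taxonomic columns.
df <- data.frame(
  OTU         = c("otu1", "otu1", "otu2", "otu2", "otu3"),
  Sample_site = c("Water", "Soil", "Water", "Soil", "Water"),
  Abundance   = c(124, 85, 0, 248, 47),
  Family      = c(" Comamonadaceae", " Comamonadaceae", " Spirosomaceae",
                  " Spirosomaceae", " Comamonadaceae"),
  Genus       = c(" Rhodoferax", " Rhodoferax", " Pseudarcicella",
                  " Pseudarcicella", " Leptothrix"),
  stringsAsFactors = FALSE
)

# Diagnose: dput() shows the exact strings, quotes included, so stray spaces become visible.
dput(unique(df$Family))

# Count how many Family values would change if surrounding whitespace were removed.
n_padded <- sum(df$Family != trimws(df$Family))

# Fix: trim whitespace in every character column, leaving other columns untouched.
df[] <- lapply(df, function(x) if (is.character(x)) trimws(x) else x)

# The original filter now matches without the extra space.
df2 <- df[df$Family == "Comamonadaceae", ]
```

Trimming once means the same clean value works in every later filter, join, or plot label. If the data was read from a delimited file with `read.csv()` or `read.table()` (an assumption, I did not say above how it was loaded), passing `strip.white = TRUE` at import is another option.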

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	sophiethomas
