How to filter a dataframe by rows without losing the index (or row number)?

I have a small dataframe (dt) containing binary labels from separate catboost runs:

dt <- structure(list(old.cat.lab = c(1, 1, 0, 0, 0, 1, 0, 0, 0, 1),
    new.cat.lab = c(1, 1, 0, 0, 1, 1, 0, 1, 0, 1)), row.names = c(NA, 10L), class = "data.frame")
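For a reproducible setup, the dput() output above can be assigned to dt. A quick base-R check with which() then lists the original row numbers that the filter is expected to keep. This is only a sketch; the name keep_idx is illustrative and not from the question:

```r
# Recreate the data frame from the dput() output in the question
dt <- structure(list(old.cat.lab = c(1, 1, 0, 0, 0, 1, 0, 0, 0, 1),
                     new.cat.lab = c(1, 1, 0, 0, 1, 1, 0, 1, 0, 1)),
                row.names = c(NA, 10L), class = "data.frame")

# Original row numbers (indices) where new.cat.lab == 1
keep_idx <- which(dt$new.cat.lab == 1)
keep_idx
# [1]  1  2  5  6  8 10
```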

I want to keep only the rows where dt$new.cat.lab == 1, using filter() from the dplyr package:

dt.match <- dt %>% filter(dt$new.cat.lab == 1, .preserve = T)

The problem is that filter() renumbers the rows. I would like dt.match to keep the original row numbers (index). The .preserve = T argument of dplyr's filter() doesn't seem to do that.



Solution 1:[1]

The tidyverse doesn't preserve row names, so we can move the row names into a new column, apply the filter, and then turn that column back into row names:

library(dplyr)
library(tibble)
dt %>%
  rownames_to_column('rn') %>%
  filter(new.cat.lab == 1) %>%
  column_to_rownames('rn')
#   old.cat.lab new.cat.lab
#1            1           1
#2            1           1
#5            0           1
#6            1           1
#8            0           1
#10           1           1
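If the original index should survive as an ordinary column instead (for example because the result will later be converted to a tibble, which does not keep row names), a variant of the same idea uses tibble::rowid_to_column(), which adds an integer column of row numbers. This is a sketch assuming dt as given in the question; the column name "rn" is arbitrary:

```r
library(dplyr)
library(tibble)

dt <- structure(list(old.cat.lab = c(1, 1, 0, 0, 0, 1, 0, 0, 0, 1),
                     new.cat.lab = c(1, 1, 0, 0, 1, 1, 0, 1, 0, 1)),
                row.names = c(NA, 10L), class = "data.frame")

# Store the original row number as an integer column, then filter;
# the rn column is carried through filter() unchanged
dt.match <- dt %>%
  rowid_to_column("rn") %>%
  filter(new.cat.lab == 1)

dt.match$rn
# [1]  1  2  5  6  8 10
```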

According to ?dplyr::filter, the .preserve argument only concerns the grouping structure, not row names:

.preserve - Relevant when the .data input is grouped. If .preserve = FALSE (the default), the grouping structure is recalculated based on the resulting data, otherwise the grouping is kept as is.
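To see what .preserve does change, the following sketch (assuming dt from the question) groups by old.cat.lab and keeps only the rows with old.cat.lab == 1, which leaves group 0 empty. n_groups() should then show that .preserve = TRUE keeps the empty group while the default recalculates the groups:

```r
library(dplyr)

dt <- structure(list(old.cat.lab = c(1, 1, 0, 0, 0, 1, 0, 0, 0, 1),
                     new.cat.lab = c(1, 1, 0, 0, 1, 1, 0, 1, 0, 1)),
                row.names = c(NA, 10L), class = "data.frame")

# Two groups (old.cat.lab = 0 and 1); after filtering, group 0 has no rows
kept    <- dt %>% group_by(old.cat.lab) %>% filter(old.cat.lab == 1, .preserve = TRUE)
dropped <- dt %>% group_by(old.cat.lab) %>% filter(old.cat.lab == 1, .preserve = FALSE)

n_groups(kept)     # 2: the now-empty group 0 is kept as is
n_groups(dropped)  # 1: grouping recalculated from the remaining rows
```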


In base R, this can be done with subset(), which keeps the original row names:

subset(dt, new.cat.lab == 1)

Or, since the labels are 0/1, convert them with as.logical():

subset(dt, as.logical(new.cat.lab))
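Both subset() calls end up using base R's `[` method for data frames, which keeps the original row names of the selected rows. An equivalent sketch with plain indexing, which also copies the preserved index into a column (the column name idx is illustrative):

```r
dt <- structure(list(old.cat.lab = c(1, 1, 0, 0, 0, 1, 0, 0, 0, 1),
                     new.cat.lab = c(1, 1, 0, 0, 1, 1, 0, 1, 0, 1)),
                row.names = c(NA, 10L), class = "data.frame")

# Logical row index; base `[` keeps the original row names
dt.match <- dt[dt$new.cat.lab == 1, ]
rownames(dt.match)
# [1] "1"  "2"  "5"  "6"  "8"  "10"

# Optionally store the preserved row names as an integer column
dt.match$idx <- as.integer(rownames(dt.match))
```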

Solution 2:[2]

library(dplyr)

iris %>% 
  mutate(index = rownames(.)) %>% 
  relocate(index, .before = "Sepal.Length") %>% 
  subset(., Sepal.Length == max(Sepal.Length))

This creates an index column from the row names (i.e. the row numbers) for tracking purposes and then filters. Below is another variant that uses filter() instead of subset():

iris %>% 
  mutate(index = rownames(.)) %>% 
  relocate(index, .before = "Sepal.Length") %>% 
  filter(Sepal.Length == max(Sepal.Length))
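Note that rownames(.) stores the index as character. If an integer index is preferred, dplyr's row_number() can be used inside mutate() instead; this is a sketch of that swap, not part of the original answer:

```r
library(dplyr)

iris_max <- iris %>%
  mutate(index = row_number()) %>%              # integer position of each original row
  relocate(index, .before = "Sepal.Length") %>%
  filter(Sepal.Length == max(Sepal.Length))     # keep the row(s) with the largest Sepal.Length

iris_max$index
```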

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution | Source
Solution 1 |
Solution 2 | Martin Gal