How do I subset a dataframe based on several other dataframes using R?

I have three data frames: r, e, and p. The Results column in r matches the column names of e and p, and the Class column gives each sample's group (0 or 1). For each row of p, if more than 50% of the samples in BOTH groups (Class 0 and Class 1) have a p-value > 0.05, I want to remove the corresponding row from e.

r: class membership
e: exon
p: p-value

> head(r)
  Class       Results
1     1 JHU_113_2.CEL
2     0   JHU_144.CEL
3     1   JHU_173.CEL
4     1  JHU_176R.CEL
5     1   JHU_182.CEL
6     0   JHU_186.CEL


> head(e)
        JHU_113_2.CEL JHU_144.CEL JHU_173.CEL JHU_176R.CEL JHU_182.CEL
2315252       4.21222     4.24054     3.55855      4.57541     4.50411
2315253       1.46773     1.68980     1.54697      1.75198     1.35377
2315374       6.28274     6.79161     6.11265      6.13997     6.68056
2315375       4.27911     3.53146     3.83499      3.71238     3.38309
2315376       5.81678     5.71165     6.02794      5.37082     5.95527
2315377       5.02186     4.82032     4.44263      3.43122     4.02596
        JHU_186.CEL JHU_187.CEL JHU_188.CEL JHU_203.CEL JHU_205.CEL JHU_207.CEL
2315252     4.43086     4.04965     3.38021     2.22649     4.08213     4.18479
2315253     1.86128     1.68910     2.20902     1.84491     1.39976     1.95915
2315374     6.48156     6.45415     6.04542     5.99176     6.06579     5.85832
2315375     3.99563     3.58458     3.08520     3.44144     3.97563     3.65498
2315376     5.75999     5.87863     5.54830     6.35571     5.88177     5.92593
2315377     5.32014

> head(p)
        JHU_113_2.CEL JHU_144.CEL JHU_173.CEL JHU_176R.CEL JHU_182.CEL
2315252       0.09655     0.04224     0.22314      0.03202     0.03889
2315253       0.64864     0.38068     0.49589      0.38359     0.93560
2315374       0.00730     0.00293     0.03034      0.02571     0.00436
2315375       0.11744     0.29977     0.21102      0.21728     0.33313
2315376       0.04079     0.01525     0.03090      0.08493     0.01303
2315377       0.01269     0.02355     0.09147      0.31425     0.13685
        JHU_186.CEL JHU_187.CEL JHU_188.CEL JHU_203.CEL JHU_205.CEL JHU_207.CEL
2315252     0.03532     0.06716     0.35315     0.69236     0.21461     0.36181
2315253     0.28369     0.39982     0.20404     0.25617     0.97292     0.48171
2315374     0.00788     0.00520     0.01704     0.03273     0.06720     0.06545
2315375     0.10076     0.32012     0.45451     0.34734     0.01755     0.32847
2315376     0.03559     0.02163     0.01586     0.04264     0.05689     0.08093
2315377     0.01356
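To make the rule concrete, here is a base R sketch that applies it to the values printed above. It is restricted to the six samples whose Class appears in head(r) (JHU_113_2.CEL through JHU_186.CEL), so it is a toy subset rather than the full data; the names samples, cls, p_toy, frac0, frac1 and to_drop are illustrative only.

```r
# Toy subset of the printed output: only the six samples whose Class is
# shown in head(r), and the six rows shown in head(p).
samples <- c("JHU_113_2.CEL", "JHU_144.CEL", "JHU_173.CEL",
             "JHU_176R.CEL", "JHU_182.CEL", "JHU_186.CEL")
cls <- c(1, 0, 1, 1, 1, 0)  # Class column of head(r), same order as samples

p_toy <- matrix(
  c(0.09655, 0.04224, 0.22314, 0.03202, 0.03889, 0.03532,
    0.64864, 0.38068, 0.49589, 0.38359, 0.93560, 0.28369,
    0.00730, 0.00293, 0.03034, 0.02571, 0.00436, 0.00788,
    0.11744, 0.29977, 0.21102, 0.21728, 0.33313, 0.10076,
    0.04079, 0.01525, 0.03090, 0.08493, 0.01303, 0.03559,
    0.01269, 0.02355, 0.09147, 0.31425, 0.13685, 0.01356),
  nrow = 6, byrow = TRUE,
  dimnames = list(c("2315252", "2315253", "2315374",
                    "2315375", "2315376", "2315377"), samples)
)

# Fraction of samples with p > 0.05, computed separately for each class
frac0 <- rowMeans(p_toy[, cls == 0, drop = FALSE] > 0.05)
frac1 <- rowMeans(p_toy[, cls == 1, drop = FALSE] > 0.05)
print(cbind(C0 = frac0, C1 = frac1))

# Rows to remove from e: more than half of BOTH classes exceed 0.05
to_drop <- rownames(p_toy)[frac0 > 0.5 & frac1 > 0.5]
print(to_drop)
```

On this subset only rows 2315253 and 2315375 meet the rule, because every Class 0 and Class 1 sample has p > 0.05 there. Row 2315252 is kept: exactly 2 of its 4 Class 1 samples exceed 0.05, and 0.5 is not more than 50%. With all samples included, the result may differ.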

r bioinformatics

Solution 1:^[1]

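I haven't tested this, but the code below should work. It reshapes p to long format, attaches each sample's Class from r, computes the fraction of p-values above 0.05 per row and class, and collects the IDs of rows where that fraction exceeds 0.5 in both classes.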

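library(dplyr)
library(tidyr)

# Note: the native pipe |> requires R >= 4.1
eflt <- local({
    # IDs of rows where more than half of the samples in BOTH classes have p > 0.05
    rows <- data.frame(p, check.names = FALSE) |>
        tibble::rownames_to_column("RowID") |>
        # long format: one row per (RowID, sample) pair
        pivot_longer(-RowID,
                     names_to = "Results",
                     values_to = "P") |>
        # attach each sample's Class from r
        left_join(data.frame(r), by = "Results") |>
        group_by(Class, RowID) |>
        # fraction of this class's samples with p > 0.05
        summarize(cnt = mean(P > 0.05), .groups = "drop") |>
        # one column per class: C0 and C1
        pivot_wider(names_from = Class,
                    names_prefix = "C",
                    values_from = cnt) |>
        filter(C0 > 0.5, C1 > 0.5) |>
        pull(RowID)

    # remove those rows from e and keep everything else
    e[!rownames(e) %in% rows, , drop = FALSE]
})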
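If you would rather avoid the dplyr/tidyr dependency, the same filter can be sketched in base R. This has not been run on the real data; the helper name drop_nonsig_rows and its cutoff and frac parameters are hypothetical, and it assumes e and p share row names and sample columns, and that r has the Results and Class columns shown in the question.

```r
# Hypothetical helper: drop rows of e where more than `frac` of the samples
# in EVERY class have p-values above `cutoff`.
drop_nonsig_rows <- function(e, p, r, cutoff = 0.05, frac = 0.5) {
  pm <- as.matrix(p)
  # Class of each column of p, looked up by sample name (NA if not in r)
  cls <- r$Class[match(colnames(pm), as.character(r$Results))]
  # One logical vector per class; split() silently drops NA classes
  exceeds <- lapply(split(seq_along(cls), cls), function(idx)
    rowMeans(pm[, idx, drop = FALSE] > cutoff) > frac)
  # Row names where the threshold is exceeded in every class
  drop <- rownames(pm)[Reduce(`&`, exceeds)]
  e[!rownames(e) %in% drop, , drop = FALSE]
}

# Made-up demo data (not from the question) to exercise the helper
r_demo <- data.frame(Class = c(0, 0, 1, 1),
                     Results = c("s1", "s2", "s3", "s4"))
p_demo <- data.frame(s1 = c(0.20, 0.01), s2 = c(0.30, 0.20),
                     s3 = c(0.40, 0.01), s4 = c(0.60, 0.02),
                     row.names = c("a", "b"))
e_demo <- data.frame(s1 = c(1, 2), s2 = c(3, 4), s3 = c(5, 6), s4 = c(7, 8),
                     row.names = c("a", "b"))
res <- drop_nonsig_rows(e_demo, p_demo, r_demo)
print(rownames(res))
```

Row "a" is dropped because all samples in both classes exceed 0.05, while row "b" is kept because its Class 0 fraction is exactly 0.5. Since the helper loops over whatever classes appear in r$Class, its "every class" condition reduces to the question's "BOTH groups" when there are exactly two classes.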

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
