Data cleaning & subsetting in nested list
I couldn't find any previous questions that address these steps in a nested list. My own attempts haven't got me anywhere either!
I have a nested list df.
- I would like to change the names of the first 3 columns in all data frames to c("one","two","three").
- In each data frame, I want to keep the first 3 columns and the column with the same name as the data frame's name in the list.
- Now each data frame has 4 columns. In each data frame, I want to keep the values in the second column where the value in the fourth column is greater than 3.
- Return a nested list with the name of each data frame and the selected values from the second column (from the previous step).
purrr and dplyr approaches are preferred, but anything else is much appreciated!
> dput(map_depth(df,1, head))
list(`CD8_C01-LEF1` = structure(list(...1 = c("1236", "6194",
"51176", "6402", "6137", "1937"), ...2 = c("CCR7", "RPS6", "LEF1",
"SELL", "RPL13", "EEF1G"), ...3 = c(448.275813024615, 114.565282822255,
405.993571415472, 352.462886197845, 152.430598462657, 73.5226212775651
), `P-value*` = c(0, 2.35914832807463e-150, 0, 0, 1.03146807397557e-195,
3.00681346250943e-98), `CD8_C01-LEF1` = c(6.3388353508401, 1.36075129906401,
5.11667843995657, 5.22902495053118, 1.35703181746742, 1.72815687302818
), `CD8_C02-GPR183` = c(2.71993044636725, 0.755445092850178,
2.26029822474036, 3.57732840656951, 0.757664532314421, 0.732003573596204
), `CD8_C03-CX3CR1` = c(-2.50016459757821, 0.0430813598361915,
-1.47763877045973, -1.31104077043168, -0.118054173396857, -0.217984797372657
), `CD8_C04-GZMK` = c(-0.639352384551204, -0.304854019068466,
-1.400271288872, -1.56965980479594, -0.128422617265835, -0.701864111617954
), `CD8_C05-CD6` = c(-2.35873754058284, -0.115888861319928, -2.08628173736428,
-3.32630706764402, -0.177640817498698, -0.215754243123614), `CD8_C06-CD160` = c(-2.85558322130952,
-0.29530343951866, -2.20232116143474, -3.274807762691, -0.440783845861116,
-0.56207661416919), `CD8_C07-LAYN` = c(-2.75671138163062, -0.887003245107014,
-2.40845402752497, -3.47698326675668, -1.03656381624963, -1.46468960616135
), `CD8_C08-SLC4A10` = c(-2.68199272253543, 0.0292368512820967,
-2.1581654239029, -2.99895134853712, 0.0615744908900675, 0.192173783941343
)), row.names = c(NA, 6L), class = "data.frame"), `CD8_C02-GPR183` = structure(list(
...1 = c("3575", "4050", "1901", "6653", "1880", "10628"),
...2 = c("IL7R", "LTB", "S1PR1", "SORL1", "GPR183", "TXNIP"
), ...3 = c(268.347035159053, 151.397715576146, 423.815475272167,
154.131971403975, 161.502687932662, 138.188069200824), `P-value*` = c(0,
1.63481853000449e-194, 0, 1.09616441981898e-197, 3.47999420200636e-206,
5.87606326954945e-179), `CD8_C01-LEF1` = c(2.25872137515665,
1.06433926285014, 2.06890434595653, 1.77222927526522, -2.32256398023726,
1.17445992511194), `CD8_C02-GPR183` = c(3.58534594694992,
2.33774626980998, 3.1044712936119, 3.00075778716827, 1.54874669286004,
2.11053414857411), `CD8_C03-CX3CR1` = c(-2.73122665345433,
-3.23251051546321, 2.76359001828421, 0.899851788567591, -3.4595583469893,
1.9924219816788), `CD8_C04-GZMK` = c(-1.20359289904198, -2.27859013855459,
-0.289843306560729, 0.0930099548084882, 0.293766916539111,
-1.05998934689132), `CD8_C05-CD6` = c(0.771026257612103,
-1.84446654315228, -1.92859019625536, -0.993527571866541,
-0.517242518264243, -1.05505195656161), `CD8_C06-CD160` = c(-1.26433565787961,
-3.62072638085859, -1.99838091859197, -2.66224984657089,
-3.84677781455005, -0.741084525734145), `CD8_C07-LAYN` = c(-4.85420539962432,
-3.79535857695107, -2.07599716553024, -2.41001692585172,
-3.66993376805675, -1.90910214659534), `CD8_C08-SLC4A10` = c(1.79563839118781,
0.431971358693421, 0.24665792844753, 0.820564247625701, -0.941462395796914,
0.224912511574641)), row.names = c(NA, 6L), class = "data.frame"),
`CD8_C03-CX3CR1` = structure(list(...1 = c("5341", "1524",
"83888", "2214", "343413", "10219"), ...2 = c("PLEK", "CX3CR1",
"FGFBP2", "FCGR3A", "FCRL6", "KLRG1"), ...3 = c(372.816216710618,
713.554708746553, 575.834099328186, 419.996034284325, 215.715234731706,
281.827177706662), `P-value*` = c("0", "0", "0", "0", "3.5450627744914998E-266",
"0"), `CD8_C01-LEF1` = c(-1.34745098111019, -0.39476162886016,
-0.248194028712413, -0.326944139043036, -0.833877751680806,
-0.822668603983214), `CD8_C02-GPR183` = c(0.50737446056126,
-0.495638146054913, -0.484905896571723, -0.125753818325312,
0.0263098770399738, 0.894340812937189), `CD8_C03-CX3CR1` = c(6.36825282208761,
5.38301238794739, 5.26196506464758, 5.6197563760267, 5.8532850807879,
5.36851683724817), `CD8_C04-GZMK` = c(1.44463895049283, -0.513803138075432,
-0.125340966094923, 0.2447981258131, 1.34537977512099, 2.10784813093189
), `CD8_C05-CD6` = c(-0.718776566594413, -0.795121492384525,
-0.681892196238474, -0.421395883952147, 0.0987360993173341,
-1.35585804120358), `CD8_C06-CD160` = c(-0.550964233191398,
-0.794078725052049, -0.707741972359531, -0.156207202527366,
2.24842830259497, -1.28977809817504), `CD8_C07-LAYN` = c(0.0641870785667258,
-0.785201010640904, -0.631939964779986, -0.340799120353511,
0.271892089522186, 0.236064375692484), `CD8_C08-SLC4A10` = c(1.40102283829925,
-0.158585496249154, -0.056110756095033, 0.00915832466806331,
-0.085141865592199, 3.78847417230501)), row.names = c(NA,
6L), class = "data.frame"))
Solution 1:[1]
A solution would be:
res <- lapply(setNames(nm = names(df)), function(dfname) {
  dff <- df[[dfname]]
  # only renaming column 2, as columns 1 and 3 are not used later on
  colnames(dff)[2] <- "two"
  # not 'keeping' the column with the same name as the data frame,
  # just indexing it directly by name
  dff$two[dff[, dfname] > 3]
})
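Note the setNames(...) call as the first argument to lapply. setNames(nm = names(df)) returns the data frame names as a character vector whose names are the same as its values. lapply keeps the names of whatever it iterates over, so the returned list is named after the data frames, and inside the function dfname can be used both to fetch the data frame and to pick the matching column.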
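To see the name handling in isolation, here is a minimal sketch on made-up data. The list `toy`, its column names and its values are hypothetical stand-ins with the same layout as the question's data (three leading columns, then one score column per list element); they are not taken from the original `df`. The last part shows one possible purrr equivalent using `purrr::imap`, which passes each element together with its name; it only runs if purrr is installed.

```r
# Hypothetical toy data with the same layout as the question's list:
# three leading columns, then one score column per list element name.
toy <- list(
  A = data.frame(id   = c("1", "2", "3"),
                 gene = c("g1", "g2", "g3"),
                 stat = c(10, 20, 30),
                 A    = c(5.0, 1.0, 3.5),
                 B    = c(0.1, 4.0, 0.2),
                 stringsAsFactors = FALSE),
  B = data.frame(id   = c("4", "5", "6"),
                 gene = c("g4", "g5", "g6"),
                 stat = c(40, 50, 60),
                 A    = c(0.3, 0.4, 6.0),
                 B    = c(3.2, 2.9, 7.1),
                 stringsAsFactors = FALSE)
)

# setNames(nm = x) returns x named by itself, so lapply() receives a
# named character vector and carries those names over to its result.
keys <- setNames(nm = names(toy))

res <- lapply(keys, function(dfname) {
  dff <- toy[[dfname]]
  colnames(dff)[1:3] <- c("one", "two", "three")
  # keep "two" where the column named after this data frame exceeds 3
  dff$two[dff[[dfname]] > 3]
})

str(res)

# Possible purrr variant: imap() calls the function with (element, name).
if (requireNamespace("purrr", quietly = TRUE)) {
  res_purrr <- purrr::imap(toy, function(d, nm) d[[2]][d[[nm]] > 3])
  str(res_purrr)
}
```

If the score column can contain NA, the logical index returns NA entries for those rows; wrapping the condition in which(), as in `dff$two[which(dff[[dfname]] > 3)]`, drops them instead.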
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | marl1 |
