'Clarification on Reshaping DataFrame from LONG to WIDE in R with Gather/Spread
I am sorry to come back on a topic that has several threads on Stack, but I am trying to Reshape a dataset from LONG to WIDE, using Tidyverse, and either the Gather/Spread function as well as the pivot_wider function, and I am lost. Here it is a sample of the subset I am using for test
structure(list(pid = structure(c(1L, 1L, 1L, 1L, 1L, 2L), .Label = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35",
"36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46",
"47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57",
"58", "59", "60", "61", "62", "63", "64", "65", "66", "67", "68",
"69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79",
"80", "81", "82", "83", "84", "85", "86", "87", "88", "89", "90",
"91", "92", "93", "94", "95", "96", "97", "98", "99", "100",
"101", "102", "103", "104", "105", "106", "107", "108", "109",
"110", "111", "112", "113", "114", "115", "116", "117", "118",
"119", "120", "121", "122", "123", "124", "125", "126", "127",
"128", "129", "130", "131", "132", "133", "134", "135", "136",
"137", "138", "139", "140", "141", "142", "143", "144", "145",
"146", "147", "148", "149", "150", "151", "152", "153", "154",
"155", "156", "157", "158", "159", "160", "161", "162", "163",
"164", "165", "166", "167", "168", "169", "170", "171", "172",
"173", "174", "175", "176", "177", "178", "179", "180", "181",
"182", "183", "184", "185", "186", "187", "188", "189", "190",
"191", "192", "193", "194", "195", "196", "197", "198", "199",
"200", "201", "202", "203", "204", "205", "206", "207", "208",
"209", "210", "211", "212", "213", "214", "215", "216"), class = "factor"),
timewave = structure(c(1L, 2L, 3L, 4L, 5L, 1L), .Label = c("1",
"2", "3", "4", "5", "6", "7", "8"), class = "factor"), dev_icd = structure(c(1L,
1L, 1L, 1L, 1L, 2L), .Label = c("No", "Yes"), class = "factor"),
lab_bnp = c(388, 199, 387.5, 318, 154, 949.4)), row.names = c(NA,
6L), class = "data.frame")
And here there the 2 commands I came up to
test.wide2 <- test.long2 %>%
pivot_wider(id_cols = pid,
names_from = timewave,
values_from = c(dev_icd, lab_bnp),
names_sep = "")
Or also
test.wide <- test.long2 %>%
group_by(pid) %>%
gather("dev_icd", "lab_bnp",
key = variable, value = number ) %>%
unite(combi, variable, timewave) %>%
spread(combi, number)
Neither is working as I would expect, I get a lot of NA or NULL values and do not understand what is my mistake and the correct procedure. Any help not only in fixing the issue but mainly in understanding the Reshaping logic/philosophy would be greatly appreciated
Solution 1:[1]
Thanks to help from Merjin van Tiborg, I have finally nailed down the issue. The correct command to have PID on rows and columns for dev-icd and lab_bnp repeated by timewave number is as follows:
test.wide <- hf.longsmall %>%
pivot_wider(id_cols = c("pid", "timewave"),
names_from = timewave,
values_from = c(dev_icd, lab_bnp),
names_sep = "_t")
That is equivalent to the following
test.wide1 <- hf.longsmall %>%
group_by(pid, timewave) %>%
mutate(row = row_number()) %>%
tidyr::pivot_wider(names_from = timewave,
values_from = c(dev_icd, lab_bnp),
names_sep = "_t") %>%
select(-row)
I have got the following warning --> "Values in values_from are not uniquely identified; output will contain list-cols", and that was due to a real (and dangerous) mistake of duplicated PID during data entry. Anyhow, I could understand the issue only using the group_by option reported above.
Thank you all for the patience
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Diego |
