'Transform long-to-wide database spreading data on multiple columns in R
I am struggling in transforming a long dataframe to wide version, but with some complications.
I have a column, ID, with duplicated entry - each refers to a timepoint for an individual. I have also other columns (namely, visit, var1 and var2), in which data regarding each timepoint are reported. Here is a reproducible example:
df <- data.frame(ID=c(1,1,1,1,2,2,2,3,3,3),
visit=c(1,4,5,7,1,3,4,2,5,6),
var1=c("AF","no","no","no","AG","AG","no","BA","BA","BA"),
var2=c("good","good","good","bad","good","good","bad","good","good","good"))
And the output:
ID visit var1 var2
1 1 1 AF good
2 1 4 no good
3 1 5 no good
4 1 7 no bad
5 2 1 AG good
6 2 3 AG good
7 2 4 no bad
8 3 2 BA good
9 3 5 BA good
10 3 6 BA good
I really need to end up with a dataframe which contains only one row per ID, and multiple columns for each of the other variables, with a numeric suffix for example (such as visit_1, visit_2, visit_3 etc.).
The output I have in mind is like this:
ID visit_1 visit_2 visit_3 visit_4 var1_1 var1_2 var1_3 var1_4 var2_1 var2_2 var2_3 var2_4
1 1 1 4 5 7 AF no no no good good good bad
2 2 1 3 4 NA AG AG no <NA> good good bad <NA>
3 3 2 5 6 NA BA BA BA <NA> good good good <NA>.
in which essentially each entry of the columns visit, var1 and var2 has been placed in a separate, sequential column based on the ID column.
I have tried data.frame::dcast and tidyr::spread, and also pivot_wider() but it seems like these formula will end up with multiple column based on the actual values rather than producing fixed columns instead. For example, with pivot_wider():
df %>% pivot_wider(names_from = ID, values_from = c("visit","var1","var2"))
It returns me an error saying
Warning messages:
1: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
2: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
3: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
Anyone can help?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
