'How to wrangle the dataset in R: reshaping and creating new columns with given information
I have a dataset that looks like below,
structure(list(nonyeasted_19 = c("Force (N)", "0", "-0.0077",
"0.0023", "-0.0707", "-0.2155", "-0.2026", "-0.0628", "-0.0481",
"-0.0601", "0.0302", "0.0475", "-0.0176", "0.008", "0.0569",
"0.0242", "0.0003", "0.0295", "0.028", "-0.0221", "-0.0333",
"0.0034", "0.004", "-0.0219", "-0.0216", "-0.0261"), nonyeasted_19.1 = c("Distance (m)",
"0", "0", "0", "0", "0", "0", "0.000002", "0.000004", "0.000006",
"0.000008", "0.00001", "0.000012", "0.000014", "0.000016", "0.000018",
"0.00002", "0.000022", "0.000024", "0.000026", "0.000028", "0.00003",
"0.000032", "0.000034", "0.000036", "0.000038"), nonyeasted_19.2 = c("Time (sec)",
"0", "0.002", "0.004", "0.006", "0.008", "0.01", "0.012", "0.014",
"0.016", "0.018", "0.02", "0.022", "0.024", "0.026", "0.028",
"0.03", "0.032", "0.034", "0.036", "0.038", "0.04", "0.042",
"0.044", "0.046", "0.048"), nonyeasted_19.3 = c("Status", "101",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"), yeasted_01 = c("Force (N)",
"0", "0.0024", "0.0307", "-0.0487", "-0.2063", "-0.1928", "-0.0421",
"-0.0278", "-0.0586", "0.0251", "0.0373", "-0.0084", "0.0597",
"0.091", "0.0246", "0.0318", "", "", "", "", "", "", "", "",
""), yeasted_01.1 = c("Distance (m)", "0", "0", "0", "0", "0",
"0", "0", "0.000001", "0.000003", "0.000005", "0.000007", "0.000009",
"0.000011", "0.000013", "0.000015", "0.000017", "", "", "", "",
"", "", "", "", ""), yeasted_01.2 = c("Time (sec)", "0", "0.002",
"0.004", "0.006", "0.008", "0.01", "0.012", "0.014", "0.016",
"0.018", "0.02", "0.022", "0.024", "0.026", "0.028", "0.03",
"", "", "", "", "", "", "", "", ""), yeasted_01.3 = c("Status",
"101", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "", "", "", "", "", "", "", "", "")), class = "data.frame", row.names = c(NA,
-26L))
Every four columns are in one group, and the group names are in the first row, while the column names are in the second row. I wonder whether there are any ways to concatenate the groups vertically and create two new columns with the group name row, where column 1 contains the text before the underscore and column 2 contains the text after the underscore.
I tried to use tidyverse, but after read.csv(), the variable names could not be preserved.
Solution 1:[1]
one approach:
sample data (example_data.csv):
group_A,group_A,group_B,group_B
var_1,var_2,var_3,var_4
143,897,234,382
Code:
library(readr) ## for the read_lines function
library(tidyr) ## wrangling (pivoting etc.)
## read csv but skip first line (containing group names):
df <- read.csv('path/to/example_data.csv',skip = 1)
## read first line of csv and convert it to vector of group names:
group_names <- read_lines('path/to/example_data.csv', n_max = 1) %>%
strsplit(',') %>% unlist
## change names of dataframe df to: variable_name;group_name
names(df) <- paste(group_names, names(df), sep = ';')
## wrangle data (for documentation see https://tidyr.tidyverse.org/ )
df %>%
pivot_longer(everything(), names_to = 'group_var', values_to = 'value') %>%
separate(group_var, into = c('group', 'var'), sep = ';') %>%
separate(group, into = c('yeasted_status', 'index'), sep='_') %>%
pivot_wider(names_from = var, values_from = value)
Result:
## A tibble: 2 x 6
# yeasted_status index var_1 var_2 var3 var4
# <chr> <chr> <int> <int> <int> <int>
# 1 group A 143 897 NA NA
# 2 group B NA NA 234 382
edit
or, if df is the dataframe derived from your dput output:
df[-1,] %>%
pivot_longer(everything(),names_to = 'group_var', values_to = 'value') %>% head %>%
mutate(ID = paste(row_number(),group_var)) %>%
separate(group_var, into = c('group', 'var'), sep = ';') %>%
separate(group, into = c('yeasted_status', 'index'), sep='_') %>%
mutate(value = as.double(value)) %>%
pivot_wider(id_cols = c(ID, yeasted_status,index), names_from = var, values_from = value) %>%
select(-ID)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
