'How to wrangle the dataset in R: reshaping and creating new columns with given information

I have a dataset that looks like below,

structure(list(nonyeasted_19 = c("Force (N)", "0", "-0.0077", 
"0.0023", "-0.0707", "-0.2155", "-0.2026", "-0.0628", "-0.0481", 
"-0.0601", "0.0302", "0.0475", "-0.0176", "0.008", "0.0569", 
"0.0242", "0.0003", "0.0295", "0.028", "-0.0221", "-0.0333", 
"0.0034", "0.004", "-0.0219", "-0.0216", "-0.0261"), nonyeasted_19.1 = c("Distance (m)", 
"0", "0", "0", "0", "0", "0", "0.000002", "0.000004", "0.000006", 
"0.000008", "0.00001", "0.000012", "0.000014", "0.000016", "0.000018", 
"0.00002", "0.000022", "0.000024", "0.000026", "0.000028", "0.00003", 
"0.000032", "0.000034", "0.000036", "0.000038"), nonyeasted_19.2 = c("Time (sec)", 
"0", "0.002", "0.004", "0.006", "0.008", "0.01", "0.012", "0.014", 
"0.016", "0.018", "0.02", "0.022", "0.024", "0.026", "0.028", 
"0.03", "0.032", "0.034", "0.036", "0.038", "0.04", "0.042", 
"0.044", "0.046", "0.048"), nonyeasted_19.3 = c("Status", "101", 
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"), yeasted_01 = c("Force (N)", 
"0", "0.0024", "0.0307", "-0.0487", "-0.2063", "-0.1928", "-0.0421", 
"-0.0278", "-0.0586", "0.0251", "0.0373", "-0.0084", "0.0597", 
"0.091", "0.0246", "0.0318", "", "", "", "", "", "", "", "", 
""), yeasted_01.1 = c("Distance (m)", "0", "0", "0", "0", "0", 
"0", "0", "0.000001", "0.000003", "0.000005", "0.000007", "0.000009", 
"0.000011", "0.000013", "0.000015", "0.000017", "", "", "", "", 
"", "", "", "", ""), yeasted_01.2 = c("Time (sec)", "0", "0.002", 
"0.004", "0.006", "0.008", "0.01", "0.012", "0.014", "0.016", 
"0.018", "0.02", "0.022", "0.024", "0.026", "0.028", "0.03", 
"", "", "", "", "", "", "", "", ""), yeasted_01.3 = c("Status", 
"101", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "", "", "", "", "", "", "", "", "")), class = "data.frame", row.names = c(NA, 
-26L))

Every four columns are in one group, and the group names are in the first row, while the column names are in the second row. I wonder whether there are any ways to concatenate the groups vertically and create two new columns with the group name row, where column 1 contains the text before the underscore and column 2 contains the text after the underscore.

I tried to use tidyverse, but after read.csv(), the variable names could not be preserved.



Solution 1:[1]

one approach:

sample data (example_data.csv):

group_A,group_A,group_B,group_B
var_1,var_2,var_3,var_4
143,897,234,382

Code:

library(readr) ## for the read_lines function
library(tidyr) ## wrangling (pivoting etc.)

## read csv but skip first line (containing group names):
df <- read.csv('path/to/example_data.csv',skip = 1)
## read first line of csv and convert it to vector of group names:
group_names <- read_lines('path/to/example_data.csv', n_max = 1) %>%
    strsplit(',') %>% unlist
## change names of dataframe df to: variable_name;group_name
names(df) <- paste(group_names, names(df), sep = ';')


## wrangle data (for documentation see https://tidyr.tidyverse.org/ )
df %>%
    pivot_longer(everything(), names_to = 'group_var', values_to = 'value') %>%
    separate(group_var, into = c('group', 'var'), sep = ';') %>%
    separate(group, into = c('yeasted_status', 'index'), sep='_') %>%
    pivot_wider(names_from = var, values_from = value)

Result:

## A tibble: 2 x 6
#  yeasted_status index var_1 var_2  var3  var4
#  <chr>          <chr> <int> <int> <int> <int>
# 1 group          A       143   897    NA    NA
# 2 group          B        NA    NA   234   382

edit or, if df is the dataframe derived from your dput output:

df[-1,] %>% 
    pivot_longer(everything(),names_to = 'group_var', values_to = 'value') %>% head %>%
    mutate(ID = paste(row_number(),group_var)) %>% 
    separate(group_var, into = c('group', 'var'), sep = ';') %>% 
    separate(group, into = c('yeasted_status', 'index'), sep='_') %>% 
    mutate(value = as.double(value)) %>% 
    pivot_wider(id_cols = c(ID, yeasted_status,index), names_from = var, values_from = value) %>%
    select(-ID)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1