'R: Reshape a wide dataset with a large amount of related columns into a long dataset with one set of columns?

I am working with a dataset which is merged on columns so that I have many sets of fundamentally the same column (price, market cap and total volume), but for different assets. For instance, my dataset can look like this, where there are three stocks (alphabet, disney and twitter)

date palphabet mkalphabet tvalphabet pdisney mkdisney tvdisney ptwitter mktwitter tvtwitter
2015-01-01 86.27 12030000 124308 57.93 7280000 342898 20.75 3271348 32127
2015-01-02 89.00 12568000 237823 62.47 7502990 732837 17.95 3231834 38219
2015-01-03 87.36 11837400 382183 54.88 7620300 NA 19.46 3729327 29384

What I would like to reach is the following, a dataset which is long compared to the one above, with only one set of price, market cap and total volume columns. It would like this:

date stock p mk tv
2015-01-01 alphabet 86.27 12030000 124308
2015-01-01 disney 57.93 7280000 342898
2015-01-01 twitter 20.75 3271348 32127
2015-01-02 alphabet 89.00 12568000 237823
2015-01-02 disney 62.47 7502990 732837
2015-01-02 twitter 17.95 3231834 38219
2015-01-03 alphabet 87.36 11837400 382183
2015-01-03 disney 54.88 7620300 NA
2015-01-03 twitter 19.46 3729327 29384

In reality my dataset is both much wider (many more companies) and much longer (more observations), and I have multiple datasets, so there is not a set amount of rows or columns. I have tried using the reshape command, as well as the pivot_longer command, but I cannot seem to make it work, so I would appreciate any help that I could get. Specifically, what I struggle with is to assign the name of the company to the stock variable. Since my dataset(s) includes hundreds of stocks, it is not realistic to manually list the names of each company (in for instance the "times" argument of the reshape command).

Thank you!



Solution 1:[1]

I'm not convinced that this is the easiest way, but you could use

library(tidyr)
library(dplyr)
library(stringr)

df %>% 
  pivot_longer(-date) %>% 
  mutate(stock = str_extract(name, "(?<=p|mk|tv).*"),
         name = str_extract(name, "^(p|mk|tv)")) %>% 
  pivot_wider()

This returns

# A tibble: 9 x 5
  date       stock        p       mk     tv
  <date>     <chr>    <dbl>    <dbl>  <dbl>
1 2015-01-01 alphabet  86.3 12030000 124308
2 2015-01-01 disney    57.9  7280000 342898
3 2015-01-01 twitter   20.8  3271348  32127
4 2015-01-02 alphabet  89   12568000 237823
5 2015-01-02 disney    62.5  7502990 732837
6 2015-01-02 twitter   18.0  3231834  38219
7 2015-01-03 alphabet  87.4 11837400 382183
8 2015-01-03 disney    54.9  7620300     NA
9 2015-01-03 twitter   19.5  3729327  29384

A little bit simpler without using stringr:

df %>% 
  pivot_longer(-date, 
               names_pattern = "(p|mk|tv)(.*)", 
               names_to = c("name", "stock")) %>% 
  pivot_wider()

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Martin Gal