'R: Reshape a wide dataset with a large amount of related columns into a long dataset with one set of columns?

I am working with a dataset which is merged on columns so that I have many sets of fundamentally the same column (price, market cap and total volume), but for different assets. For instance, my dataset can look like this, where there are three stocks (alphabet, disney and twitter)

date	palphabet	mkalphabet	tvalphabet	pdisney	mkdisney	tvdisney	ptwitter	mktwitter	tvtwitter
2015-01-01	86.27	12030000	124308	57.93	7280000	342898	20.75	3271348	32127
2015-01-02	89.00	12568000	237823	62.47	7502990	732837	17.95	3231834	38219
2015-01-03	87.36	11837400	382183	54.88	7620300	NA	19.46	3729327	29384

What I would like to reach is the following, a dataset which is long compared to the one above, with only one set of price, market cap and total volume columns. It would like this:

date	stock	p	mk	tv
2015-01-01	alphabet	86.27	12030000	124308
2015-01-01	disney	57.93	7280000	342898
2015-01-01	twitter	20.75	3271348	32127
2015-01-02	alphabet	89.00	12568000	237823
2015-01-02	disney	62.47	7502990	732837
2015-01-02	twitter	17.95	3231834	38219
2015-01-03	alphabet	87.36	11837400	382183
2015-01-03	disney	54.88	7620300	NA
2015-01-03	twitter	19.46	3729327	29384

In reality my dataset is both much wider (many more companies) and much longer (more observations), and I have multiple datasets, so there is not a set amount of rows or columns. I have tried using the reshape command, as well as the pivot_longer command, but I cannot seem to make it work, so I would appreciate any help that I could get. Specifically, what I struggle with is to assign the name of the company to the stock variable. Since my dataset(s) includes hundreds of stocks, it is not realistic to manually list the names of each company (in for instance the "times" argument of the reshape command).

Thank you!

r reshape

Solution 1:^[1]

I'm not convinced that this is the easiest way, but you could use

library(tidyr)
library(dplyr)
library(stringr)

df %>% 
  pivot_longer(-date) %>% 
  mutate(stock = str_extract(name, "(?<=p|mk|tv).*"),
         name = str_extract(name, "^(p|mk|tv)")) %>% 
  pivot_wider()

This returns

# A tibble: 9 x 5
  date       stock        p       mk     tv
  <date>     <chr>    <dbl>    <dbl>  <dbl>
1 2015-01-01 alphabet  86.3 12030000 124308
2 2015-01-01 disney    57.9  7280000 342898
3 2015-01-01 twitter   20.8  3271348  32127
4 2015-01-02 alphabet  89   12568000 237823
5 2015-01-02 disney    62.5  7502990 732837
6 2015-01-02 twitter   18.0  3231834  38219
7 2015-01-03 alphabet  87.4 11837400 382183
8 2015-01-03 disney    54.9  7620300     NA
9 2015-01-03 twitter   19.5  3729327  29384

A little bit simpler without using stringr:

df %>% 
  pivot_longer(-date, 
               names_pattern = "(p|mk|tv)(.*)", 
               names_to = c("name", "stock")) %>% 
  pivot_wider()

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Martin Gal

'R: Reshape a wide dataset with a large amount of related columns into a long dataset with one set of columns?

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]