'R: Reshape a wide dataset with a large amount of related columns into a long dataset with one set of columns?
I am working with a dataset which is merged on columns so that I have many sets of fundamentally the same column (price, market cap and total volume), but for different assets. For instance, my dataset can look like this, where there are three stocks (alphabet, disney and twitter)
| date | palphabet | mkalphabet | tvalphabet | pdisney | mkdisney | tvdisney | ptwitter | mktwitter | tvtwitter |
|---|---|---|---|---|---|---|---|---|---|
| 2015-01-01 | 86.27 | 12030000 | 124308 | 57.93 | 7280000 | 342898 | 20.75 | 3271348 | 32127 |
| 2015-01-02 | 89.00 | 12568000 | 237823 | 62.47 | 7502990 | 732837 | 17.95 | 3231834 | 38219 |
| 2015-01-03 | 87.36 | 11837400 | 382183 | 54.88 | 7620300 | NA | 19.46 | 3729327 | 29384 |
What I would like to reach is the following, a dataset which is long compared to the one above, with only one set of price, market cap and total volume columns. It would like this:
| date | stock | p | mk | tv |
|---|---|---|---|---|
| 2015-01-01 | alphabet | 86.27 | 12030000 | 124308 |
| 2015-01-01 | disney | 57.93 | 7280000 | 342898 |
| 2015-01-01 | 20.75 | 3271348 | 32127 | |
| 2015-01-02 | alphabet | 89.00 | 12568000 | 237823 |
| 2015-01-02 | disney | 62.47 | 7502990 | 732837 |
| 2015-01-02 | 17.95 | 3231834 | 38219 | |
| 2015-01-03 | alphabet | 87.36 | 11837400 | 382183 |
| 2015-01-03 | disney | 54.88 | 7620300 | NA |
| 2015-01-03 | 19.46 | 3729327 | 29384 |
In reality my dataset is both much wider (many more companies) and much longer (more observations), and I have multiple datasets, so there is not a set amount of rows or columns. I have tried using the reshape command, as well as the pivot_longer command, but I cannot seem to make it work, so I would appreciate any help that I could get. Specifically, what I struggle with is to assign the name of the company to the stock variable. Since my dataset(s) includes hundreds of stocks, it is not realistic to manually list the names of each company (in for instance the "times" argument of the reshape command).
Thank you!
Solution 1:[1]
I'm not convinced that this is the easiest way, but you could use
library(tidyr)
library(dplyr)
library(stringr)
df %>%
pivot_longer(-date) %>%
mutate(stock = str_extract(name, "(?<=p|mk|tv).*"),
name = str_extract(name, "^(p|mk|tv)")) %>%
pivot_wider()
This returns
# A tibble: 9 x 5
date stock p mk tv
<date> <chr> <dbl> <dbl> <dbl>
1 2015-01-01 alphabet 86.3 12030000 124308
2 2015-01-01 disney 57.9 7280000 342898
3 2015-01-01 twitter 20.8 3271348 32127
4 2015-01-02 alphabet 89 12568000 237823
5 2015-01-02 disney 62.5 7502990 732837
6 2015-01-02 twitter 18.0 3231834 38219
7 2015-01-03 alphabet 87.4 11837400 382183
8 2015-01-03 disney 54.9 7620300 NA
9 2015-01-03 twitter 19.5 3729327 29384
A little bit simpler without using stringr:
df %>%
pivot_longer(-date,
names_pattern = "(p|mk|tv)(.*)",
names_to = c("name", "stock")) %>%
pivot_wider()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Martin Gal |
