'Add parent row as a column identifier in R
I have an export of a data which lists a hierarchical structure but it's not really usable for my purposes. Here is an example of the data:
| Level | Number |
|---|---|
| 1 | CO123 |
| 2 | PN123 |
| 2 | PN124 |
| 1 | CO124 |
| 2 | PN125 |
| 2 | PN126 |
| 1 | CO125 |
| 2 | PN127 |
| 2 | PN128 |
I want it to look like this:
| Parent | Child |
|---|---|
| CO123 | PN123 |
| CO123 | PN124 |
| CO124 | PN125 |
| CO124 | PN126 |
| CO125 | PN127 |
| CO125 | PN128 |
I have a lot more rows with varying amounts of "children" for each parent. However, the parent only shows up in the data set once, while the children could be structured to multiple parents.
I'm trying to run some analysis on each parent, like the average number of children.
Thanks in advance.
Solution 1:[1]
We may use pivot_wider to reshape to 'wide' format
library(dplyr)
library(tidyr)
library(data.table)
df1 %>%
mutate(nm1 = case_when(Level == 1 ~ "Parent", TRUE ~ "Child"),
rn = rowid(Level)) %>%
pivot_wider(names_from = nm1, values_from = Number) %>%
fill(everything(), .direction = "downup") %>%
filter(Level != 1) %>%
select(-rn,-Level)
-output
# A tibble: 6 × 2
Parent Child
<chr> <chr>
1 CO123 PN123
2 CO123 PN124
3 CO124 PN125
4 CO124 PN126
5 CO125 PN127
6 CO125 PN128
data
df1 <- structure(list(Level = c(1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L),
Number = c("CO123", "PN123", "PN124", "CO124", "PN125", "PN126",
"CO125", "PN127", "PN128")), class = "data.frame", row.names = c(NA,
-9L))
Solution 2:[2]
Another option with tidyverse using fill and filter:
library(tidyverse)
df %>%
mutate(Parent = ifelse(Level == 1, Number, NA)) %>%
fill(Parent, .direction = "down") %>%
filter(Level != 1) %>%
select(Parent, Child = Number)
Output
Parent Child
1 CO123 PN123
2 CO123 PN124
3 CO124 PN125
4 CO124 PN126
5 CO125 PN127
6 CO125 PN128
Data
df <- structure(list(Level = c(1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L),
Number = c("CO123", "PN123", "PN124", "CO124", "PN125", "PN126",
"CO125", "PN127", "PN128")), class = "data.frame", row.names = c(NA,
-9L))
Solution 3:[3]
My answer differs from the others because it is an attempt to implement a more general approach to handle multiple levels. Namely, the approach consists in finding the closest row with level less than current level
library(dplyr)
df <- structure(list(V1 = c(1, 2, 2, 1, 2, 2, 1, 2, 2), V2 = c("CO123",
"PN123", "PN124", "CO124", "PN125", "PN126", "CO125", "PN127",
"PN128")), class = "data.frame", row.names = c(NA, -9L))
attach(df)
m <- outer(V1, V1, "<")
which(m & upper.tri(m), arr.ind = TRUE) |>
data.frame() |>
group_by(col) |>
summarize(row = max(row)) |>
transmute(parent = V2[row],
child = V2[col])
##> # A tibble: 6 × 2
##> parent child
##> <chr> <chr>
##> 1 CO123 PN123
##> 2 CO123 PN124
##> 3 CO124 PN125
##> 4 CO124 PN126
##> 5 CO125 PN127
##> 6 CO125 PN128
It also works on a multi-level hierarchy e.g.:
h <- data.frame(V1 = c(1,2,3,3,2,3,3,2,2,1,2,3),
V2 = toupper(letters[1:12]))
h
##> V1 V2
##> 1 1 A
##> 2 2 B
##> 3 3 C
##> 4 3 D
##> 5 2 E
##> 6 3 F
##> 7 3 G
##> 8 2 H
##> 9 2 I
##> 10 1 J
##> 11 2 K
##> 12 3 L
attach(h)
m <- outer(V1, V1, "<")
which(m & upper.tri(m), arr.ind = TRUE) |>
data.frame() |>
group_by(col) |>
summarize(row = max(row)) |>
transmute(parent = V2[row],
child = V2[col])
##> # A tibble: 10 × 2
##> parent child
##> <chr> <chr>
##> 1 A B
##> 2 B C
##> 3 B D
##> 4 A E
##> 5 E F
##> 6 E G
##> 7 A H
##> 8 A I
##> 9 J K
##> 10 K L
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | AndrewGB |
| Solution 3 |
