How to separate a DataFrame column in two given a delimiter?
Given a DataFrame df in Julia:
using DataFrames
df = DataFrame(X = ['A', 'B', 'C'], Y = ["a|b", "a|c", "b|b"])
How can I create columns Y1 and Y2 by splitting column Y at the "|" delimiter?
E.g., in R tidyverse I'd do:
separate(df, Y, c("Y1", "Y2"), sep = "\\|")
Solution 1:[1]
I'm new to Julia, but I tried this solution and it worked.
df[:, :Y1] = [p[1] for p in [split(y, "|") for y in df[:, 2]]]
df[:, :Y2] = [p[2] for p in [split(y, "|") for y in df[:, 2]]]
Is this OK, or is it bad practice?
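The same idea can probably be written a little more compactly. The following is only a sketch of one alternative, not the answerer's code: it splits each entry of Y once with broadcasting and then picks out the pieces. The intermediate name `parts` is introduced here for illustration.

```julia
using DataFrames

df = DataFrame(X = ['A', 'B', 'C'], Y = ["a|b", "a|c", "b|b"])

# Split every entry of Y once; `parts` (an illustrative name) holds
# one vector of substrings per row.
parts = split.(df.Y, "|")

# Take the first and second piece of each row's split result.
# This assumes every entry contains exactly one "|".
df.Y1 = getindex.(parts, 1)
df.Y2 = getindex.(parts, 2)
```

Splitting once avoids repeating the work for each new column, and referring to the column as `df.Y` rather than by position keeps the code working if the column order changes.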
Solution 2:[2]
I learned a bit from this Julia Discourse post: there is a nicer way, and it is idiomatic in the DataFrames world:
transform(df, :Y => ByRow(x -> split(x, "|")) => [:Y1, :Y2])
# 3×4 DataFrame
#  Row │ X     Y       Y1         Y2
#      │ Char  String  SubStrin…  SubStrin…
# ─────┼────────────────────────────────────
#    1 │ A     a|b     a          b
#    2 │ B     a|c     a          c
#    3 │ C     b|b     b          b
# It's even nicer if the delimiter is whitespace:
transform(df, :Y => ByRow(split) => AsTable)
I haven't tested the case where the split yields a different number of fields per row; I suspect the way to go there is flatten followed by unstack.
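For the uneven case, here is a minimal sketch of a different, simpler technique than the flatten/unstack route suggested above: split once, find the longest result, and pad shorter rows with `missing`. The sample data and the names `parts` and `n` are made up for illustration and are not from the original question.

```julia
using DataFrames

# Hypothetical data with 2, 1 and 3 fields per row.
df = DataFrame(X = ['A', 'B', 'C'], Y = ["a|b", "a", "b|b|c"])

# Split each entry once and find the largest number of fields.
parts = split.(df.Y, "|")
n = maximum(length, parts)

# Create Y1..Yn; rows with fewer than j fields get `missing` in column Yj.
for j in 1:n
    df[!, "Y$j"] = [j <= length(p) ? String(p[j]) : missing for p in parts]
end
```

Padding with `missing` keeps one row per original row, which is usually what a "separate" operation is expected to produce.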
I'll also give a shout-out to the DataFramesMeta way of formulating this, whose syntax is very nice IMO:
using DataFramesMeta
@rtransform df $[:Y1, :Y2] = split(:Y, "|")
# 3×4 DataFrame
#  Row │ X     Y       Y1         Y2
#      │ Char  String  SubStrin…  SubStrin…
# ─────┼────────────────────────────────────
#    1 │ A     a|b     a          b
#    2 │ B     a|c     a          c
#    3 │ C     b|b     b          b
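The printed tables above abbreviate the element type of the new columns as `SubStrin…`, because `split` returns substrings that reference the original string. If plain `String` columns are preferred, one option (a sketch, not part of the original answer) is to convert inside the row function; the result name `out` is illustrative.

```julia
using DataFrames

df = DataFrame(X = ['A', 'B', 'C'], Y = ["a|b", "a|c", "b|b"])

# Convert each substring to an independent String before the pieces
# are spread into the Y1 and Y2 columns.
out = transform(df, :Y => ByRow(y -> String.(split(y, "|"))) => [:Y1, :Y2])
```

The values are the same as before; only the element type of the new columns differs.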
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Gustavo Rossini |
| Solution 2 |
