How to separate a DataFrame column in two given a delimiter?

Given a DataFrame df in Julia:

using DataFrames
df = DataFrame(X = ['A', 'B', 'C'], Y = ["a|b", "a|c", "b|b"])

How can I create columns Y1 and Y2 by splitting column Y at the "|" delimiter?

E.g., in R tidyverse I'd do:

separate(df, Y, c("Y1", "Y2"), sep = "\\|")


Solution 1:[1]

I'm new to Julia, but I tried this solution and it worked.

df[:,:Y1]=[i[1] for i in [split(i,"|") for i in df[:,2]]]
df[:,:Y2]=[i[2] for i in [split(i,"|") for i in df[:,2]]]

Is this OK, or is it bad practice?
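
As a point of comparison, here is a sketch of one possible tidy-up (not part of the original answer): split the column once and reuse the result instead of splitting twice. The name `parts` is introduced here for illustration, and the code assumes every row contains at least two fields.

```julia
using DataFrames

df = DataFrame(X = ['A', 'B', 'C'], Y = ["a|b", "a|c", "b|b"])

# Split every element of Y once; each entry of `parts` is a vector of fields.
parts = split.(df.Y, "|")

# Broadcast getindex to pick the first and second field of each row.
# Assumption: every row has at least two fields, otherwise this throws.
df.Y1 = getindex.(parts, 1)
df.Y2 = getindex.(parts, 2)
```

Referring to the column as `df.Y` rather than by position (`df[:, 2]`) also keeps the code working if the column order changes.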

Solution 2:[2]

I learned a bit from this Julia Discourse post: there is a nicer way. This is idiomatic in the DataFrames world:

transform(df, :Y => ByRow(x -> split(x, "|")) => [:Y1, :Y2])

# 3×4 DataFrame
# Row │ X     Y       Y1         Y2
#     │ Char  String  SubStrin…  SubStrin…
#─────┼────────────────────────────────────
#   1 │ A     a|b     a          b
#   2 │ B     a|c     a          c
#   3 │ C     b|b     b          b

# It's even nicer if the delimiter is whitespace:
transform(df, :Y => ByRow(split) => AsTable)

I haven't tested the case where the split yields a different number of fields per row. I suspect the way to go in that case is to flatten and then unstack.
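
A minimal sketch of that flatten-then-unstack idea, offered as an illustration rather than a tested recipe. The sample data, the helper column names `part` and `name`, and the assumption that X uniquely identifies each row are all introduced here and are not from the original post.

```julia
using DataFrames

# Hypothetical data: rows split into one, two, or three fields.
df = DataFrame(X = ['A', 'B', 'C'], Y = ["a|b", "a", "b|b|c"])

# Store the vector of fields for each row in a helper column `part`.
long = transform(df, :Y => ByRow(y -> split(y, "|")) => :part)

# Expand to one row per field.
long = flatten(long, :part)

# Number the fields within each original row: "Y1", "Y2", ...
# Assumption: X is unique per original row, so it can serve as the group key.
long = transform(groupby(long, :X),
                 :part => (p -> string.("Y", 1:length(p))) => :name)

# Pivot back to wide form; rows with fewer fields are filled with `missing`.
wide = unstack(long, [:X, :Y], :name, :part)
```

If X is not unique in your data, adding an explicit row-id column before flattening and grouping on that instead should serve the same purpose.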

I'll also give a shout-out to the DataFramesMeta syntax, which is very nice IMO:

using DataFramesMeta

@rtransform df $[:Y1, :Y2] = split(:Y, "|")

# 3×4 DataFrame
# Row │ X     Y       Y1         Y2
#     │ Char  String  SubStrin…  SubStrin…
#─────┼────────────────────────────────────
#   1 │ A     a|b     a          b
#   2 │ B     a|c     a          c
#   3 │ C     b|b     b          b

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Gustavo Rossini
Solution 2