Splitting nth elements in a string in a pandas dataframe

Currently I have a column in a pandas dataframe, df, that looks like this:

read_name
NB511043:297:HJJMHBGXJ:1:22110:22730:3876
NB511043:297:HJJMHBGXJ:4:22609:8139:4265
NB511043:298:HT6KCBGXJ:1:13311:16766:2025

What I'm hoping to do is specifically extract the 5th and 7th elements of each string in this df and append these to the end of the same dataframe, like so:

read_name 5th element 7th element
NB511043:297:HJJMHBGXJ:1:22110:22730:3876 22110 3876
NB511043:297:HJJMHBGXJ:4:22609:8139:4265 22609 4265
NB511043:298:HT6KCBGXJ:1:13311:16766:2025 13311 2025

My current method is to create a whole new dataframe using str.split to split everything in read_name, and then simply append the values I need to the original dataframe, like so:

df_read_name= df['read_name'].str.split(":", n = 6, expand = True)
df['5th element']= pd.to_numeric(df_read_name[4])
df['7th element']= pd.to_numeric(df_read_name[6])

However, I think this is a bit cumbersome and was hoping there might be a faster approach.

As always, any help is appreciated!



Solution 1:[1]

You can use .str.split with expand=True:

df[["5th element", "7th element"]] = df["read_name"].str.split(":", expand=True)[[4, 6]].astype(int)
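For anyone who wants to try this end to end, here is a minimal sketch. It assumes the dataframe holds just the three sample rows from the question; the intermediate name `parts` is only there for illustration and is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "read_name": [
        "NB511043:297:HJJMHBGXJ:1:22110:22730:3876",
        "NB511043:297:HJJMHBGXJ:4:22609:8139:4265",
        "NB511043:298:HT6KCBGXJ:1:13311:16766:2025",
    ]
})

# Split on ":" into one column per field (labelled 0 to 6).
parts = df["read_name"].str.split(":", expand=True)

# Keep positions 4 and 6 (the 5th and 7th fields) and convert them to integers.
df[["5th element", "7th element"]] = parts[[4, 6]].astype(int)

print(df)
```

Note that the split labels its columns from 0, so the 5th and 7th elements are columns 4 and 6.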

Solution 2:[2]

You could use str.extract here:

df[["5th element", "7th element"]] = df["read_name"].str.extract(r'(?:[^:]+:){4}([^:]+):[^:]+:([^:]+).*')
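One thing to keep in mind is that str.extract returns text, so the new columns hold strings unless you convert them. A rough sketch of the conversion, assuming the column is called read_name as in the question and using the sample rows from there (the names `pattern` and `extracted` are just illustrative):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "read_name": [
        "NB511043:297:HJJMHBGXJ:1:22110:22730:3876",
        "NB511043:297:HJJMHBGXJ:4:22609:8139:4265",
        "NB511043:298:HT6KCBGXJ:1:13311:16766:2025",
    ]
})

# Skip four "field:" groups, capture the 5th field, skip the 6th, capture the 7th.
pattern = r'(?:[^:]+:){4}([^:]+):[^:]+:([^:]+)'

# One column per capture group (labelled 0 and 1), holding strings.
extracted = df["read_name"].str.extract(pattern)

# Convert the captured text to numbers before storing it.
df["5th element"] = pd.to_numeric(extracted[0])
df["7th element"] = pd.to_numeric(extracted[1])

print(df)
```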

Solution 3:[3]

If you're really always interested in the 5th and 7th element, you could use a regex with str.extract, but honestly your approach is explicit and fine, and easier to adapt:

regex = r'(?:[^:]+:){4}([^:]+):[^:]+:([^:]+)'
df[['5th element', '7th element']] = df['read_name'].str.extract(regex).astype(int)

output:

                                   read_name  5th element  7th element
0  NB511043:297:HJJMHBGXJ:1:22110:22730:3876        22110         3876
1   NB511043:297:HJJMHBGXJ:4:22609:8139:4265        22609         4265
2  NB511043:298:HT6KCBGXJ:1:13311:16766:2025        13311         2025
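If some read names might not have all seven fields, the regex simply yields missing values for those rows, and .astype(int) would then fail. A possible way around that, sketched under the assumption that you want to keep such rows as missing rather than drop them (the "bad:name" row is made up for illustration and is not from the question):

```python
import pandas as pd

# First row is from the question; the second is a hypothetical malformed name.
df = pd.DataFrame({
    "read_name": [
        "NB511043:297:HJJMHBGXJ:1:22110:22730:3876",
        "bad:name",
    ]
})

regex = r'(?:[^:]+:){4}([^:]+):[^:]+:([^:]+)'

# Rows that do not match the pattern come back as missing values.
extracted = df["read_name"].str.extract(regex)

# Coerce anything non-numeric to NaN, then use the nullable integer dtype
# so the missing entries can sit alongside real integers.
numeric = extracted.apply(pd.to_numeric, errors="coerce").astype("Int64")

df["5th element"] = numeric[0]
df["7th element"] = numeric[1]

print(df)
```

Whether to keep, drop, or raise on malformed rows depends on your data, so treat this as one option rather than the recommended one.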

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Tim Biegeleisen
Solution 3 mozway