How can I loop through a string from the end to split it into several strings

2014_FIFA_World_Cup_en.wikipedia.org_all-access_all-agents 2015_Copa_América_en.wikipedia.org_all-access_all-agents 2016_Summer_Olympics_en.wikipedia.org_all-access_all-agents 2018_FIFA_World_Cup_en.wikipedia.org_all-access_all-agents 2014_FIFA_World_Cup_en.wikipedia.org_mobile-web_all-agents A_Song_of_Ice_and_Fire_en.wikipedia.org_desktop_all-agents

I have a column in my dataset, and the strings above are some of its observations. I'm trying to split that column into 4 columns: one containing 2014_FIFA_World_Cup, 2015_Copa_América, 2016_Summer_Olympics; another containing en.wikipedia.org, en.wikipedia.org, en.wikipedia.org; another containing all-access, mobile-web, desktop; and a last one containing all-agents.

I've tried the following

long_string = """2014_FIFA_World_Cup_en.wikipedia.org_all-access_all-agents 2015_Copa_América_en.wikipedia.org_all-access_all-agents 2016_Summer_Olympics_en.wikipedia.org_all-access_all-agents 2018_FIFA_World_Cup_en.wikipedia.org_all-access_all-agents 2014_FIFA_World_Cup_en.wikipedia.org_mobile-web_all-agents A_Song_of_Ice_and_Fire_en.wikipedia.org_desktop_all-agents"""

lines = long_string.split()  # split on any whitespace; the sample entries above are whitespace-separated

columns = [line.split("_") for line in lines]

print(columns)

Got the following result:

[['2014', 'FIFA', 'World', 'Cup', 'en.wikipedia.org', 'all-access', 'all-agents'], ['2015', 'Copa', 'América', 'en.wikipedia.org', 'all-access', 'all-agents'], ['2016', 'Summer', 'Olympics', 'en.wikipedia.org', 'all-access', 'all-agents'], ['2018', 'FIFA', 'World', 'Cup', 'en.wikipedia.org', 'all-access', 'all-agents'], ['2014', 'FIFA', 'World', 'Cup', 'en.wikipedia.org', 'mobile-web', 'all-agents'], ['A', 'Song', 'of', 'Ice', 'and', 'Fire', 'en.wikipedia.org', 'desktop', 'all-agents']]

What I actually want is something like

[['2014 FIFA World Cup', 'en.wikipedia.org', 'all-access', 'all-agents'], ['2015 Copa América', 'en.wikipedia.org', 'all-access', 'all-agents'], ['2016 Summer Olympics', 'en.wikipedia.org', 'all-access', 'all-agents'], ['2018 FIFA World Cup', 'en.wikipedia.org', 'all-access', 'all-agents'], ['2014 FIFA World Cup', 'en.wikipedia.org', 'mobile-web', 'all-agents'], ['A Song of Ice and Fire', 'en.wikipedia.org', 'desktop', 'all-agents']]



Solution 1:[1]

Try this:

s = "2014_FIFA_World_Cup_en.wikipedia.org_all-access_all-agents"
my_list = s.rsplit("_", 3)  # split from the right, at most 3 times
# my_list == ['2014_FIFA_World_Cup', 'en.wikipedia.org', 'all-access', 'all-agents']

It could be tidied up, but it does the trick. You just have to apply it to each string in the column (and append each result).

EDIT: This works as long as every string ends with three underscore-separated fields, as in en.wikipedia.org_all-access_all-agents, and none of those three fields contains an underscore itself.
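
To apply the same idea to every string, a minimal sketch could look like the following. The helper name split_page and the variable rows are made up for illustration; the sample strings come from the question. It assumes, as noted above, that the last three fields never contain an underscore.

```python
# Sample rows taken from the question; in practice these would come
# from the dataset column.
rows = [
    "2014_FIFA_World_Cup_en.wikipedia.org_all-access_all-agents",
    "2015_Copa_América_en.wikipedia.org_all-access_all-agents",
    "A_Song_of_Ice_and_Fire_en.wikipedia.org_desktop_all-agents",
]


def split_page(row):
    """Split one string from the right into [title, site, access, agent].

    Hypothetical helper: assumes the last three fields contain no underscore.
    """
    # maxsplit=3 peels off the last three fields; the title stays in one piece.
    title, site, access, agent = row.rsplit("_", 3)
    # Underscores inside the title become spaces, as in the desired output.
    return [title.replace("_", " "), site, access, agent]


columns = [split_page(row) for row in rows]
print(columns)
```

Because the split starts from the right, the number of underscores in the title does not matter, which is what makes this work for both "2015_Copa_América" and "A_Song_of_Ice_and_Fire".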

Solution 2:[2]

If one of the column values is known and fixed, then the following snippet will work:

lines = long_string.split(" ")
print(lines)

columns = []
for line in lines:
    column = []
    a = line.split("_en.wikipedia.org_")
    column.append(a[0].replace('_', ' '))
    column.append("en.wikipedia.org")
    column.extend(a[1].split("_"))
    columns.append(column)
    
print(columns)

Here en.wikipedia.org is assumed to be common to all rows.
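
If some rows might not contain the fixed value, the snippet above would fail with an IndexError on a[1]. One possible way to guard against that is sketched below using str.partition; the function name split_on_site and its site parameter are hypothetical, and returning None for non-matching rows is just one choice among several.

```python
def split_on_site(line, site="en.wikipedia.org"):
    """Split a line around a known site value (hypothetical helper).

    Returns [title, site, *remaining fields], or None when the site
    marker is not present in the line.
    """
    # partition returns (before, separator, after); the separator is
    # an empty string when the marker is not found.
    title, sep, rest = line.partition(f"_{site}_")
    if not sep:
        return None
    return [title.replace("_", " "), site, *rest.split("_")]


print(split_on_site("2014_FIFA_World_Cup_en.wikipedia.org_mobile-web_all-agents"))
print(split_on_site("a_row_without_the_marker"))
```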

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Dhinsha Mahesh