How can I loop through a string from the end to split it into several strings?
2014_FIFA_World_Cup_en.wikipedia.org_all-access_all-agents
2015_Copa_América_en.wikipedia.org_all-access_all-agents
2016_Summer_Olympics_en.wikipedia.org_all-access_all-agents
2018_FIFA_World_Cup_en.wikipedia.org_all-access_all-agents
2014_FIFA_World_Cup_en.wikipedia.org_mobile-web_all-agents
A_Song_of_Ice_and_Fire_en.wikipedia.org_desktop_all-agents
I have a column in my dataset, and the values above are some of its observations. I'm trying to split it into 4 columns: one with the page title (2014_FIFA_World_Cup, 2015_Copa_América, 2016_Summer_Olympics, ...), one with the site (en.wikipedia.org), one with the access type (all-access, mobile-web, desktop), and one with the agent (all-agents).
I've tried the following:
long_string = """2014_FIFA_World_Cup_en.wikipedia.org_all-access_all-agents
2015_Copa_América_en.wikipedia.org_all-access_all-agents
2016_Summer_Olympics_en.wikipedia.org_all-access_all-agents
2018_FIFA_World_Cup_en.wikipedia.org_all-access_all-agents
2014_FIFA_World_Cup_en.wikipedia.org_mobile-web_all-agents
A_Song_of_Ice_and_Fire_en.wikipedia.org_desktop_all-agents"""
lines = long_string.split("\n")
columns = [line.split("_") for line in lines]
print(columns)
Got the following result:
[['2014', 'FIFA', 'World', 'Cup', 'en.wikipedia.org', 'all-access', 'all-agents'], ['2015', 'Copa', 'América', 'en.wikipedia.org', 'all-access', 'all-agents'], ['2016', 'Summer', 'Olympics', 'en.wikipedia.org', 'all-access', 'all-agents'], ['2018', 'FIFA', 'World', 'Cup', 'en.wikipedia.org', 'all-access', 'all-agents'], ['2014', 'FIFA', 'World', 'Cup', 'en.wikipedia.org', 'mobile-web', 'all-agents'], ['A', 'Song', 'of', 'Ice', 'and', 'Fire', 'en.wikipedia.org', 'desktop', 'all-agents']]
What I actually want is something like:
[['2014 FIFA World Cup', 'en.wikipedia.org', 'all-access', 'all-agents'], ['2015 Copa América', 'en.wikipedia.org', 'all-access', 'all-agents'], ['2016 Summer Olympics', 'en.wikipedia.org', 'all-access', 'all-agents'], ['2018 FIFA World Cup', 'en.wikipedia.org', 'all-access', 'all-agents'], ['2014 FIFA World Cup', 'en.wikipedia.org', 'mobile-web', 'all-agents'], ['A Song of Ice and Fire', 'en.wikipedia.org', 'desktop', 'all-agents']]
Solution 1:[1]
Try this:
s = "2014_FIFA_World_Cup_en.wikipedia.org_all-access_all-agents"
# Split from the right, at most 3 times, so underscores in the title are kept
my_list = s.rsplit("_", 3)
# my_list == ['2014_FIFA_World_Cup', 'en.wikipedia.org', 'all-access', 'all-agents']
It can probably be improved, but it does the trick. You just have to do it for each string in the column (and append the result to a list).
EDIT: This works as long as the last three fields (site, access type, agent) never contain an underscore themselves, as in en.wikipedia.org_all-access_all-agents.
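To apply this to every value, a minimal sketch could look like the following. The helper name split_page_name is made up for illustration, and replacing the underscores in the title with spaces is an assumption based on the desired output in the question. It still relies on the last three fields containing no underscores.

```python
def split_page_name(name):
    """Split from the right so underscores inside the title are preserved."""
    # rsplit with maxsplit=3 yields exactly four parts for well-formed input
    title, site, access, agent = name.rsplit("_", 3)
    # Assumption: the title should use spaces, as in the desired output
    return [title.replace("_", " "), site, access, agent]


long_string = """2014_FIFA_World_Cup_en.wikipedia.org_all-access_all-agents
2015_Copa_América_en.wikipedia.org_all-access_all-agents
A_Song_of_Ice_and_Fire_en.wikipedia.org_desktop_all-agents"""

columns = [split_page_name(line) for line in long_string.splitlines()]
print(columns)
```

A value with fewer than three underscores would raise a ValueError at the unpacking step, which may be preferable to silently producing short rows.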
Solution 2:[2]
If one of the column values is known and fixed, then the following snippet will work:
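# long_string is the multi-line string from the question
lines = long_string.split()  # split on any whitespace (newlines or spaces)
print(lines)
columns = []
for line in lines:
    column = []
    # Split around the fixed site name; a[0] is the title, a[1] is the rest
    a = line.split("_en.wikipedia.org_")
    column.append(a[0].replace('_', ' '))
    column.append("en.wikipedia.org")
    column.extend(a[1].split("_"))
    columns.append(column)
print(columns)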
Here en.wikipedia.org is assumed to be the same for every row.
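If some rows might not contain the fixed site name, a slightly more defensive variant could use str.partition, which reports whether the separator was found. This is only a sketch: the names MARKER and split_on_site are hypothetical, and returning None for non-matching rows is one possible choice among several.

```python
MARKER = "_en.wikipedia.org_"  # assumed fixed site name, with its delimiters


def split_on_site(line, marker=MARKER):
    """Split a row around a known site name; return None if it is absent."""
    title, sep, rest = line.partition(marker)
    if not sep:
        # The marker was not found, so this row does not fit the pattern
        return None
    return [title.replace("_", " "), marker.strip("_")] + rest.split("_")


row = split_on_site("2014_FIFA_World_Cup_en.wikipedia.org_mobile-web_all-agents")
print(row)
print(split_on_site("no_marker_here"))
```

Rows that come back as None can then be inspected or filtered out separately instead of raising an IndexError.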
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Dhinsha Mahesh |
