Within a pandas DataFrame, how can I grab the last two parts of a list as a single string for conditional output?
I'm modifying a CSV via pandas. In one situation, I want to parse a URL into a list, grab the last two items of that list, and output a string combining those two elements. I want to do this in a single line of code that I can insert inside an np.where call.
For example, in the CSV I have the URL "https://companymax/servicecards/city/ININ0000085013D/1664645.TIF". I would like to output the string "ININ0000085013D_1664645.TIF". So far I have managed to get part of the way there with:
from urllib.parse import urlparse
testurl = "https://companymax/servicecards/city/ININ0000085013D/1664645.TIF"
print(urlparse(testurl).path[1:].split('/')[2:])
However, I need that urlparse line to give the string output in a format that I can push into a np.where statement like in the below where x is the string from the above.
import pandas
import numpy as np
svc_df = pandas.read_csv(r"\\fileloc\ServiceLines.txt",
                         usecols=['Location', 'URLName', 'createdate'],
                         dtype={'Location':'string', 'URLName':'string'},
                         parse_dates=['createdate'])
# Create FieldNote column based on URLName
svc_df['FieldNote'] = np.where(svc_df['URLName'].str.contains('servicecards'), x, svc_df['URLName'].apply(lambda x: x[x.rfind('/')+1:]))
I also feel like I'm getting lost in the weeds here, and there may be a simpler way to do this. Basically, I'm trying to create the FieldNote column from URLName so that it takes the file name (everything after the last /), unless the URLName contains 'servicecards' (those are the only ones that have duplicates), in which case I want the subfolder name + file name.
Solution 1:[1]
As an alternative, you could use the pandas apply method with a small helper function that reproduces the conditional logic you were building with np.where.
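def get_field(d):
    # Split from the right: [everything before, subfolder, file name]
    s = d.rsplit('/',2)
    if 'servicecards' in d:
        # subfolder and file name joined with an underscore
        return '_'.join(s[-2:])
    return s[-1]
df['FieldNote'] = df['URLName'].apply(get_field)
print(df)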
Output from df:
   URLName                                                           FieldNote
0  https://companymax/servicecards/city/ININ0000085013D/1664645.TIF  ININ0000085013D_1664645.TIF
1  https://companymax/otherstring/city/ININ0000085013E/1664646.TIF   1664646.TIF
2  https://companymax/otherstring/city/ININ0000085013F/1664647.TIF   1664647.TIF
3  https://companymax/otherstring/city/ININ0000085013G/1664648.TIF   1664648.TIF
4  https://companymax/servicecards/city/ININ0000085013H/1664649.TIF  ININ0000085013H_1664649.TIF
5  https://companymax/servicecards/city/ININ0000085013I/1664650.TIF  ININ0000085013I_1664650.TIF
6  https://companymax/otherstring/city/ININ0000085013J/1664651.TIF   1664651.TIF
7  https://companymax/servicecards/city/ININ0000085013K/1664652.TIF  ININ0000085013K_1664652.TIF
8  https://companymax/otherstring/city/ININ0000085013L/1664653.TIF   1664653.TIF
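If you would rather keep the np.where form from the question, here is a minimal sketch that applies the same rule with the pandas string accessor instead of a Python-level function. The two-row frame `df` is made up for illustration; only the `URLName` and `FieldNote` column names come from the question. Passing `na=False` is an assumption that a missing URL should count as not containing 'servicecards'.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the URLName column of the CSV.
df = pd.DataFrame({
    "URLName": [
        "https://companymax/servicecards/city/ININ0000085013D/1664645.TIF",
        "https://companymax/otherstring/city/ININ0000085013E/1664646.TIF",
    ]
})

# Split each URL from the right into at most three pieces:
# [everything before, subfolder, file name].
parts = df["URLName"].str.rsplit("/", n=2)

df["FieldNote"] = np.where(
    # na=False (assumption): treat missing URLs as "no match".
    df["URLName"].str.contains("servicecards", na=False),
    parts.str[-2:].str.join("_"),  # subfolder + "_" + file name
    parts.str[-1],                 # file name only
)

print(df["FieldNote"].tolist())
# ['ININ0000085013D_1664645.TIF', '1664646.TIF']
```

Note that np.where evaluates both branches for every row, so the joined string is computed even for rows that end up using only the file name. That is harmless here, but it is one difference from the apply version above.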
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | n1colas.m |
