Within a pandas DF, how can I snag the last two parts of a list as a single string for conditional output?

I'm modifying a CSV with pandas. In one case, I want to parse a URL into a list, grab the last two items of that list, and output a string combining those two elements. I want to do this in a single line of code that I can insert inside an np.where call.

For example, the CSV contains the URL "https://companymax/servicecards/city/ININ0000085013D/1664645.TIF". I would like to output the string "ININ0000085013D_1664645.TIF". So far I have managed to get part of the way there with:

from urllib.parse import urlparse

testurl = "https://companymax/servicecards/city/ININ0000085013D/1664645.TIF"
print(urlparse(testurl).path[1:].split('/')[2:])

However, I need that urlparse line to produce a string in a form that I can pass to np.where, as in the code below, where x stands for the string from above.

import pandas
import numpy as np

svc_df = pandas.read_csv(r"\\fileloc\ServiceLines.txt",
                         usecols=['Location', 'URLName', 'createdate'],
                         dtype={'Location': 'string', 'URLName': 'string'},
                         parse_dates=['createdate'])
# Create FieldNote column based on URLName
svc_df['FieldNote'] = np.where(svc_df['URLName'].str.contains('servicecards'), x, svc_df['URLName'].apply(lambda x: x[x.rfind('/')+1:]))

I also feel like I'm getting lost in the weeds here, and there may be a simpler way to do this. Basically, I'm trying to create the FieldNote column from URLName: it should take the file name (everything after the last /), unless the URLName contains 'servicecards' (those are the only ones that have duplicates), in which case I want the subfolder name plus the file name.



Solution 1:[1]

As an alternative, you could use the pandas apply function with a small helper that reproduces the behavior of the np.where expression.

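def get_field(d):
    # Split from the right into at most three parts: [prefix, subfolder, file name]
    s = d.rsplit('/', 2)
    if 'servicecards' in d:
        # servicecards URLs can share file names, so prepend the subfolder
        return '_'.join(s[-2:])
    return s[-1]

# df is the DataFrame holding the URLName column (svc_df in the question)
df['FieldNote'] = df['URLName'].apply(get_field)
print(df)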

Output of print(df):

                                                            URLName                    FieldNote
0  https://companymax/servicecards/city/ININ0000085013D/1664645.TIF  ININ0000085013D_1664645.TIF
1   https://companymax/otherstring/city/ININ0000085013E/1664646.TIF                  1664646.TIF
2   https://companymax/otherstring/city/ININ0000085013F/1664647.TIF                  1664647.TIF
3   https://companymax/otherstring/city/ININ0000085013G/1664648.TIF                  1664648.TIF
4  https://companymax/servicecards/city/ININ0000085013H/1664649.TIF  ININ0000085013H_1664649.TIF
5  https://companymax/servicecards/city/ININ0000085013I/1664650.TIF  ININ0000085013I_1664650.TIF
6   https://companymax/otherstring/city/ININ0000085013J/1664651.TIF                  1664651.TIF
7  https://companymax/servicecards/city/ININ0000085013K/1664652.TIF  ININ0000085013K_1664652.TIF
8   https://companymax/otherstring/city/ININ0000085013L/1664653.TIF                  1664653.TIF
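If you would rather keep the np.where form from the question, the same logic can probably be written with the vectorized .str accessor instead of apply. The following is only a sketch: the two-row DataFrame is made-up sample data standing in for the CSV, and the intermediate names parts, file_name and folder_and_file are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question.
svc_df = pd.DataFrame({
    "URLName": [
        "https://companymax/servicecards/city/ININ0000085013D/1664645.TIF",
        "https://companymax/otherstring/city/ININ0000085013E/1664646.TIF",
    ]
})

# Split each URL from the right into at most three pieces:
# [everything before, subfolder, file name].
parts = svc_df["URLName"].str.rsplit("/", n=2)

file_name = parts.str[-1]                          # text after the last '/'
folder_and_file = parts.str[-2] + "_" + file_name  # subfolder + '_' + file name

# Use subfolder + file name only for 'servicecards' URLs.
# na=False treats missing URLs as "does not contain".
svc_df["FieldNote"] = np.where(
    svc_df["URLName"].str.contains("servicecards", regex=False, na=False),
    folder_and_file,
    file_name,
)
print(svc_df["FieldNote"].tolist())
```

Both branches are computed for every row before np.where picks between them, which is usually fine for simple string work like this. The apply version above is easier to extend if the rules get more involved.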

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 n1colas.m