How can I split a document path into the folder name and the document name in Python?

I need to split each document path into the folder name and the document name in Python. The data is a large DataFrame with many rows. If a path has no document name after the folder, the document name column should be left blank in the result. For example, my DataFrame looks like this:

     no  filename
     1  \\apple\config.csv
     2  \\apple\fox.pdf
     3  \\orange\cat.xls
     4  \\banana\eggplant.pdf
     5  \\lucy
...

The expected output is:

    foldername  documentname
    \\apple     config.csv
    \\apple     fox.pdf
    \\orange    cat.xls
    \\banana    eggplant.pdf
    \\lucy 
...     

I have tried the following code, but it does not work:


    import os

    y={'Foldername':[],'Docname':[]}
    def splitnames(x):
        if "." in x:
            docname=os.path.basename(x)
            rm="\\"+docname
            newur=x.replace(rm,'')
        else:
            newur=x
            docname=""
        result=[newur,docname]
        y["Foldername"].append(result[0])
        y["Docname"].append(result[1])
        return y

    dff=df['filename'].apply(splitnames)

Thank you so much for the help!!
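A minimal sketch of how the attempt above could be repaired, assuming the paths always use backslashes: the function returns one (folder, document) pair per row instead of appending to a shared dict, and it splits with str.rpartition rather than os.path.basename (which only treats "\" as a separator on Windows). It also checks for a dot in the last component only, rather than anywhere in the path. The sample rows are copied from the question.

```python
import pandas as pd


def splitnames(x):
    """Split a backslash-separated path into (folder, document name)."""
    # rpartition splits on the LAST backslash only, and behaves the same on every OS
    folder, _, name = x.rpartition("\\")
    if "." in name:
        return folder, name
    # No document name (e.g. \\lucy): keep the whole path as the folder
    return x, ""


df = pd.DataFrame({
    "no": [1, 2, 3, 4, 5],
    "filename": [r"\\apple\config.csv", r"\\apple\fox.pdf", r"\\orange\cat.xls",
                 r"\\banana\eggplant.pdf", r"\\lucy"],
})

# One tuple per row; zip(*...) unpacks them into two parallel sequences
folders, docs = zip(*df["filename"].map(splitnames))
dff = pd.DataFrame({"foldername": list(folders), "documentname": list(docs)})
print(dff)
```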



Solution 1:[1]

I'm not sure how you're getting the paths, but you could convert them to pathlib path objects and use their .parent and .name attributes to get the folder name and the file name.


    import numpy as np
    import pandas as pd
    from io import StringIO
    from pathlib import PureWindowsPath

    data = """ no  filename
         1  \\apple\\config.csv
         2  \\apple\\fox.pdf
         3  \\orange\\cat.xls
         4  \\banana\\eggplant.pdf
         5  \\lucy"""

    df = pd.read_csv(StringIO(data), sep=r'\s+')
    # PureWindowsPath treats "\" as a separator on every OS; plain Path does so only on Windows
    df['filename'] = df['filename'].apply(PureWindowsPath)

    # A non-empty suffix (e.g. ".csv") means the last component is a document name
    df['folder'] = df['filename'].apply(lambda x: x.parent if x.suffix else x)
    df['document_name'] = df['filename'].apply(lambda x: x.name if x.suffix else np.nan)

    print(df)

       no              filename   folder document_name
    0   1     \apple\config.csv   \apple    config.csv
    1   2        \apple\fox.pdf   \apple       fox.pdf
    2   3       \orange\cat.xls  \orange       cat.xls
    3   4  \banana\eggplant.pdf  \banana  eggplant.pdf
    4   5                 \lucy    \lucy           NaN
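Note that "\\apple\\config.csv" in a normal (non-raw) string literal evaluates to a single leading backslash. The sketch below, which uses PureWindowsPath so it runs on any OS, contrasts that with the double leading backslash shown in the question: Windows path rules read \\apple\config.csv as a UNC \\server\share drive, so pathlib finds no file name at all. If your data really starts with two backslashes, splitting the string on its last backslash is likely the safer route.

```python
from pathlib import PureWindowsPath

# One leading backslash: what the non-raw literal "\\apple\\config.csv" evaluates to
single = PureWindowsPath("\\apple\\config.csv")
print(single.parent, single.name)  # \apple config.csv

# Two leading backslashes, as written in the question: parsed as a UNC drive,
# so the whole string becomes the drive and the name is empty
double = PureWindowsPath(r"\\apple\config.csv")
print(repr(double.name))  # ''
```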

Solution 2:[2]

You could call apply twice, once for each new column:

    import pandas as pd

    filenames = [r'\\apple\config.csv', r'\\apple\fox.pdf', r'\\orange\cat.xls', r'\\banana\eggplant.pdf']
    df = pd.DataFrame({'filename': filenames})
    # Assumes every path ends with a document name; a bare folder such as \\lucy
    # would come out as Foldername '\\' and Docname 'lucy'
    df['Foldername'] = df['filename'].apply(lambda x: r'\\' + x.split('\\')[-2])
    df['Docname'] = df['filename'].apply(lambda x: x.split('\\')[-1])

Calling apply on a single column (a Series) returns one value per row, so each call fills exactly one new column. Selecting df['filename'] first also makes it explicit which column the function is applied to.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
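As a vectorized alternative to the two apply calls (not part of the answer above), here is a sketch using pandas Series.str.extract with a regular expression. It assumes, mirroring the rule in the question, that a document name is the last backslash-separated component and contains a dot; rows that do not match (such as \\lucy) keep the full path as the folder and get an empty document name.

```python
import pandas as pd

df = pd.DataFrame({"filename": [r"\\apple\config.csv", r"\\apple\fox.pdf", r"\\orange\cat.xls",
                                r"\\banana\eggplant.pdf", r"\\lucy"]})

# Greedy .* backtracks to the LAST backslash that is followed by a dotted name
pattern = r"^(?P<foldername>.*)\\(?P<documentname>[^\\]*\.[^\\]*)$"
parts = df["filename"].str.extract(pattern)

# Non-matching rows are NaN: fall back to the whole path and an empty name
df["foldername"] = parts["foldername"].fillna(df["filename"])
df["documentname"] = parts["documentname"].fillna("")
print(df)
```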

Solution 3:[3]

As an extension of Umar.H's suggestion, you can use os.path.split from the os module:

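    import os

    # os.path.split treats "\" as a separator only on Windows, where a path starting
    # with \\ is parsed as a UNC \\server\share drive (giving an empty tail)
    df['Docname'] = df['filename'].apply(lambda x: os.path.split(x)[1])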
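os.path is ntpath on Windows and posixpath elsewhere, so os.path.split only splits on backslashes when the code runs on Windows. A sketch, assuming you want Windows rules regardless of OS, imports ntpath directly. The sample paths here use a single leading backslash, as in Solution 1's data.

```python
import ntpath
import posixpath

import pandas as pd

# Sample paths with a single leading backslash (as in Solution 1's data)
df = pd.DataFrame({"filename": ["\\apple\\config.csv", "\\orange\\cat.xls"]})

# ntpath applies Windows path rules on every OS, unlike os.path
df["Foldername"] = df["filename"].apply(lambda x: ntpath.split(x)[0])
df["Docname"] = df["filename"].apply(lambda x: ntpath.split(x)[1])
print(df)

# posixpath (os.path on Linux/macOS) does not treat "\" as a separator,
# so the head is empty and the whole string lands in the tail
print(posixpath.split("\\apple\\config.csv"))
```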

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1
[2] Solution 2: RunTheGauntlet
[3] Solution 3: rpb