Search list items (paths collected with os.walk) in a CSV file; if an item is not found, perform some action

I want to check each item of this list (Filepath_list) against a DataFrame (read from a CSV) that contains a list of paths.

If the item is there, I skip it and check the next item in the list. If it is not there, I do some processing.

The code I am using is:

import os
import pandas as pd

filepath_list = []
for root, dirs, files in os.walk(dir_path):
    for f in files:
        if 'Apr.log' in f:
            filepath_list.append(os.path.join(root,f))

df_path_list_csv = pd.read_csv(r"C:\Users\ABC6\OneDrive - ABB\Files\Folder List.csv")

for g in filepath_list:
    if df_path_list_csv['Detail'].str.contains(g).any():
        print(g)

The error I am getting is:

incomplete escape \U at position 2

Can anyone please help me correct the error in this code? Is it because of the backslash? If yes, how can I fix it? I tried replacing "\" with "/" but it's still not working.

The variable Filepath_list has "\\" while the df_path_list_csv DataFrame has "\"

Filepath_list = 
['C:\\Users\\ABC6\\OneDrive - ABB\\Job Files\\SA_R 1_Ret\\13Mar22\\Apr.log',
 'C:\\Users\\ABC6\\OneDrive - ABB\\Job Files\\SA_Run 2_Dri\\28Mar22\\Apr.log',
 'C:\\Users\\ABC6\\OneDrive - ABB\\Job Files\\SA_Well_Run 2_Dri\\29Mar22\\Apr.log']

df_path_list_csv:
0   C:\Users\ ABC6\OneDrive - ABB\Job Files...
1   C:\Users\ ABC6\OneDrive - ABB\ Job Files...


Solution 1:[1]

By default, the pandas str.contains method treats the pattern as a regular expression, so it interprets a backslash as an escape character. That is why a Windows path such as C:\Users fails with "incomplete escape \U". To avoid this behavior and match your pattern as a literal string, set the regex parameter to False.

import os
import pandas as pd

# dir_path is the root folder to scan, defined as in the question
filepath_list = []
for root, dirs, files in os.walk(dir_path):
    for f in files:
        if 'Apr.log' in f:
            filepath_list.append(os.path.join(root, f))

# Doubled backslashes and the raw string r"..." from the question are equivalent; either works here
df_path_list_csv = pd.read_csv("C:\\Users\\ABC6\\OneDrive - ABB\\Files\\Folder List.csv")

for g in filepath_list:
    # regex=False makes `contains` treat g as a literal substring instead of a regular expression
    if df_path_list_csv['Detail'].str.contains(g, regex=False).any():
        print(g)
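
The question also asks for an action when a path is not in the CSV, which the loop above does not cover. Here is a minimal sketch of that branch, assuming a small in-memory DataFrame in place of the real CSV. The sample paths, the helper name split_by_presence, and the print used as the "processing" step are all hypothetical. It also reproduces the original error with re.compile directly, to show that the regular-expression parser is what rejects the path.

```python
import re

import pandas as pd

# Hypothetical stand-ins for the walked paths and the CSV contents.
filepath_list = [
    r"C:\Users\ABC6\Job Files\Run 1\Apr.log",
    r"C:\Users\ABC6\Job Files\Run 2\Apr.log",
]
df_path_list_csv = pd.DataFrame(
    {"Detail": [r"C:\Users\ABC6\Job Files\Run 1\Apr.log"]}
)

# Compiling a Windows path as a regular expression fails, because "\U"
# is read as the start of an (incomplete) escape sequence.
regex_error = None
try:
    re.compile(filepath_list[0])
except re.error as exc:
    regex_error = str(exc)


def split_by_presence(paths, column):
    """Split paths into those found in `column` and those missing from it."""
    found, missing = [], []
    for path in paths:
        # regex=False: plain substring search, backslashes are ordinary characters
        if column.str.contains(path, regex=False).any():
            found.append(path)
        else:
            missing.append(path)
    return found, missing


found, missing = split_by_presence(filepath_list, df_path_list_csv["Detail"])

for path in missing:
    # Placeholder for the real processing step
    print("not in CSV, needs processing:", path)
```

If the pattern has to stay a regular expression for some other reason, passing re.escape(g) instead of g should have the same effect as regex=False for a plain path.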

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Yannick Guéhenneux