How to search through a pandas dataframe for a match?

I have the following pandas dataframe:

                                                  paths tags
0     /home/onur/PycharmProjects/file-tagging/data/w...  NaN
1     /home/onur/PycharmProjects/file-tagging/data/d...  NaN
2     /home/onur/PycharmProjects/file-tagging/data/d...  NaN
3     /home/onur/PycharmProjects/file-tagging/data/d...  NaN
4     /home/onur/PycharmProjects/file-tagging/data/w...  NaN
...                                                 ...  ...
5404  /home/onur/PycharmProjects/file-tagging/.idea/...  NaN
5405  /home/onur/PycharmProjects/file-tagging/.idea/...  NaN
5406  /home/onur/PycharmProjects/file-tagging/.idea/...  NaN
5407  /home/onur/PycharmProjects/file-tagging/.idea/...  NaN
5408  /home/onur/PycharmProjects/file-tagging/conten...  NaN

I want to iterate over every row in paths and, if the entry meets a specific requirement, add a tag to the corresponding row in tags. A single tags row could hold more than one entry. What is a good way to approach this?

I also want to save the result to a data file so that I can edit it later.

This is my code so far:

import os

import pandas as pd

data_list = []


def myprint(path):
    # Make sure the log file exists before scanning.
    if not os.path.exists('content-log.txt'):
        open('content-log.txt', 'a').close()

    elements = os.listdir(path)

    for element in elements:
        if os.path.isdir(os.path.join(path, element)):
            # Recurse into subdirectories.
            myprint(os.path.join(path, element))
        else:
            data_list.append(os.path.join(path, element))


myprint('.')  # assumed start directory; the original call was not shown
df = pd.DataFrame(data_list, columns=['paths'])
df['tags'] = pd.Series(dtype='str')
print(df)


Solution 1:[1]

Advice

  1. Use os.walk instead of a hand-written recursive loop with an isdir check.

Example

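import os

data_list = []
# os.walk visits every directory below path (the root directory to scan).
for root, dirs, files in os.walk(path):
    data_list.extend(os.path.join(root, file) for file in files)

  2. Use a judge function that checks your specific requirement and returns the tags for a path.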

Example

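def judge(elem):
    # Return the list of tags for one path.
    return [elem, "Ok"] if elem.endswith("json") else ["No-way"]


# The column holding the paths is named "paths" in the question's dataframe.
df["tags"] = df["paths"].apply(judge)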
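Since the question also asks for several tags per row and for saving the result, here is a minimal sketch of how the judge idea might be extended. The rule names in TAG_RULES, the sample paths, and the tags.json file name are all made up for illustration. JSON is used here rather than CSV only because it keeps each list of tags as a list when the file is read back.

```python
import os
import tempfile

import pandas as pd

# Hypothetical rules: each tag name maps to a test on the path string.
TAG_RULES = {
    "json": lambda p: p.endswith(".json"),
    "ide-config": lambda p: "/.idea/" in p,
}


def tags_for(path):
    """Return every tag whose rule matches the path (possibly none)."""
    return [tag for tag, rule in TAG_RULES.items() if rule(path)]


# Made-up sample paths standing in for the real file listing.
df = pd.DataFrame({"paths": [
    "/proj/data/a.json",
    "/proj/.idea/workspace.xml",
    "/proj/.idea/misc.json",
    "/proj/readme.txt",
]})

# One list of tags per row; a row can carry zero, one, or several tags.
df["tags"] = df["paths"].apply(tags_for)

# Write to a temporary directory so the sketch does not overwrite anything.
out_file = os.path.join(tempfile.mkdtemp(), "tags.json")
df.to_json(out_file, orient="records", indent=2)

# Reading the file back restores the tag lists as Python lists.
restored = pd.read_json(out_file, orient="records")
print(restored)
```

Adding a new tag then only means adding one entry to TAG_RULES. The saved file is plain JSON, so it can also be edited by hand before it is loaded again.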

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 FavorMylikes