Python, Pandas: Faster File Search than os.path?

I have a pandas df with file names that need to be searched/matched in a directory tree.

I've been using the following, but it crashes on larger directory structures. I record whether or not each file is present in two lists, found and missed.

found = []
missed = []

for target_file in df_files['Filename']:
    
    for (dirpath, dirnames, filenames) in os.walk(DIRECTORY_TREE):
        if target_file in filenames:
            found.append(os.path.join(dirpath,target_file))
        else:
            missed.append(target_file)
print('Found: ',len(found),'Missed: ',len(missed))
print(missed)

I've read that scandir is quicker and will handle larger directory trees. If true, how might this be rewritten?

My attempt:

found = []
missed = []

for target_file in df_files['Filename']:
    
    for item in os.scandir(DIRECTORY_TREE):
        if item.is_file() and item.name() == target_file:
            found.append(os.path.join(dirpath,target_file))
        else:
            missed.append(target_file)
            
print('Found: ',len(found),'Missed: ',len(missed))
print(missed)

This runs (fast), but everything ends up in the "missed" list.
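A few things in the scandir attempt would explain that result: os.scandir lists only a single directory level and does not recurse, item.name is an attribute rather than a method, dirpath is not defined in that loop, and the else branch adds the target to "missed" once for every entry that does not match. One possible rewrite, as a minimal sketch only, is to walk the tree a single time, index every file name, and then look each target up in that index. The helper names index_tree and match_files are hypothetical and not part of the original code.

```python
import os
from collections import defaultdict


def index_tree(root):
    """Map each file name under root to every full path where it occurs."""
    index = defaultdict(list)
    stack = [root]  # directories still to be scanned
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # Skip symlinked directories so a link cycle cannot loop forever.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file():
                    # entry.name is a plain attribute, not a method call.
                    index[entry.name].append(entry.path)
    return index


def match_files(targets, root):
    """Return (found_paths, missed_names) for the given target file names."""
    index = index_tree(root)  # the tree is walked once, not once per target
    found, missed = [], []
    for name in targets:
        if name in index:
            found.extend(index[name])
        else:
            missed.append(name)
    return found, missed
```

With the names from the question, this would be called as found, missed = match_files(df_files['Filename'], DIRECTORY_TREE). Each target then costs one dictionary lookup instead of a full directory walk, and a name lands in "missed" only if it appears nowhere in the tree.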



Solution 1:[1]

You've used:

    @foreach($data as $data)

which reuses $data as the loop variable and so overwrites the collection being iterated, whereas you should use:

@foreach($data as $datum)
    <tr>
       <td>{{ $i++}}</td>
       <td class="col-md-3">{{ $datum->ip }} </td>
       <td class="col-md-3">{{ $datum->count() }} </td>
       <td class="col-md-3">{{ $datum->url }} </td>
       <td class="col-md-3">{{ $datum->city }} </td>
       <td class="col-md-3">{{ $datum->state }} </td>
       <td class="col-md-3">{{ $datum->created_at->format('d M Y') }} </td>
    </tr>
@endforeach

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Giles Bennett