Alternative for nested loop operation in Python?

I want a fast alternative to a nested loop in which the inner loop runs after some operation in the outer loop.

For example:

import os
import pandas as pd

target_date_list = pd.date_range(start=start_date, end=end_date).strftime('year=%Y/month=%m/day=%d')

for date in target_date_list:
    folder = f'path_to_folder/{date}'
    for file in os.listdir(folder):
        pass  # some operation


Solution 1:[1]

There is no meaningfully faster alternative here. The inner loop's values are dependent on the value generated by the outer loop, so the micro-optimization of using itertools.product isn't available.
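To illustrate the distinction, here is a minimal sketch: `itertools.product` can flatten loops only when every iterable is fixed before looping starts. When the inner iterable is derived from the outer value, you still need one inner loop per outer value, here written as a comprehension with two `for` clauses. The `dates` and `files_by_folder` names are hypothetical stand-ins for the dates and directory contents.

```python
import itertools

# Hypothetical stand-ins for the outer values and each folder's contents.
dates = ['year=2022/month=01/day=01', 'year=2022/month=01/day=02']
files_by_folder = {
    'path_to_folder/year=2022/month=01/day=01': ['a.txt', 'b.txt'],
    'path_to_folder/year=2022/month=01/day=02': ['c.txt'],
}

# Independent loops: product() flattens them because both iterables
# are known up front.
pairs = list(itertools.product(['x', 'y'], [1, 2]))

# Dependent loops: the inner iterable is computed from the outer value,
# so product() cannot express this; the comprehension still runs one
# inner loop per folder.
dependent = [
    (folder, name)
    for folder in (f'path_to_folder/{d}' for d in dates)
    for name in files_by_folder[folder]
]
```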

If you're actually iterating a directory (not the characters of a string that names a directory), I'd strongly recommend using os.scandir over os.listdir (assuming, like many folks, you were using the latter without knowing the former existed), as it's much faster when:

  1. You're operating on large directories
  2. You're filtering the contents based on stat info (in particular entry types, which come for free without a stat at all; on Windows, you get even more for free; and if you do call stat(), the result is cached on the entry, so you can check multiple fields without triggering a re-stat)
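The "entry types come for free" point can be sketched as follows. The hypothetical `split_entries` helper sorts a directory's contents into files and subdirectories using only `DirEntry.is_dir()` and `DirEntry.is_file()`; according to the `os.scandir` documentation, these usually need no extra system call for non-symlinks, and passing `follow_symlinks=False` avoids stat-ing symlink targets.

```python
import os
import tempfile

def split_entries(dirpath):
    """Hypothetical helper: separate regular files from subdirectories
    using the type information os.scandir already collected."""
    files, subdirs = [], []
    with os.scandir(dirpath) as entries:
        for entry in entries:
            # follow_symlinks=False classifies a symlink by the link itself,
            # so no stat of the link target is needed.
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.name)
            elif entry.is_file(follow_symlinks=False):
                files.append(entry.name)
    return sorted(files), sorted(subdirs)

# Demo on a throwaway directory with one file and one subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, 'sub'))
    with open(os.path.join(tmp, 'data.txt'), 'w') as fh:
        fh.write('hello')
    result = split_entries(tmp)
```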

With os.scandir, an inner loop previously implemented like this:

for file in os.listdir(dirpath):
    path = os.path.join(dirpath, file)
    if file.endswith('.txt') and os.path.isfile(path) and os.path.getsize(path) >= 4096:
        pass  # do stuff with 4+KB file described by "path"

can be simplified slightly, and sped up, by changing it to:

with os.scandir(dirpath) as direntries:
    for entry in direntries:
        if entry.name.endswith('.txt') and entry.is_file() and entry.stat().st_size >= 4096:
            pass  # do stuff with 4+KB file described by "entry.path"

but fundamentally, this optimization has nothing to do with avoiding nested loops; if you want to iterate all the files, you have to iterate all the files. A nested loop will need to occur somehow, even if you hide it behind utility methods, and the overhead of the loop itself is negligible next to the cost of file system access.
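A sketch of that point: the hypothetical helpers `_big_txt_in` and `iter_big_txt` below reuse the 4 KB `.txt` filter from above and hide the per-folder loop behind `itertools.chain.from_iterable`. The call site becomes a single flat iteration, yet every folder is still opened and scanned, so the file system work is the same as in an explicit nested loop. The `min_size` parameter and the demo folder layout are assumptions for illustration.

```python
import itertools
import os
import tempfile

def _big_txt_in(folder, min_size=4096):
    """Yield paths of .txt files of at least min_size bytes in one folder."""
    with os.scandir(folder) as entries:
        for entry in entries:
            if entry.name.endswith('.txt') and entry.is_file() and entry.stat().st_size >= min_size:
                yield entry.path

def iter_big_txt(folders, min_size=4096):
    """Flatten the per-folder scans into one iterator.

    chain.from_iterable hides the inner loop, but each folder is still
    opened and scanned in turn, exactly as in an explicit nested loop.
    """
    return itertools.chain.from_iterable(_big_txt_in(f, min_size) for f in folders)

# Demo on a throwaway tree shaped like the question's date folders.
with tempfile.TemporaryDirectory() as tmp:
    layout = {
        'day=01': [('big.txt', 5000), ('small.txt', 10)],
        'day=02': [('also_big.txt', 4096), ('big.log', 5000)],
    }
    folders = []
    for day, files in layout.items():
        folder = os.path.join(tmp, day)
        os.mkdir(folder)
        for name, size in files:
            with open(os.path.join(folder, name), 'wb') as fh:
                fh.write(b'x' * size)
        folders.append(folder)
    # Consume the lazy iterator while the temporary tree still exists.
    found = sorted(os.path.basename(p) for p in iter_big_txt(folders))
```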

Solution 2:[2]

As a rule of thumb, your best bet for better performance in a for loop is to use a generator expression. However, I suspect that the performance boost for your particular example will be minimal, since your outer loop does nothing more than assign a formatted string to a variable.

import os
import pandas as pd

target_date_list = pd.date_range(start=start_date, end=end_date).strftime('year=%Y/month=%m/day=%d')

for folder in (f'path_to_folder/{date}' for date in target_date_list):
    for file in os.listdir(folder):
        pass  # some operation
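Since the expected gain is small, it may be worth measuring before restructuring. The sketch below times only the path-building part of both versions with `timeit`; the 31-day `target_date_list` is a hypothetical stand-in for the pandas index, and timings vary by machine and Python version, so no particular result is implied.

```python
import timeit

# Hypothetical stand-in for target_date_list (same string format).
target_date_list = [f'year=2022/month=01/day={d:02d}' for d in range(1, 32)]

def explicit_loop():
    out = []
    for date in target_date_list:
        folder = f'path_to_folder/{date}'
        out.append(folder)
    return out

def generator_expression():
    out = []
    for folder in (f'path_to_folder/{date}' for date in target_date_list):
        out.append(folder)
    return out

# Both versions must build the same folders before timing means anything.
assert explicit_loop() == generator_expression()

for fn in (explicit_loop, generator_expression):
    seconds = timeit.timeit(fn, number=10_000)
    print(f'{fn.__name__}: {seconds:.4f}s for 10,000 runs')
```

Any difference measured here is likely to be small next to the per-file work and file system access inside the real inner loop.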

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 jfaccioni