`os.path.getsize()` slow on network drive (Python, Windows)

I have a program that iterates over several thousand PNG files on an SMB shared network drive (a 2TB Samsung 970 Evo+) and adds up their individual file sizes. Unfortunately, it is very slow. After profiling the code, it turns out 90% of the execution time is spent on one function:

filesize += os.path.getsize(png)

where `png` is the path to one of the several thousand PNG files, in a `for` loop iterating over the results of `glob.glob()` (which, for comparison, accounts for 7.5% of the execution time).
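The loop described above can be sketched as follows (the share path and pattern here are placeholders; the actual code is in the linked pastebin):

```python
import glob
import os

def total_png_size(pattern):
    """Sum the sizes of all files matching a glob pattern, e.g.
    r'\\\\server\\share\\images\\*.png' for an SMB share."""
    filesize = 0
    for png in glob.glob(pattern):        # glob: ~7.5% of the runtime
        filesize += os.path.getsize(png)  # getsize: ~90% over SMB
    return filesize
```

Each `os.path.getsize()` call is a separate `stat`-style request, which over SMB means one network round trip per file.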


The code can be found here: https://pastebin.com/SsDCFHLX

Clearly there is something about obtaining the file size over the network that is extremely slow, but I'm not sure what. Is there any way to improve the performance? Using `filesize += os.stat(png).st_size` takes just as long.

When the PNG files are stored on the computer locally, the speed is not an issue. It specifically becomes a problem when the files are stored on another machine that I access over the local network with a gigabit ethernet cable. Both are running Windows 10.
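One mitigation worth trying (not part of the answer below): `os.scandir()`. Per PEP 471, on Windows the `DirEntry` objects it yields are populated from the directory listing itself, so `entry.stat()` usually needs no extra per-file system call or network round trip. A minimal sketch:

```python
import os

def total_png_size_scandir(directory):
    """Sum PNG file sizes using os.scandir(). On Windows, entry.stat()
    is served from the directory-listing data (PEP 471), so it can
    avoid the per-file round trip that os.path.getsize() incurs."""
    total = 0
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_file() and entry.name.lower().endswith(".png"):
                total += entry.stat().st_size
    return total
```

Whether this helps over SMB depends on how the redirector caches directory listings, so it is worth profiling rather than assuming.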



Solution 1

You can try building your paths with pathlib:

from pathlib import Path

# Build paths inside the project like this: BASE_DIR / 'subdir'.
BASE_DIR = Path(__file__).resolve().parent.parent

If this did not help, read more about the pathlib module.
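The snippet above only builds paths; it does not read file sizes. The pathlib equivalent of the original call would be `Path.stat().st_size` (a hypothetical rewrite of the questioner's loop, not code from the answer), which makes the same underlying system call as `os.path.getsize()`, so on its own it is unlikely to be faster over the network:

```python
from pathlib import Path

def total_png_size_pathlib(directory):
    """Sum PNG sizes with pathlib. Path.stat() performs the same
    stat call as os.path.getsize(), so network performance should
    be comparable to the original loop."""
    return sum(p.stat().st_size for p in Path(directory).glob("*.png"))
```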

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: DevDiv