`os.path.getsize()` slow on Network Drive (Python, Windows)
I have a program that iterates over several thousand PNG files on an SMB network share (backed by a 2TB Samsung 970 Evo+) and sums their individual file sizes. Unfortunately, it is very slow. After profiling the code, it turns out 90% of the execution time is spent on one call:
```python
filesize += os.path.getsize(png)
```
where `png` is the path to a single PNG file (of the several thousand), supplied by a `for` loop over the results of `glob.glob()` (which, for comparison, accounts for 7.5% of the execution time).
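The full code is linked below; as a minimal sketch of the hot loop described above (the UNC share path and file pattern here are placeholder assumptions, not the actual paths from the program):

```python
import glob
import os

# Placeholder UNC path for the SMB share; the real paths are in the linked code.
pattern = r"\\SERVER\share\pngs\*.png"

filesize = 0
for png in glob.glob(pattern):
    # One stat round trip per file over the network.
    filesize += os.path.getsize(png)

print(filesize)
```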
The code can be found here: https://pastebin.com/SsDCFHLX
Clearly something about obtaining the file size over the network is extremely slow, but I'm not sure what. Is there any way I can improve the performance? It takes just as long using `filesize += os.stat(png).st_size`, too.
When the PNG files are stored locally, speed is not an issue. It specifically becomes a problem when the files are on another machine that I access over the local network via a gigabit Ethernet cable. Both machines run Windows 10.
Solution 1 [1]
You can try getting the path via `pathlib`:
```python
from pathlib import Path

# Build paths inside the project like this: BASE_DIR / 'subdir'.
BASE_DIR = Path(__file__).resolve().parent.parent
```
If this did not help, you can learn more about the Python `pathlib` module.
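As a hedged sketch of how that suggestion might apply to the question's loop (the share path is again a placeholder assumption):

```python
from pathlib import Path

# Placeholder UNC path for the SMB share.
share = Path(r"\\SERVER\share\pngs")

# Path.glob() yields Path objects; .stat().st_size is the pathlib
# equivalent of os.path.getsize().
filesize = sum(p.stat().st_size for p in share.glob("*.png"))

print(filesize)
```

Note that `Path.stat()` wraps `os.stat()` under the hood, so this mainly tidies the code; it is worth benchmarking against the original loop to see whether it actually changes the per-file cost over SMB.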
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0. Source: Stack Overflow.
| Solution | Source |
|---|---|
| Solution 1 | DevDiv |

