'Python 3.10: FileNotFoundError - Existing Path With Unicode Characters
Problem statement:
While automatically copying files between input directories, and output directories my program fails on a path that contains unicode (most likely Korean) characters.
The whole script is publicly available under: This Link
The file that causes the error is also publicly available: File That Causes the Error
The specific part of the code that fails seems to be:
for root, _, filenames in os.walk(maybe_dir):
for file in filenames:
# Prepare relative paths:
relative_dir = os.path.relpath(root, maybe_dir)
relative_file = os.path.join(relative_dir, file)
# Get unique filename:
unique_filename = uuid.uuid4().hex
unique_filename_with_ext = unique_filename + file_extension
new_path_and_filename = os.path.join(
full_output_path, unique_filename_with_ext
)
current_file = os.path.abspath(os.path.join(root, file))
# Copying files:
shutil.copy(current_file, new_path_and_filename)
The error:
Traceback (most recent call last):
File "F:\Projects\SC2DatasetPreparator\src\directory_flattener.py", line 96, in <module>
directory_flattener(
File "F:\Projects\SC2DatasetPreparator\src\directory_flattener.py", line 60, in directory_flattener
shutil.copy(current_file, new_path_and_filename)
File "D:\Programs\Python3_10\lib\shutil.py", line 417, in copy
copyfile(src, dst, follow_symlinks=follow_symlinks)
File "D:\Programs\Python3_10\lib\shutil.py", line 254, in copyfile
with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: 'F:\\Projects\\SC2DatasetPreparator\\processing\\directory_flattener\\input\\2017_IEM_XI_World_Championship_Katowice\\IEM XI - World Championship - StarCraft II Replays\\RO24\\Group A\\Solar Vs herO\\Ùë¦ý+ñÝü¼ ý×¼Û¦£Ù¦£ ýºÇÛÁ¼ - ÝåáÙäêÙ¿+Ýè© (Û¦ÁÝùêýØÿ ý£áýé¦) (2) Solar vs Hero game 1.SC2Replay'
The error itself is unexpected as I am using absolute paths and the script works for 6 other directories before it fails on that specific file.
The path clearly exists and can be accessed manually:

Steps to attempt to reproduce the error are as follows:
- Clone the repository: Branch 1.1.0_testing
- Place the File That Causes the Error in
./processing/directory_flattener/input/test_dir - Run the script
Closing Remarks:
It seems that the script worked before on Python 3.7 because I have verified the output that I have received before updating to Python 3.10 and within the directory mapping that is created the files with unicode characters in their path are present:
{
"ce2f4610891e472190a0852c617b35e8": "RO24\\Group A\\Solar Vs herO\\\u00d9\u00eb\u00a6\u00fd+\u00f1\u00dd\u00fc\u00bc \u00fd\u00d7\u00bc\u00db\u00a6\u00a3\u00d9\u00a6\u00a3 \u00fd\u00ba\u00c7\u00db\u00c1\u00bc - \u00dd\u00e5\u00e1\u00d9\u00e4\u00ea\u00d9\u00bf+\u00dd\u00e8\u00a9 (\u00db\u00a6\u00c1\u00dd\u00f9\u00ea\u00fd\u00d8\u00ff \u00fd\u00a3\u00e1\u00fd\u00e9\u00a6) (2) Solar vs Hero game 1.SC2Replay",
"dcc82d633910479c95d06ef418fcf2e0": "RO24\\Group A\\Solar Vs herO\\\u00fd\u00fb\u00a6\u00d9\u00a6\u00e4\u00fd\u00e4\u00f1 \u00d9\u00aa\u00bc\u00dd\u00f6\u00e4 - \u00d9\u00d7\u00ff\u00d9\u00ec\u00f6 Solar vs Hero game 2.SC2Replay",
}
While searching for an answer I have only stumbled upon similar problems in Python 2 where the .decode() method was suggested as a solution. Applying such measures did not help to solve the issue.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
