Python 3.10: FileNotFoundError - Existing Path With Unicode Characters

Problem statement:

While automatically copying files from input directories to output directories, my program fails on a path that contains Unicode characters (most likely Korean).

The whole script is publicly available here: This Link

The file that causes the error is also publicly available: File That Causes the Error

The specific part of the code that fails seems to be:

import os
import shutil
import uuid

# maybe_dir, full_output_path and file_extension are defined earlier in the full script.
for root, _, filenames in os.walk(maybe_dir):
    for file in filenames:
        # Prepare relative paths:
        relative_dir = os.path.relpath(root, maybe_dir)
        relative_file = os.path.join(relative_dir, file)

        # Get unique filename:
        unique_filename = uuid.uuid4().hex
        unique_filename_with_ext = unique_filename + file_extension
        new_path_and_filename = os.path.join(
            full_output_path, unique_filename_with_ext
        )

        current_file = os.path.abspath(os.path.join(root, file))

        # Copying files:
        shutil.copy(current_file, new_path_and_filename)

The error:

Traceback (most recent call last):
  File "F:\Projects\SC2DatasetPreparator\src\directory_flattener.py", line 96, in <module>
    directory_flattener(
  File "F:\Projects\SC2DatasetPreparator\src\directory_flattener.py", line 60, in directory_flattener
    shutil.copy(current_file, new_path_and_filename)
  File "D:\Programs\Python3_10\lib\shutil.py", line 417, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "D:\Programs\Python3_10\lib\shutil.py", line 254, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: 'F:\\Projects\\SC2DatasetPreparator\\processing\\directory_flattener\\input\\2017_IEM_XI_World_Championship_Katowice\\IEM XI - World Championship - StarCraft II Replays\\RO24\\Group A\\Solar Vs herO\\Ùë¦ý+ñÝü¼ ý×¼Û¦£Ù¦£ ýºÇÛÁ¼ - ÝåáÙäêÙ¿+Ýè© (Û¦ÁÝùêýØÿ ý£áýé¦) (2) Solar vs Hero game 1.SC2Replay'

The error is unexpected because I am using absolute paths, and the script processes six other directories successfully before it fails on this specific file.
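One cause worth ruling out (an assumption, not something established here): the absolute path in the traceback is well over 260 characters long. On Windows, the classic Win32 `MAX_PATH` limit is 260 characters including the terminating NUL, and making a path absolute does not lift it. What does lift it is either putting the extended-length prefix `\\?\` in front of an absolute path, or enabling the `LongPathsEnabled` setting, which Python 3.6 and later honor on recent versions of Windows 10 and 11. The sketch below uses two hypothetical helpers, `exceeds_max_path` and `to_extended_length_path`; they are built on `ntpath` so they produce Windows-style paths on any OS, and they are not part of the original script.

```python
import ntpath

# Classic Win32 limit: 260 characters *including* the terminating NUL,
# so paths of 260 or more characters can fail in APIs that are not long-path aware.
MAX_PATH = 260

EXTENDED_PREFIX = "\\\\?\\"  # the literal characters \\?\


def exceeds_max_path(path: str) -> bool:
    """Return True if `path` is too long for MAX_PATH-limited Win32 calls."""
    return len(path) >= MAX_PATH


def to_extended_length_path(path: str) -> str:
    """Hypothetical helper: convert an absolute Windows path to extended-length form.

    Windows passes extended-length paths through without normalisation, so the
    path is normalised here first (forward slashes, '..' segments and so on).
    """
    if path.startswith(EXTENDED_PREFIX):
        return path  # already extended-length; leave it untouched
    path = ntpath.normpath(path)
    if not ntpath.isabs(path):
        raise ValueError(f"expected an absolute path, got {path!r}")
    if path.startswith("\\\\"):
        # UNC share: \\server\share\... becomes \\?\UNC\server\share\...
        return EXTENDED_PREFIX + "UNC\\" + path[2:]
    return EXTENDED_PREFIX + path
```

In the loop from the question, this would mean passing `current_file` (and `new_path_and_filename`, if output paths can also get that long) through `to_extended_length_path()` before `shutil.copy` whenever `exceeds_max_path()` is true, guarded by `os.name == "nt"` so nothing changes on other platforms.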

The path clearly exists and can be accessed manually: Existing Path
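If the Windows `MAX_PATH` limit (260 characters, counting the terminating NUL) is suspected, a small scan can show which files in an input tree would reach it. This is a sketch under that assumption; `find_long_paths` is a hypothetical helper that walks the tree with `os.walk`, as the original loop does, and reports every absolute file path whose length reaches the limit.

```python
import os
from typing import Iterator, Tuple

MAX_PATH = 260  # classic Win32 limit, counting the terminating NUL


def find_long_paths(top: str, limit: int = MAX_PATH) -> Iterator[Tuple[str, int]]:
    """Yield (absolute_path, length) for every file whose path has >= `limit` characters."""
    for root, _, filenames in os.walk(top):
        for name in filenames:
            # Build the path exactly as the copy loop does before calling shutil.copy.
            full = os.path.abspath(os.path.join(root, name))
            if len(full) >= limit:
                yield full, len(full)
```

For example, `for path, n in find_long_paths(maybe_dir): print(n, repr(path))` run on the input directory would list the candidate files together with their lengths.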

Steps to reproduce the error:

  1. Clone the repository: Branch 1.1.0_testing
  2. Place the File That Causes the Error in ./processing/directory_flattener/input/test_dir
  3. Run the script

Closing Remarks:

The script appears to have worked on Python 3.7: I checked the output I generated before upgrading to Python 3.10, and the directory mapping it created contains the files with Unicode characters in their paths:

{
"ce2f4610891e472190a0852c617b35e8": "RO24\\Group A\\Solar Vs herO\\\u00d9\u00eb\u00a6\u00fd+\u00f1\u00dd\u00fc\u00bc \u00fd\u00d7\u00bc\u00db\u00a6\u00a3\u00d9\u00a6\u00a3 \u00fd\u00ba\u00c7\u00db\u00c1\u00bc - \u00dd\u00e5\u00e1\u00d9\u00e4\u00ea\u00d9\u00bf+\u00dd\u00e8\u00a9 (\u00db\u00a6\u00c1\u00dd\u00f9\u00ea\u00fd\u00d8\u00ff \u00fd\u00a3\u00e1\u00fd\u00e9\u00a6) (2) Solar vs Hero game 1.SC2Replay",
"dcc82d633910479c95d06ef418fcf2e0": "RO24\\Group A\\Solar Vs herO\\\u00fd\u00fb\u00a6\u00d9\u00a6\u00e4\u00fd\u00e4\u00f1 \u00d9\u00aa\u00bc\u00dd\u00f6\u00e4 - \u00d9\u00d7\u00ff\u00d9\u00ec\u00f6 Solar vs Hero game 2.SC2Replay"
}
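The `\u00d9`-style sequences in the mapping are not in themselves a sign of corruption: `json.dump` and `json.dumps` escape every non-ASCII character by default (`ensure_ascii=True`). Assuming the mapping is written with the standard `json` module (that part of the script is not shown here), passing `ensure_ascii=False` together with an explicit UTF-8 file encoding keeps the names readable in the file, and both forms load back to identical strings. `write_mapping` and `read_mapping` are hypothetical helpers.

```python
import json
from typing import Dict


def write_mapping(mapping: Dict[str, str], path: str) -> None:
    """Write the uuid -> relative-path mapping, keeping non-ASCII characters as-is."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(mapping, f, ensure_ascii=False, indent=4)


def read_mapping(path: str) -> Dict[str, str]:
    """Load the mapping back; escaped and unescaped files decode to the same strings."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```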

While searching for an answer, I only found similar problems in Python 2, where calling .decode() was suggested as the fix. Applying that approach did not solve the issue.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source