How to fix Java args not getting Japanese characters properly in strings from Windows Explorer?

On Windows 10, I have a shortcut file in the "SendTo" directory. It is a shortcut to a .bat file.

The .bat file contains just a single command, either "python <filepath> %*" or "java -jar <filepath> %*".

When I select file(s) in Windows Explorer, right-click them, and send them to this shortcut, it runs the program at <filepath> with the selected file(s) as arguments.

I am trying to send files whose filenames contain Japanese characters as arguments. The filenames are passed to Python programs just fine, but for Java programs the filename args are garbled and the Java program cannot find the file.

For example, in Java and with locale of Japan, a filename of Filename ファイル名.txt becomes Filename 繝輔ぃ繧、繝ォ蜷�.txt in the args. Other locales also do not work. The result is the same if I send the args to python and then from python to Java.

How can I make Java receive the proper filename, or otherwise find the file?



Solution 1:[1]

You are encountering an unresolved issue with Java. See the open bug JDK-8124977, "cmdline encoding challenges on Windows", which consolidates several problems related to passing Unicode arguments to a Java application from the command line.
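As an aside, the garbled filename in the question has the typical shape of UTF-8 bytes being decoded with a Shift_JIS-family code page. That is only a guess from the characters shown, not something the question confirms. The sketch below reproduces that kind of corruption so you can compare it with what your program receives; the class name MojibakeDemo and the choice of windows-31j are assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encode the text as UTF-8, then decode those bytes with a different charset.
    static String garble(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    public static void main(String[] args) {
        // Assumption: windows-31j (Microsoft's Shift_JIS variant) is the code page involved.
        Charset ms932 = Charset.forName("windows-31j");

        // The filename from the question, written with Unicode escapes so the
        // source file compiles regardless of its own encoding.
        String original = "Filename \u30D5\u30A1\u30A4\u30EB\u540D.txt";

        String garbled = garble(original, ms932);
        System.out.println("same as original: " + garbled.equals(original));
        System.out.println("garbled text:     " + garbled);
    }
}
```

If the printed text resembles what your Java program sees in args, the bytes are being reinterpreted somewhere between Explorer, the batch file, and the Java launcher, and the filename cannot be reliably recovered from args alone.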

Java 18 (to be released next month) resolves some UTF-8 issues by implementing JEP 400: UTF-8 by Default, but unfortunately not your specific problem. From the "Goals" section of JEP 400:

  1. Standardize on UTF-8 throughout the standard Java APIs, except for console I/O. [Emphasis mine]

However, there is a workaround. See "Netbeans Chinese characters in java project properties run arguments", and in particular this answer, which successfully processes Chinese characters passed as command line arguments using JNA (Java Native Access). From that answer:

JNA allows you to invoke Windows API methods from Java, without using native code. So in your Java application you can call Win API methods such as GetCommandLineW() and CommandLineToArgvW() directly, to access details about the command line used to invoke your program, including any arguments passed. Both of those methods support Unicode.

So the code in that answer does not read the arguments passed to main() directly. Instead it uses JNA to invoke the Win API methods to access them.

While that code was processing Chinese characters passed as arguments from the command line, it would work just as well for Japanese characters, including your Japanese filenames.
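A minimal sketch of that approach follows. It is not the code from the linked answer. It assumes the JNA library (for example a 5.x jna jar) is on the classpath and that the program runs on Windows; the class name UnicodeArgs and the two small interface declarations are hypothetical bindings written for this example. It also assumes the last args.length entries of the native command line correspond to the arguments passed to main(), which may not hold if the launcher expands wildcards or @argfiles.

```java
import com.sun.jna.Native;
import com.sun.jna.Pointer;
import com.sun.jna.ptr.IntByReference;
import com.sun.jna.win32.StdCallLibrary;

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class UnicodeArgs {

    // Hypothetical minimal bindings: only the Win API functions used here.
    interface Kernel32 extends StdCallLibrary {
        Kernel32 INSTANCE = Native.load("kernel32", Kernel32.class);
        Pointer GetCommandLineW();        // wide-character command line of this process
        Pointer LocalFree(Pointer hMem);  // releases the block from CommandLineToArgvW
    }

    interface Shell32 extends StdCallLibrary {
        Shell32 INSTANCE = Native.load("shell32", Shell32.class);
        Pointer CommandLineToArgvW(Pointer lpCmdLine, IntByReference pNumArgs);
    }

    // Re-read the program arguments from the native command line.
    static String[] unicodeArgs(String[] mainArgs) {
        Pointer cmdLine = Kernel32.INSTANCE.GetCommandLineW();
        IntByReference argc = new IntByReference();
        Pointer argv = Shell32.INSTANCE.CommandLineToArgvW(cmdLine, argc);
        if (argv == null) {
            return mainArgs; // parsing failed; fall back to what main() received
        }
        try {
            String[] all = argv.getWideStringArray(0, argc.getValue());
            // The array also holds java.exe, JVM options, "-jar" and the jar path.
            // Assumption: the program arguments are the trailing entries.
            int from = Math.max(0, all.length - mainArgs.length);
            return Arrays.copyOfRange(all, from, all.length);
        } finally {
            Kernel32.INSTANCE.LocalFree(argv);
        }
    }

    public static void main(String[] args) {
        for (String name : unicodeArgs(args)) {
            System.out.println(name + " exists: " + Files.exists(Paths.get(name)));
        }
    }
}
```

With the JNA jar available to the application, the .bat file can keep calling "java -jar <filepath> %*" unchanged; only the way the program reads its arguments differs. Whether this fully recovers the filenames depends on the command line reaching java.exe intact, so it is worth testing with the SendTo shortcut itself.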

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 skomisa