Code Treats .txt File Differently When Saved

I have an input .txt file that looks something like this.

command1 param1
command2       param2
command3       param3
command4 param4

I am trying to collapse the extra whitespace, so I wrote the code below.

string[] output = File.ReadAllText(InputFilePath).Split('\n').Select(s => Regex.Replace(s, @"\s+", " ")).ToArray();

File.WriteAllLines(OutputFilePath, output);

If I run the code on the file as is, it does not work.

However, if I open the input file, save it without changing anything, and then run the code again, it works fine.

I believe this is some sort of UTF-16/8 issue but I am not sure how to account for it. What can I do?



Solution 1:[1]

In this specific case, the file contained "invisible control characters and unused code points". Removing those characters with a regular expression resolved the issue.

string[] output = File.ReadAllLines(InputFilePath).Select(s => Regex.Replace(s, @"\p{C}+", "")).ToArray();
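Note that \p{C} also covers ordinary control characters such as the tab, so the one-liner above deletes tabs rather than turning them into spaces, and it does not collapse repeated spaces on its own. The following is a minimal sketch of one way to combine both steps, not the original poster's exact code. The names LineCleaner, FindInvisible, Normalize and CleanFile are hypothetical, and trimming the ends of each line is an assumption about the desired output.

```csharp
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class LineCleaner
{
    // Hypothetical diagnostic helper: lists every \p{C} character in a line
    // as U+XXXX, to confirm what the file actually contains.
    public static string[] FindInvisible(string line) =>
        Regex.Matches(line, @"\p{C}")
             .Cast<Match>()
             .Select(m => "U+" + ((int)m.Value[0]).ToString("X4"))
             .ToArray();

    public static string Normalize(string line)
    {
        // Convert every whitespace character (tabs included) to a plain space
        // first, because \p{C} would otherwise delete tabs outright.
        string spaced = Regex.Replace(line, @"\s", " ");

        // Remove the remaining control, format, and other \p{C} characters.
        string visible = Regex.Replace(spaced, @"\p{C}+", "");

        // Collapse runs of spaces; trimming the ends is an assumption.
        return Regex.Replace(visible, @" {2,}", " ").Trim();
    }

    public static void CleanFile(string inputFilePath, string outputFilePath)
    {
        // ReadAllLines splits on line endings, so no stray '\r' is left behind.
        string[] output = File.ReadAllLines(inputFilePath).Select(Normalize).ToArray();
        File.WriteAllLines(outputFilePath, output);
    }
}
```

Calling LineCleaner.CleanFile(InputFilePath, OutputFilePath) would then replace the two statements from the question. Running FindInvisible on a problem line first shows which code points are present, which helps decide whether removing all of \p{C} is broader than needed.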

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Niuq Navig