NAnt: replace regex in file while preserving encoding

I'm using a NAnt build to update the dates in C# AssemblyInfo.cs files (lots of them). Each file contains a line like...

[assembly: AssemblyCopyright("Copyright Whoever 2020-2021")]

or

[assembly: AssemblyCopyright("Copyright Whoever 2021")]

and I'm updating that to say

[assembly: AssemblyCopyright("Copyright Whoever 2020-2022")]

or

[assembly: AssemblyCopyright("Copyright Whoever 2021-2022")]

I have a property versionFilePath which contains the name of the file, and I'm doing...

  <loadfile file="${versionFilePath}" property="versionFileContent"/>
  <regex pattern="^(?'prefix'\[assembly:\s+AssemblyCopyright\(&quot;.+?)(?'fromdate'\d\d\d\d)(?'todate'\-\d\d\d\d)?(?'suffix'.*&quot;\)\])" input="${versionFileContent}" options="Multiline" />

  <loadfile file="${versionFilePath}" property="versionFileContent">
    <filterchain>
      <replacestring from="${prefix}${fromdate}${todate}${suffix}" to="${prefix}${fromdate}-${datetime::get-year(datetime::now())}${suffix}" />
    </filterchain>
  </loadfile>
  <echo file="${versionFilePath}">${versionFileContent}</echo>

This basically works; however, the file it writes has a different encoding from the one it loaded, and the version control system we're using doesn't like that very much.

How can I make it do the replacement without altering the encoding? Can I capture the encoding when the file is being loaded so that I can use the same value when the file is being written? Or is there a better way of doing this, where I can just do the Regex replace directly on the file?



Solution 1:[1]

I'm not aware of a solution that keeps the original encoding, but you can force the echo task to use a defined encoding (one that your VCS accepts), since echo has an encoding attribute (at least as of version 0.92).

  <echo file="${versionFilePath}" encoding="iso-8859-1">${versionFileContent}</echo>
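
Forcing the encoding only on write can still garble non-ASCII characters if loadfile decoded the file with a different encoding. A minimal sketch, assuming every AssemblyInfo.cs shares one known encoding, passes the same value to loadfile (which also accepts an encoding attribute) and to echo. The utf-8 value and the versionFileEncoding property name are illustrative assumptions, not part of the original build file; the regex and replacestring are taken unchanged from the question.

```xml
<!-- Assumption: all AssemblyInfo.cs files use this one encoding. -->
<property name="versionFileEncoding" value="utf-8" />

<!-- Decode the file with the explicit encoding so the regex sees the right characters. -->
<loadfile file="${versionFilePath}" property="versionFileContent" encoding="${versionFileEncoding}" />
<regex pattern="^(?'prefix'\[assembly:\s+AssemblyCopyright\(&quot;.+?)(?'fromdate'\d\d\d\d)(?'todate'\-\d\d\d\d)?(?'suffix'.*&quot;\)\])" input="${versionFileContent}" options="Multiline" />

<!-- Reload with the replacement applied, still using the same encoding. -->
<loadfile file="${versionFilePath}" property="versionFileContent" encoding="${versionFileEncoding}">
  <filterchain>
    <replacestring from="${prefix}${fromdate}${todate}${suffix}" to="${prefix}${fromdate}-${datetime::get-year(datetime::now())}${suffix}" />
  </filterchain>
</loadfile>

<!-- Write the result back with the encoding it was read with. -->
<echo file="${versionFilePath}" encoding="${versionFileEncoding}">${versionFileContent}</echo>
```

Whether the written file starts with a byte order mark depends on the encoding chosen, so it is worth diffing one file against version control before running this over all of them.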

Update: To make it a bit clearer: there's no reliable way to tell from a source text file alone what its encoding is. You can make a good guess (take a look at the Python module chardet), but most of the time everything depends on meta-information. My advice is:

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1