NAnt: replace regex in file while preserving encoding
I'm using a NAnt build to update the dates in C# AssemblyInfo.cs files (lots of them). Each file contains a line like...
[assembly: AssemblyCopyright("Copyright Whoever 2020-2021")]
or
[assembly: AssemblyCopyright("Copyright Whoever 2021")]
and I'm updating that to say
[assembly: AssemblyCopyright("Copyright Whoever 2020-2022")]
or
[assembly: AssemblyCopyright("Copyright Whoever 2021-2022")]
I have a property versionFilePath which contains the name of the file, and I'm doing...
<loadfile file="${versionFilePath}" property="versionFileContent"/>
<regex pattern="^(?'prefix'\[assembly:\s+AssemblyCopyright\(&quot;.+?)(?'fromdate'\d\d\d\d)(?'todate'\-\d\d\d\d)?(?'suffix'.*&quot;\)\])" input="${versionFileContent}" options="Multiline" />
<loadfile file="${versionFilePath}" property="versionFileContent">
<filterchain>
<replacestring from="${prefix}${fromdate}${todate}${suffix}" to="${prefix}${fromdate}-${datetime::get-year(datetime::now())}${suffix}" />
</filterchain>
</loadfile>
<echo file="${versionFilePath}">${versionFileContent}</echo>
And this is basically working, however the file that it is writing is a different encoding than the one which it loaded; and the version control system we're using doesn't like that very much.
How can I make it do the replacement without altering the encoding? Can I capture the encoding when the file is being loaded so that I can use the same value when the file is being written? Or is there a better way of doing this, where I can just do the Regex replace directly on the file?
Solution 1:[1]
I'm not aware of a solution that keeps the original encoding, but you can force the echo task to write a defined encoding (one that your VCS accepts), since echo has an encoding attribute (at least as of version 0.92).
<echo file="${versionFilePath}" encoding="iso-8859-1">${versionFileContent}</echo>
Update: To make it a bit clearer: There's no good way of telling from a source text file what its encoding is. You can make a good guess (take a look at Python module chardet), but most of the time everything depends on meta-information. My advice is:
- Get your source CommonAssemblyInfo.cs to a shared encoding, e.g. UTF-8 with BOM.
- This is highly experimental: You could use the same encoding attribute for the loadfile and echo tasks. If you take a look at the NAnt sources, the loadfile encoding attribute defaults to Encoding.Default (which is pretty straightforward and, according to the docs, means "Unless an encoding is specified, the encoding associated with the system's current ANSI code page is used"). According to the NAnt sources, the echo task defaults to UTF-8. If you choose ASCII encoding for both, you might be safe for most 8-bit encodings, since you're replacing characters with ANSI code < 128 only.
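The second suggestion could look roughly like the sketch below, which sets the same explicit encoding on both loadfile tasks and on echo. It is not a verified fix. The project name, target name and file path are hypothetical placeholders. The choice of iso-8859-1 (taken from the echo example above) rather than ASCII is an assumption: iso-8859-1 maps every byte to exactly one character, whereas the .NET ASCII decoder, as far as I know, replaces bytes above 127 with '?'. Whether a file that starts with a BOM survives the round trip depends on how NAnt opens the file, so compare the result byte for byte before committing.

```xml
<?xml version="1.0"?>
<!-- Sketch only: read and write the file with one explicit single-byte encoding. -->
<!-- "update-copyright", "update" and the file path are hypothetical placeholders. -->
<project name="update-copyright" default="update">
    <property name="versionFilePath" value="Properties/AssemblyInfo.cs" />

    <target name="update">
        <!-- Load once so the regex task can capture the parts of the copyright line. -->
        <loadfile file="${versionFilePath}" property="versionFileContent" encoding="iso-8859-1" />
        <regex pattern="^(?'prefix'\[assembly:\s+AssemblyCopyright\(&quot;.+?)(?'fromdate'\d\d\d\d)(?'todate'\-\d\d\d\d)?(?'suffix'.*&quot;\)\])" input="${versionFileContent}" options="Multiline" />

        <!-- Load again with the same encoding, replacing the matched line on the way in. -->
        <loadfile file="${versionFilePath}" property="versionFileContent" encoding="iso-8859-1">
            <filterchain>
                <replacestring from="${prefix}${fromdate}${todate}${suffix}" to="${prefix}${fromdate}-${datetime::get-year(datetime::now())}${suffix}" />
            </filterchain>
        </loadfile>

        <!-- Assumption: writing with the encoding used for reading keeps untouched bytes as they were. -->
        <echo file="${versionFilePath}" encoding="iso-8859-1">${versionFileContent}</echo>
    </target>
</project>
```

Running the target on a copy of one file and diffing the bytes against the original (apart from the changed year) is a quick way to check whether the encoding really was preserved.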
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
