'Looking for the best practice for doing a bunch of replaces on a (possibly) large string in C#

we have a piece of code that basically reads a template from a file, and then replaces a bunch of placeholders with real data. The template is something outside of the developer's control and we have noticed that sometimes (for large template files), it can get quite CPU-intensive to perform the replaces.

At least, we believe that it is the string replaces that are intensive. Debugging locally shows that it only takes milliseconds to perform each replace, but every template is different and some templates can contain hundreds of these tags that need replacing.

I'll show a little bit of code first, before I continue. This is a huge simplification of the real code. There could be hundreds of replacements happening on our real code.

string template = File.ReadAllText("path\to\file");
if (!string.IsNullOrEmpty(template))
{
  if (template.Contains("[NUMBER]"))
     template = template.Replace("[NUMBER]", myobject.Number);

  if (template.Contains("[VAT_TABLE]"))
     template = template.Replace("[VAT_TABLE]", ConstructVatTable(myObject));

  // etc ... : 
}

private string ConstructVatTable(Invoice myObject)
{
 string vatTemplate = "this is a template with tags of its own";

 StringBuilder builder = new StringBuilder();

 foreach (var item in myObject.VatItems)
 {
   builder.Append(vatTemplate.Replace("[TAG1]", item.Tag1).Replace("[TAG2]", item.Tag2);
 }
 
 return builder.ToString();
}

Is this the most optimal way of replacing parts of a large string, or are there better ways? Are there ways that we could profile what we are doing in more detail to show us where the CPU intensive parts may lie? Any help or advice would be greatly appreciated.



Solution 1:[1]

You perhaps need to come up with some alternative strategies for your replacements and race your horses.

I did 4 here and benched them:

[MemoryDiagnoser]
[SimpleJob(RuntimeMoniker.Net50)]
public class Streplace
{
    private string _doc;
    private Dictionary<string,string> _tags;


    [GlobalSetup]
    public void GenerateDocAndTags()
    {
        var sb = new StringBuilder(50000);

        _tags = new Dictionary<string, string>();
        for (int i = 0; i < 20; i++)
        {
            _tags["TAG" + i] = new string((char)('a' + i), 1000);
        }

        for (int i = 0; i < 1000; i++)
        {
            sb.Append(Guid.NewGuid().ToString().ToUpper());
            if (i % 50 == 0)
            {
                sb.Append("[TAG" + i / 50 + "]");
            }
        }

        _doc = sb.ToString();
    }

    [Benchmark]
    public void NaiveString()
    {
        var str = _doc;
        foreach (var tag in _tags)
        {
            str = str.Replace("[" + tag.Key + "]", tag.Value);
        }
    }

    [Benchmark]
    public void NaiveStringBuilder()
    {
        var strB = new StringBuilder(_doc, _doc.Length * 2);
        foreach (var tag in _tags)
        {
            strB.Replace("[" + tag.Key + "]", tag.Value);
        }
        var s = strB.ToString();
    }

    [Benchmark]
    public void StringSplit()
    {
        var strs = _doc.Split('[',']');
        for (int i = 1; i < strs.Length; i+= 2)
        {
            strs[i] = _tags[strs[i]];
        }
        var s = string.Concat(strs);
    }

    [Benchmark]
    public void StringCrawl()
    {
        var strB = new StringBuilder(_doc.Length * 2);
        var str = _doc;

        var lastI = 0;
        for (int i = str.IndexOf('['); i > -1; i = str.IndexOf('[', i))
        {
            strB.Append(str, lastI, i - lastI); //up to the [
            i++;
            var j = str.IndexOf(']', i);

            var tag = str[i..j];
            strB.Append(_tags[tag]);
            lastI = j + 1;
        }
        strB.Append(str, lastI, str.Length - lastI);
        var s = strB.ToString();
    }

}
  • NaiveString - your replace replace replace approach
  • NaiveStringBuilder - Rafal's approach - I was quite surprised how badly this performed, but I haven't looked into why. If anyone notices a glaring error in my code, let me know
  • StringSplit - the approach I commented - split the string and then the odd indexes are what needs swapping out, then join the string again
  • StringCrawl - travel the string looking for [ and ] putting either the doc content, or a tag content into the StringBuilder, depending on whether we're inside or outside the brackets

The test document was generated; thousands of GUIDs with [TAGx] (x was 0 to 19) inserted at regular intervals. The [TAGx] were each replaced with a string considerably longer that was a bunch of repeated chars. The resulting document was 56000 chars and looked like:

enter image description here

The results were:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1526 (21H2)
Intel Core i7-7820HQ CPU 2.90GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.200
  [Host]   : .NET 5.0.12 (5.0.1221.52207), X64 RyuJIT  [AttachedDebugger]
  .NET 5.0 : .NET 5.0.12 (5.0.1221.52207), X64 RyuJIT

Job=.NET 5.0  Runtime=.NET 5.0

|             Method |       Mean |    Error |   StdDev |    Gen 0 |    Gen 1 |    Gen 2 | Allocated |
|------------------- |-----------:|---------:|---------:|---------:|---------:|---------:|----------:|
|        NaiveString | 1,266.9 us | 24.73 us | 39.93 us | 433.5938 | 433.5938 | 433.5938 |  1,820 KB |
| NaiveStringBuilder | 3,908.6 us | 73.79 us | 93.33 us |  78.1250 |  78.1250 |  78.1250 |    293 KB |
|        StringSplit |   110.8 us |  2.15 us |  2.01 us |  34.4238 |  34.4238 |  34.4238 |    181 KB |
|        StringCrawl |   101.5 us |  1.96 us |  2.40 us |  79.9561 |  79.9561 |  79.9561 |    251 KB |

NaiveString's memory usage is huge, NaiveStringBuilder's memory is better but is 3x slower. Crawl and Split are pretty good - about 12x faster than NaiveString, 40x faster than NaiveStringBuilder (and a bit less memory too, bonus).

I thought the Crawl would be better than the Split, but I'm not sure the 10% speedup is worth the extra memory/collections - that would be your call. Take the code, maybe add some more approaches, and race them. Nuget install BenchmarkDotNet, put a static void main of this:

        public static async Task Main()
        {
#if DEBUG
            var sr = new Streplace();
            sr.GenerateDocAndTags();
            sr.NaiveString();
            sr.NaiveStringBuilder();
            sr.StringSplit();
            sr.StringCrawl();
#else
            var summary = BenchmarkRunner.Run<Streplace>();
#endif
        }

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1