OpenXML Replace Field Codes with Result Value

I'm trying to iterate over the fields in the headers and footers of vendor-generated documents using OpenXML, replace each field with the result value stored in it, and then remove the field. Below is the markup from the footer for just one of several fields (imagine 4 or 5 more after this one). I'm limited to .NET Framework 3.5 and OpenXML SDK 2.0 because this runs as part of an SSIS script task.

    <w:r>
        <w:rPr>
            <w:rFonts w:ascii="Segoe UI" w:hAnsi="Segoe UI" w:eastAsia="Segoe UI"/>
            <w:sz w:val="20"/>
        </w:rPr>
        <w:fldChar w:fldCharType="begin"/>
    </w:r>
    <w:r>
        <w:rPr>
            <w:rFonts w:ascii="Segoe UI" w:hAnsi="Segoe UI" w:eastAsia="Segoe UI"/>
            <w:sz w:val="20"/>
        </w:rPr>
        <w:instrText xml:space="preserve"> REF NG_MACRO "STANDARD" "patient_lname" </w:instrText>
    </w:r>
    <w:r>
        <w:rPr>
            <w:rFonts w:ascii="Segoe UI" w:hAnsi="Segoe UI" w:eastAsia="Segoe UI"/>
            <w:sz w:val="20"/>
        </w:rPr>
        <w:fldChar w:fldCharType="separate"/>
    </w:r>
    <w:r>
        <w:rPr>
            <w:rFonts w:ascii="Segoe UI" w:hAnsi="Segoe UI" w:eastAsia="Segoe UI"/>
            <w:sz w:val="20"/>
        </w:rPr>
        <w:t xml:space="preserve">Test</w:t>
    </w:r>

I've tried multiple approaches found over the past 3 weeks of research, but they all either fail or affect only the first field and not the rest.

Below is an example of what I've tried to do that seemed to work the best, but again, it's only finding the first field and ignoring the rest. Note: there is a set of page numbering fields after these vendor fields akin to / that I do not want to change.

        using (WordprocessingDocument document = WordprocessingDocument.Open("Plan.doc", true))
        {
            MainDocumentPart main = document.MainDocumentPart;

            foreach (FooterPart foot in main.FooterParts)
            {
                foreach(var fld in foot.RootElement.Descendants<FieldCode>())
                {
                    if (fld != null && fld.InnerText.Contains("REF NG_MACRO"))
                    {
                        Run rFldCode = (Run)fld.Parent;

                        // Get the three (3) other Runs that make up our merge field
                        Run rBegin = rFldCode.PreviousSibling<Run>();
                        Run rSep = rFldCode.NextSibling<Run>();
                        Run rText = rSep.NextSibling<Run>();
                        Run rEnd = rText.NextSibling<Run>();

                        // Get the Run that holds the Text element for our merge field
                        // Get the Text element and replace the text content 
                        Text t = rText.GetFirstChild<Text>();
                        //t.Text = replacementText;

                        // Remove all the four (4) Runs for our merge field
                        rFldCode.Remove();
                        rBegin.Remove();
                        rSep.Remove();
                        rEnd.Remove();
                    }
                }

                foot.Footer.Save();
            }
            document.MainDocumentPart.Document.Save();
            document.Close();
        }

I appreciate any insight and thoughts that anyone can offer on what I'm missing, a better way to achieve this with OpenXML, etc.



Solution 1:[1]

Try this. It works for me.

using (WordprocessingDocument wordDocument = WordprocessingDocument.Open("Plan.doc", true))
{
    const string FieldDelimiter = " MERGEFIELD ";

    foreach (FooterPart footer in wordDocument.MainDocumentPart.FooterParts)
    {
        foreach (FieldCode field in footer.RootElement.Descendants<FieldCode>())
        {
            // Only handle field codes that contain the delimiter.
            int fieldNameStart = field.Text.LastIndexOf(FieldDelimiter, System.StringComparison.Ordinal);
            if (fieldNameStart < 0)
            {
                continue;
            }

            // The field name is whatever follows the delimiter.
            string fieldName = field.Text.Substring(fieldNameStart + FieldDelimiter.Length).Trim();

            // Locate the runs around the field code: begin, separator, result text, end.
            Run rFieldCode = (Run)field.Parent;
            Run rBegin = rFieldCode.PreviousSibling<Run>();
            Run rSep = rFieldCode.NextSibling<Run>();
            Run rText = rSep.NextSibling<Run>();
            Run rEnd = rText.NextSibling<Run>();

            // Overwrite the cached result text.
            // replacementText is not defined in this snippet; supply the value for fieldName.
            Text t = rText.GetFirstChild<Text>();
            t.Text = replacementText;
        }
    }
}

Solution 2:[2]

To replace field text in the document body, headers and footers, work in two steps:

  • Collect all the field codes
  • Replace the text for each field

Here is how you can collect every FieldCode:

public IEnumerable<FieldCode> GetMergeFields(WordprocessingDocument doc)
{
    var mergeFields = new List<FieldCode>();

    if (doc == null) return mergeFields;

    mergeFields.AddRange(doc.MainDocumentPart.RootElement.Descendants<FieldCode>());

    foreach (var header in doc.MainDocumentPart.HeaderParts)
    {
        mergeFields.AddRange(header.RootElement.Descendants<FieldCode>());
    }

    foreach (var footer in doc.MainDocumentPart.FooterParts)
    {
        mergeFields.AddRange(footer.RootElement.Descendants<FieldCode>());
    }

    return mergeFields;
}
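
The second step is not shown above. What follows is only a minimal sketch of it, assuming each field is a simple, non-nested field laid out as in the question (begin, instruction, separate, result, end) with all of its runs in one paragraph. `FieldFlattener` and `FlattenFields` are hypothetical names, not SDK members. The sketch removes the field scaffolding and keeps the cached result runs. It also snapshots the field codes with `ToList()` first, because removing runs while `Descendants<FieldCode>()` is still being enumerated is a likely reason a loop stops after the first field.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class FieldFlattener
{
    // Hypothetical helper (not part of the SDK): for every field whose instruction
    // text contains `marker`, removes the begin/instruction/separate/end runs and
    // leaves the cached result runs in place. Returns the number of fields flattened.
    // Assumes fields are not nested and do not span paragraphs.
    public static int FlattenFields(OpenXmlElement root, string marker)
    {
        int flattened = 0;

        // Snapshot the field codes before touching the tree.
        foreach (FieldCode code in root.Descendants<FieldCode>().ToList())
        {
            Run codeRun = code.Parent as Run;

            // Skip codes whose run was already removed as part of an earlier field.
            if (codeRun == null || codeRun.Parent == null) continue;
            if (!code.InnerText.Contains(marker)) continue;

            // Walk back to the run holding the "begin" field character.
            Run begin = codeRun.PreviousSibling<Run>();
            while (begin != null && !HasFieldChar(begin, FieldCharValues.Begin))
            {
                begin = begin.PreviousSibling<Run>();
            }
            if (begin == null) continue;

            // Walk forward to "end", marking everything except the result runs.
            List<Run> toRemove = new List<Run>();
            bool inResult = false;
            bool closed = false;
            for (Run run = begin; run != null; run = run.NextSibling<Run>())
            {
                if (HasFieldChar(run, FieldCharValues.End))
                {
                    toRemove.Add(run);
                    closed = true;
                    break;
                }
                if (HasFieldChar(run, FieldCharValues.Separate))
                {
                    toRemove.Add(run);
                    inResult = true;
                }
                else if (!inResult)
                {
                    toRemove.Add(run); // begin and instruction runs
                }
                // Runs between "separate" and "end" hold the result and are kept.
            }

            // Leave malformed fields (no "end") untouched.
            if (!closed) continue;

            foreach (Run run in toRemove)
            {
                run.Remove();
            }
            flattened++;
        }

        return flattened;
    }

    private static bool HasFieldChar(Run run, FieldCharValues type)
    {
        return run.Elements<FieldChar>()
            .Any(fc => fc.FieldCharType != null && fc.FieldCharType.Value == type);
    }
}
```

Called once per part, for example `FieldFlattener.FlattenFields(foot.RootElement, "REF NG_MACRO")` followed by `foot.Footer.Save()`, this leaves the page numbering fields alone because their instruction text does not contain the marker.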

Solution 3:[3]

The documentation on doing this is generally terrible and the answers around the web are incomplete, so I'm posting this here. In the end I used code from both of the older answers posted on this question, and I had the added requirement of substituting in images that were base64 encoded.

My specific use case was custom document properties, and I ran into odd cases when removing the set of runs that make up a field: the code in the answers above was inadequate because it left runs behind. I fixed that by deleting runs until I reach the end field character.
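
To illustrate the custom-property case, here is a small sketch of how the pattern used in the solution below reduces a field instruction to a property name. It assumes Word's usual double-space layout for `DOCPROPERTY` fields; `DocPropertyField` and `ExtractName` are hypothetical names introduced only for this example.

```csharp
using System.Text.RegularExpressions;

public static class DocPropertyField
{
    // Same pattern as in the solution below: two spaces around the name,
    // then a literal "\* MERGEFORMAT" switch.
    private const string Pattern = "DOCPROPERTY  (.+)  \\\\[*] MERGEFORMAT";

    // Returns the property name when the instruction matches the pattern;
    // otherwise returns the trimmed instruction unchanged.
    public static string ExtractName(string instruction)
    {
        return Regex.Replace(instruction.Trim(), Pattern, "$1");
    }
}

// DocPropertyField.ExtractName(" DOCPROPERTY  _{PreparedDate}  \\* MERGEFORMAT ")
//     returns "_{PreparedDate}"
// DocPropertyField.ExtractName(" PAGE ")
//     returns "PAGE"
```

Instructions that do not match come back unchanged, which is why the solution then filters on names starting with an underscore.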

It uses SkiaSharp to load the image from a base64 data URL, and I'm capping the image size; if you don't need images, that part can be removed fairly easily.

There are a couple of extra classes and functions I'm using, but they are all usage-specific things like retrieving the replacement values and reporting the finished result.

As a reference for the EMU sizing this was the best reference I found: http://polymathprogrammer.com/2009/10/22/english-metric-units-and-open-xml/

The following is my solution:

using System.Text.RegularExpressions;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using SkiaSharp;
using A = DocumentFormat.OpenXml.Drawing;
using DW = DocumentFormat.OpenXml.Drawing.Wordprocessing;
using PIC = DocumentFormat.OpenXml.Drawing.Pictures;


/// <inheritdoc />
public class OpenXmlDocumentFieldProcessor : BaseDocumentFieldProcessor
{
    /// <inheritdoc />
    public OpenXmlDocumentFieldProcessor(DataHelper dataHelper) : base(dataHelper) { }

    /// <inheritdoc />
    public override UpdateTemplateViewModel ProcessDocumentFields(byte[] file, DocumentTemplate entity)
    {
        string tempFile = Path.GetTempFileName();
        UpdateTemplateViewModel result = ApplyToTempFile(file, entity, tempFile);
        if (result.Statuses == UpdateTemplateStatuses.Success)
        {
            result.StampedFile = File.ReadAllBytes(tempFile);
        }

        return result;
    }

    /// <summary>
    ///     Applies the field processing to a temp file.
    /// </summary>
    /// <param name="file">The file.</param>
    /// <param name="entity">The entity.</param>
    /// <param name="tempFile">The temp file.</param>
    /// <returns>An UpdateTemplateViewModel.</returns>
    private UpdateTemplateViewModel ApplyToTempFile(byte[] file, DocumentTemplate entity, string tempFile)
    {
        File.WriteAllBytes(tempFile, file);
        using WordprocessingDocument doc = WordprocessingDocument.Open(tempFile, true);
        MainDocumentPart mainPart = doc.MainDocumentPart ?? throw new ArgumentException("Invalid document");

        var mergeFields = GetMergeFields(mainPart);
        var customPropertyNames = mergeFields.Select(e => Regex.Replace(e.InnerText.Trim(), "DOCPROPERTY  (.+)  \\\\[*] MERGEFORMAT", "$1"))
            .Distinct()
            .Where(i => i.StartsWith('_'))
            .ToList();

        UpdateTemplateViewModel result = ValidateCustomProperties(customPropertyNames);
        result.ClientName = null;
        if (result.InvalidCustomProperties.Count > 0)
        {
            result.Statuses = UpdateTemplateStatuses.Failure;

            return result;
        }

        var propertyMaps = GetDataForTemplate(customPropertyNames, entity);
        propertyMaps.Add(new TemplatePropertyMap
            { IsImage = false, DocumentCustomPropertyName = "_{PreparedDate}", FieldValue = DateTime.Today.ToShortDateString() });

        ReplaceMergeFields(mainPart, mergeFields, propertyMaps);

        doc.Save();

        result.Filename = entity.Filename;
        result.Statuses = UpdateTemplateStatuses.Success;

        return result;
    }

    /// <summary>
    ///     Replaces the merge fields.
    /// </summary>
    /// <param name="mainPart">The main part.</param>
    /// <param name="mergeFields">The merge fields.</param>
    /// <param name="fields">The fields.</param>
    private static void ReplaceMergeFields(MainDocumentPart mainPart, IEnumerable<FieldCode> mergeFields,
        IEnumerable<TemplatePropertyMap> fields)
    {
        var map = fields.ToDictionary(e => e.DocumentCustomPropertyName);
        foreach (FieldCode field in mergeFields)
        {
            string fieldName = Regex.Replace(field.InnerText.Trim(), "DOCPROPERTY  (.+)  \\\\[*] MERGEFORMAT", "$1");
            Run target = RemoveFieldCodeOverhead(field);
            if (!map.ContainsKey(fieldName))
            {
                continue;
            }

            TemplatePropertyMap targetProperty = map[fieldName];
            if (targetProperty.IsImage)
            {
                ReplaceMergeFieldWithImage(mainPart, target, targetProperty);
            }
            else
            {
                ReplaceMergeFieldWithText(target, targetProperty);
            }
        }
    }

    /// <summary>
    ///     Replaces the merge field with text.
    /// </summary>
    /// <param name="target">The target.</param>
    /// <param name="targetProperty">The target property.</param>
    private static void ReplaceMergeFieldWithText(OpenXmlElement target, TemplatePropertyMap targetProperty)
    {
        target.Append(new Text(targetProperty.FieldValue));
    }

    /// <summary>
    ///     Replaces the merge field with a base64-encoded image.
    /// </summary>
    /// <param name="mainPart">The main part.</param>
    /// <param name="target">The target.</param>
    /// <param name="targetProperty">The target property.</param>
    private static void ReplaceMergeFieldWithImage(MainDocumentPart mainPart, OpenXmlElement target, TemplatePropertyMap targetProperty)
    {
        // Assumption: base64-encoded data URL. We just need the base64 payload after the comma. SkiaSharp will handle the rest.
        string[] dataParts = targetProperty.FieldValue.Split(',');
        string encodedData = dataParts.Last();
        byte[] data = Convert.FromBase64String(encodedData);
        SKImage originalImage = SKImage.FromEncodedData(data);

        // Choosing 72 DPI and a 6in * 2in max size so supplied images don't disrupt templating. Works for POC. TODO: Turn into configuration value.
        const int dpi = 72;
        const long widthLimit = dpi * 6;
        const long heightLimit = dpi * 2;

        int resizedWidth = originalImage.Width;
        int resizedHeight = originalImage.Height;
        if (heightLimit < originalImage.Height || widthLimit < originalImage.Width)
        {
            float scaleHeight = heightLimit / (float)originalImage.Height;
            float scaleWidth = widthLimit / (float)originalImage.Width;
            float scale = Math.Min(scaleHeight, scaleWidth);

            resizedWidth = (int)(originalImage.Width * scale);
            resizedHeight = (int)(originalImage.Height * scale);
        }

        // Best reference for this EMU sizing: http://polymathprogrammer.com/2009/10/22/english-metric-units-and-open-xml/
        long cx = resizedWidth * (long)((float)914400 / dpi);
        long cy = resizedHeight * (long)((float)914400 / dpi);

        ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Png);
        imagePart.FeedData(originalImage.Encode(SKEncodedImageFormat.Png, 100).AsStream());

        target.Append(CreateDrawingElement(mainPart.GetIdOfPart(imagePart), targetProperty.DocumentCustomPropertyName, cx, cy));
    }

    /// <summary>
    ///     Removes the field code overhead.
    /// </summary>
    /// <param name="field">The field.</param>
    /// <returns>A Run.</returns>
    private static Run RemoveFieldCodeOverhead(OpenXmlElement field)
    {
        OpenXmlElement container = field.Parent?.Parent ?? throw new ArgumentException("Error resolving field replacement container");
        container.RemoveAllChildren<ProofError>();

        Run rFldParent = (Run)field.Parent;
        var runs = new List<Run>
        {
            rFldParent.PreviousSibling<Run>(), // begin
            rFldParent.NextSibling<Run>(),
        };

        // We're deleting until we hit the end delimiter for the Field.
        do
        {
            runs.Add(runs.Last().NextSibling<Run>());
        } while (runs.Last().ChildElements.OfType<FieldChar>().All(e => e.FieldCharType != FieldCharValues.End));

        foreach (Run run in runs)
        {
            run.Remove();
        }

        rFldParent.RemoveAllChildren();

        return rFldParent;
    }

    /// <summary>
    ///     Gets the merge fields.
    /// </summary>
    /// <param name="mainPart">The main part.</param>
    /// <returns>A read only collection of FieldCodes.</returns>
    private static IReadOnlyCollection<FieldCode> GetMergeFields(MainDocumentPart mainPart)
    {
        var mergeFields = new List<FieldCode>();
        if (mainPart == null)
        {
            return mergeFields;
        }

        mergeFields.AddRange(mainPart.RootElement?.Descendants<FieldCode>() ?? new List<FieldCode>());
        foreach (HeaderPart header in mainPart.HeaderParts)
        {
            mergeFields.AddRange(header.RootElement?.Descendants<FieldCode>() ?? new List<FieldCode>());
        }

        foreach (FooterPart footer in mainPart.FooterParts)
        {
            mergeFields.AddRange(footer.RootElement?.Descendants<FieldCode>() ?? new List<FieldCode>());
        }

        return mergeFields;
    }

    /// <summary>
    ///     Creates the Drawing element to house the supplied image by id.
    /// </summary>
    /// <param name="imagePartId">The id to the corresponding ImagePart within an OpenXML document</param>
    /// <param name="name">The name of the image element</param>
    /// <param name="cx">The width extent of the drawing in EMUs</param>
    /// <param name="cy">The height extent of the drawing in EMUs</param>
    /// <returns>A Drawing element wrapping the inline picture.</returns>
    private static Drawing CreateDrawingElement(string imagePartId, string name, long cx, long cy) => new(
        new DW.Inline(new DW.Extent { Cx = cx, Cy = cy },
            new DW.EffectExtent { LeftEdge = 0L, TopEdge = 0L, RightEdge = 0L, BottomEdge = 0L },
            new DW.DocProperties { Id = 1U, Name = name },
            new DW.NonVisualGraphicFrameDrawingProperties(new A.GraphicFrameLocks { NoChangeAspect = true }),
            new A.Graphic(new A.GraphicData(new PIC.Picture(
                    new PIC.NonVisualPictureProperties(new PIC.NonVisualDrawingProperties { Id = 0U, Name = name },
                        new PIC.NonVisualPictureDrawingProperties()),
                    new PIC.BlipFill(
                        new A.Blip(new A.BlipExtensionList(new A.BlipExtension { Uri = "{28A0092B-C50C-407E-A947-70E740481C1C}" }))
                            { Embed = imagePartId, CompressionState = A.BlipCompressionValues.HighQualityPrint },
                        new A.Stretch(new A.FillRectangle())),
                    new PIC.ShapeProperties(new A.Transform2D(new A.Offset { X = 0L, Y = 0L }, new A.Extents { Cx = cx, Cy = cy }),
                        new A.PresetGeometry(new A.AdjustValueList()) { Preset = A.ShapeTypeValues.Rectangle })))
                { Uri = "http://schemas.openxmlformats.org/drawingml/2006/picture" }))
        {
            DistanceFromTop = 0U, DistanceFromBottom = 0U, DistanceFromLeft = 0U, DistanceFromRight = 0U, EditId = "50D07946",
        });
}
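
As a worked example of the EMU sizing in `ReplaceMergeFieldWithImage`: there are 914400 EMUs per inch, so at 72 DPI each pixel is 12700 EMUs. The sketch below is a hypothetical helper (`Emu.FromPixels` is not part of the SDK) that does the same conversion but multiplies before dividing, which also stays accurate for a DPI that does not divide 914400 evenly.

```csharp
public static class Emu
{
    // English Metric Units per inch, as used by DrawingML extents.
    public const long PerInch = 914400L;

    // Converts a pixel length at the given DPI to EMUs.
    // Multiplying first means integer truncation happens once, on the final
    // result, instead of on the per-pixel factor.
    public static long FromPixels(int pixels, int dpi)
    {
        return pixels * PerInch / dpi;
    }
}

// Emu.FromPixels(432, 72) returns 5486400 (6 inches, the width cap above)
// Emu.FromPixels(144, 72) returns 1828800 (2 inches, the height cap above)
```

With such a helper, the two extent lines could read `long cx = Emu.FromPixels(resizedWidth, dpi);` and `long cy = Emu.FromPixels(resizedHeight, dpi);`.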

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Paul Roub
Solution 2    SteveL
Solution 3    McAden