Traverse a JSON file and flatten it along the way with slashes in C#

I'm new to JSON and I'm trying to flatten a large JSON file, with slashes separating each level. Each flattened value should be either a single int or string, or a list of strings (separated by commas). For example, if the JSON looked like this:

{
  "This": {
    "Is": {
      "A": [
        "json",
        "file"
      ],
      "Another": {
        "JSON": "File"
      }
    },
    "File": {
      "Is": 4,
      "Line": [
        {
          "Nested": "JSON"
        },
        {
          "Nested2:": "JSON2"
        }
      ]
    }
  }
}

It would flatten out to:

This / Is / A : json, file

This / Is / Another / JSON : File

This / File / Is : 4

This / File / Line / Nested : JSON

This / File / Line / Nested2 : JSON2

I saw a JavaScript solution on Stack Overflow that suggested something like this:

function traverse(jsonObj) {
    if( jsonObj !== null && typeof jsonObj == "object" ) {
        Object.entries(jsonObj).forEach(([key, value]) => {
            // key is either an array index or object key
            traverse(value);
        });
    }
    else {
        // jsonObj is a number or string
    }
}

But I'm confused: how would I add slashes between the keys as I traverse?
Also note that I am looking for a C# solution, not JavaScript.


OK, so I took a crack at this using Json.NET's LINQ-to-JSON classes (the Newtonsoft.Json.Linq namespace). My JSON file has values that are ints, strings, lists, objects, and lists of objects. I want to save all the flattened results to a dictionary (instead of writing them to the console), but I run into the issue where the key already exists in the dictionary. This happens when there is a list of objects: the objects follow the same pattern, so they produce the same keys, but the values are different. Example:

{
  "Level1": [
    {
      "Level2": {
        "A": "something1",
        "B": 1
      },
      "Level3": [
        {
          "3A": "content"
        }
      ],
      "Level41": null,
      "Level42": null,
      "Level43": null,
      "Level44": "content",
      "Level45": "content",
      "Level46": "content"
    },
    {
      "Level2": {
        "A": "something1",
        "B": 2
      },
      "Level3": [
        {
          "3A": "morecontent"
        }
      ],
      "Level41": null,
      "Level42": null,
      "Level43": null,
      "Level44": "content",
      "Level45": "morecontent",
      "Level46": "content"
    }
  ]
}

Is there a way to ensure that if the dictionary has already seen the key, it saves all values of that key in a list? Or is that not my issue here? Here is what I did:

// The JSON file has been deserialized into a Dictionary with string keys and object values.
// All the key/value pairs seen so far are saved into a Dictionary called dictionary.
Dictionary<string, object> dictionary = new Dictionary<string, object>();
// I was using a DataGridView to show the values in cells, but that part can be ignored.

private void AddValue(string key, object value)
{
    if (value != null && value.GetType() == typeof(JObject)) // if key == string, value == object
    {
        foreach (var a in (JObject)value)
        {
            string name = a.Key;
            JToken obj = a.Value;
            AddValue(name + a + " / ", obj);
        }
    }
    else if (value != null && value.GetType() == typeof(JArray)) // if key == string, value == array
    {
        foreach (var a in (JArray)value)
        {
            if (a.GetType() == typeof(JObject)) // if key == string, value == object
            {
                foreach (var item in (JObject)a)
                {
                    string objectName = item.Key;
                    JToken objValue = item.Value;
                    AddValue(objectName + item + " / ", objValue);
                }
            }
            else //key == string, value == string
            {
                string name = (string)a;
                AddValue(name + " / ", a);
            }
        }
    }
    else // key == string, value == string
    {
        var keyCell = new DataGridViewTextBoxCell();
        var row = new DataGridViewRow();
        var valCell = new DataGridViewTextBoxCell();
        if (value != null)
        {
            keyCell.Value = key;
            valCell.Value = value;
            row.Cells.Add(keyCell);
            row.Cells.Add(valCell);
            this.treeView.Rows.Add(row);
            if (dictionary.ContainsKey((string)keyCell.Value)) // the key has already been added
            {
                string optionKey = (string)keyCell.Value;
                object keyValueObject = dictionary[(string)keyCell.Value];
                //values.Add((string)valCell.Value);
                dictionary.Remove((string)keyCell.Value);
            }
            dictionary.Add((string)keyCell.Value, null); // I know this isn't right, because then this just overrides the key each time. Not sure how to fix
        }
        else
        {
            keyCell.Value = key;
            valCell.Value = String.Empty;
            row.Cells.Add(keyCell);
            row.Cells.Add(valCell);
            this.datagridview.Rows.Add(row);
            if (dictionary.ContainsKey((string)keyCell.Value))
            {
                values.Add((string)valCell.Value); // values is not declared in this snippet
                dictionary.Remove((string)keyCell.Value);
            }
            dictionary.Add((string)keyCell.Value, values);
        }
    }
}


Solution 1:[1]

Using the Json.NET LINQ-to-JSON API (JObjects), you can flatten your JSON to a Dictionary<string, List<string>> like this:

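// Requires: using System.Collections.Generic; using System.Linq;
//           using System.Text.RegularExpressions; using Newtonsoft.Json.Linq;
JObject jo = JObject.Parse(json);  // json is a string holding the JSON text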

Dictionary<string, List<string>> dict = 
    jo.Descendants()
      .OfType<JProperty>()
      .Where(jp => jp.Value is JValue || 
                   (jp.Value is JArray && jp.Value.Children().All(jt => jt is JValue)))
      .Select(jp => new 
      {
          Path = Regex.Replace(jp.Path, @"\[\d+\]", "").Replace(".", " / "),
          Value = jp.Value is JValue 
              ? (string)jp.Value 
              : string.Join(", ", jp.Value.Children().Select(jt => (string)jt))
      })
      .GroupBy(a => a.Path)
      .ToDictionary(g => g.Key, g => g.Select(a => a.Value).ToList());

This code is a bit dense, so here's a step-by-step breakdown of how it works:

  1. Parse the JSON into a JObject so we can query against it:

     JObject jo = JObject.Parse(json);
    
  2. Create a Dictionary<string, List<string>> from the JObject as follows:

     Dictionary<string, List<string>> dict =
    
  3. Get all descendants of the JObject in depth-first order, which will include JObjects, JArrays, JProperties (key-value pairs within JObjects) and JValues (primitives):

      jo.Descendants()
    
  4. Filter the descendants down to just the properties (the key-value pairs):

        .OfType<JProperty>()
    
  5. We are only interested in properties whose value is either a primitive (e.g. string, int) or an array containing only primitives. So next we filter the properties down to only those meeting these criteria:

        .Where(jp => jp.Value is JValue || 
                     (jp.Value is JArray && 
                      jp.Value.Children().All(jt => jt is JValue)))
    
  6. Now that we have the properties we want, we need to extract the path and value(s) from each one per the requirements. We will select these into a temporary anonymous object to make further processing easier:

        .Select(jp => new 
        {
    

    To do the paths with slashes we can make use of the handy Path property on the JProperty. This will return a dot-separated path, such as This.File.Line[0].Nested, which is pretty close to what you are looking for. However, we need to get rid of the array subscripts (e.g. [0]) and replace the dots with slashes:

            Path = Regex.Replace(jp.Path, @"\[\d+\]", "").Replace(".", " / "),
    

    For the value part, what we do depends on whether the value is a primitive or an array of primitives. If it's just a simple primitive, we just get its value as a string; if it's an array, we join all of its children (as strings) together into a comma-delimited list:

            Value = jp.Value is JValue 
                ? (string)jp.Value 
                : string.Join(", ", jp.Value.Children().Select(jt => (string)jt))
        })
    
  7. Since we eliminated the array subscripts there is the possibility of duplicate keys now. We want to be able to put everything into a dictionary, so we need to group together the values having the same paths:

         .GroupBy(a => a.Path)
    
  8. Once we have the groups we can convert everything to a dictionary by putting the values into a list for each key (path):

         .ToDictionary(g => g.Key, g => g.Select(a => a.Value).ToList())
    

Working demo here: https://dotnetfiddle.net/2VeAhY
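
To see the grouping step handle the duplicate-key case from the question, here is a minimal sketch that wraps the query above in a method and runs it on a trimmed version of the second sample. The names FlattenDemo and Flatten are hypothetical, and the sketch assumes the Newtonsoft.Json NuGet package is referenced; it is one way to package the query, not the only one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Newtonsoft.Json.Linq;

public static class FlattenDemo
{
    // Hypothetical helper name; the body is the query from this answer.
    public static Dictionary<string, List<string>> Flatten(string json)
    {
        return JObject.Parse(json)
            .Descendants()
            .OfType<JProperty>()
            .Where(jp => jp.Value is JValue ||
                         (jp.Value is JArray && jp.Value.Children().All(jt => jt is JValue)))
            .Select(jp => new
            {
                Path = Regex.Replace(jp.Path, @"\[\d+\]", "").Replace(".", " / "),
                Value = jp.Value is JValue
                    ? (string)jp.Value
                    : string.Join(", ", jp.Value.Children().Select(jt => (string)jt))
            })
            .GroupBy(a => a.Path)
            .ToDictionary(g => g.Key, g => g.Select(a => a.Value).ToList());
    }

    public static void Main()
    {
        // Trimmed version of the second sample from the question.
        string json = @"{
          ""Level1"": [
            { ""Level2"": { ""A"": ""something1"", ""B"": 1 }, ""Level41"": null },
            { ""Level2"": { ""A"": ""something1"", ""B"": 2 }, ""Level41"": null }
          ]
        }";

        foreach (var kvp in Flatten(json))
        {
            // JSON nulls come back as null strings; show them as "(null)".
            Console.WriteLine(kvp.Key + " : " +
                string.Join(" | ", kvp.Value.Select(v => v ?? "(null)")));
        }
    }
}
```

Both array elements contribute to the same key, so "Level1 / Level2 / B" ends up holding the two values "1" and "2" instead of triggering a duplicate-key exception. Note that Dictionary does not guarantee enumeration order, so the printed lines may not follow document order.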

Solution 2:[2]

JToken has a useful Path property. With its help, this short code:

// Requires: using System; using System.IO; using System.Linq; using Newtonsoft.Json.Linq;
var text = File.ReadAllText("test.json");
var json = JObject.Parse(text);

var result = json.Descendants()
    .Where(t => !t.HasValues)
    .Select(t => t.Path.Replace(".", " / ") + " : " + t.ToString());

foreach (var item in result)
    Console.WriteLine(item);

gives the following result:

This / Is / A[0] : json
This / Is / A[1] : file
This / Is / Another / JSON : File
This / File / Is : 4
This / File / Line[0] / Nested : JSON
This / File / Line[1] / Nested2: : JSON2
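
If you would rather build the path yourself while recursing, the way the JavaScript traverse function in the question does, the following is a rough sketch of that approach rather than part of the answer above. It passes the path built so far into each recursive call and collects values in a Dictionary<string, List<string>>, so repeated keys accumulate instead of colliding. The names RecursiveFlattener and Traverse are hypothetical, the sample JSON is a shortened variant of the one in the question, and the Newtonsoft.Json package is assumed to be referenced.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

public static class RecursiveFlattener
{
    // Walks the token tree, passing the path built so far down each call.
    public static void Traverse(JToken token, string path,
                                IDictionary<string, List<string>> result)
    {
        switch (token)
        {
            case JObject obj:
                foreach (JProperty prop in obj.Properties())
                {
                    // Append this key to the parent's path, separated by a slash.
                    string childPath = path.Length == 0 ? prop.Name : path + " / " + prop.Name;
                    Traverse(prop.Value, childPath, result);
                }
                break;

            case JArray array:
                // Array elements reuse the parent's path, so indexes never appear in keys.
                foreach (JToken item in array)
                    Traverse(item, path, result);
                break;

            case JValue value:
                // Primitive: add it to the list for its path, creating the list on first sight.
                if (!result.TryGetValue(path, out List<string> values))
                {
                    values = new List<string>();
                    result[path] = values;
                }
                values.Add((string)value);
                break;
        }
    }

    public static void Main()
    {
        var json = JObject.Parse(
            @"{ ""This"": { ""Is"": { ""A"": [ ""json"", ""file"" ] },
                            ""File"": { ""Is"": 4,
                                        ""Line"": [ { ""Nested"": ""JSON"" },
                                                    { ""Nested"": ""JSON2"" } ] } } }");

        var flat = new Dictionary<string, List<string>>();
        Traverse(json, "", flat);

        foreach (var pair in flat)
            Console.WriteLine(pair.Key + " : " + string.Join(", ", pair.Value));
    }
}
```

In this sketch every primitive becomes its own list entry, so an array of primitives and a key repeated across a list of objects are stored the same way; joining with ", " happens only when printing.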

To be continued...

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Alexander Petrov