Traverse a JSON file and flatten it along the way with slashes in C#
I'm new to JSON and I'm trying to flatten a large JSON file, with slashes separating each level. Each flattened value should be either a single int or string, or a list of strings (separated by commas). For example, if the JSON looked like this:
{
"This": {
"Is": {
"A": [
"json",
"file"
],
"Another": {
"JSON": "File"
}
},
"File": {
"Is": 4,
"Line": [
{
"Nested": "JSON"
},
{
"Nested2:": "JSON2"
}
]
}
}
}
It would flatten out to:
This / Is / A : json, file
This / Is / Another / JSON : File
This / File / Is : 4
This / File / Line / Nested : JSON
This / File / Line / Nested2 : JSON2
I saw a JavaScript solution on Stack Overflow that suggested something like this:
function traverse(jsonObj) {
if( jsonObj !== null && typeof jsonObj == "object" ) {
Object.entries(jsonObj).forEach(([key, value]) => {
// key is either an array index or object key
traverse(value);
});
}
else {
// jsonObj is a number or string
}
}
But I'm confused: how would I add slashes between the keys as I traverse?
Also note that I am looking for a C# solution, not JavaScript.
OK, so I took a crack at this using the Newtonsoft.Json.Linq namespace (Json.NET). My JSON file has values that are ints, strings, lists, objects, and lists of objects. I wanted to save all the flattened results to a dictionary (instead of writing them to the console), but I run into the problem that the key already exists in the dictionary. This happens when there is a list of objects: every object in the list follows the same pattern, so the flattened keys are identical while the values differ. Example:
{
"Level1": [
{
"Level2": {
"A": "something1",
"B": 1
},
"Level3": [
{
"3A": "content"
}
],
"Level41": null,
"Level42": null,
"Level43": null,
"Level44": "content",
"Level45": "content",
"Level46": "content"
},
{
"Level2": {
"A": "something1",
"B": 2
},
"Level3": [
{
"3A": "morecontent"
}
],
"Level41": null,
"Level42": null,
"Level43": null,
"Level44": "content",
"Level45": "morecontent",
"Level46": "content"
}
]
}
Is there a way to ensure that if the dictionary has already seen the key, it saves all values of that key in a list? Or is that not my issue here? Here is what I did:
// The JSON file has been deserialized into a Dictionary with string key and object value
// We will be saving all the key value pairs we have seen into a Dictionary called dictionary
Dictionary<string, object> dictionary = new Dictionary<string,object>();
// I was using a DataGridView to input the values in a cell but we can ignore that
private void AddValue(string key, object value)
{
if (value != null && value.GetType() == typeof(JObject)) // if key == string, value == object
{
foreach (var a in (JObject)value)
{
string name = a.Key;
JToken obj = a.Value;
AddValue(name + a + " / ", obj);
}
}
else if (value != null && value.GetType() == typeof(JArray)) // if key == string, value == array
{
foreach (var a in (JArray)value)
{
if (a.GetType() == typeof(JObject)) // if key == string, value == object
{
foreach (var item in (JObject)a)
{
string objectName = item.Key;
JToken objValue = item.Value;
AddValue(objectName + item + " / ", objValue);
}
}
else //key == string, value == string
{
string name = (string)a;
AddValue(name + " / ", a);
}
}
}
else // key == string, value == string
{
var keyCell = new DataGridViewTextBoxCell();
var row = new DataGridViewRow();
var valCell = new DataGridViewTextBoxCell();
if (value != null)
{
keyCell.Value = key;
valCell.Value = value;
row.Cells.Add(keyCell);
row.Cells.Add(valCell);
this.treeView.Rows.Add(row);
if (dictionary.ContainsKey((string)keyCell.Value)) // this is if
{
string optionKey = (string)keyCell.Value;
object keyValueObject = dictionary[(string)keyCell.Value];
//values.Add((string)valCell.Value);
dictionary.Remove((string)keyCell.Value);
}
dictionary.Add((string)keyCell.Value, null); // I know this isn't right, because then this just overrides the key each time. Not sure how to fix
}
else
{
keyCell.Value = key;
valCell.Value = String.Empty;
row.Cells.Add(keyCell);
row.Cells.Add(valCell);
this.datagridview.Rows.Add(row);
if (dictionary.ContainsKey((string)keyCell.Value))
{
values.Add((string)valCell.Value);
dictionary.Remove((string)keyCell.Value);
}
dictionary.Add((string)keyCell.Value, values);
}
}
}
Solution 1:[1]
Using the Json.NET LINQ-to-JSON API (JObjects), you can flatten your JSON to a Dictionary<string, List<string>> like this:
JObject jo = JObject.Parse(json);
Dictionary<string, List<string>> dict =
jo.Descendants()
.OfType<JProperty>()
.Where(jp => jp.Value is JValue ||
(jp.Value is JArray && jp.Value.Children().All(jt => jt is JValue)))
.Select(jp => new
{
Path = Regex.Replace(jp.Path, @"\[\d+\]", "").Replace(".", " / "),
Value = jp.Value is JValue
? (string)jp.Value
: string.Join(", ", jp.Value.Children().Select(jt => (string)jt))
})
.GroupBy(a => a.Path)
.ToDictionary(g => g.Key, g => g.Select(a => a.Value).ToList());
This code is a bit dense, so here's a step-by-step breakdown of how it works:
Parse the JSON into a JObject so we can query against it:
JObject jo = JObject.Parse(json);
Create a Dictionary<string, List<string>> from the JObject as follows:
Dictionary<string, List<string>> dict =
Get all descendants of the JObject in depth-first order, which will include JObjects, JArrays, JProperties (key-value pairs within JObjects) and JValues (primitives):
jo.Descendants()
Filter the descendants down to just the properties (the key-value pairs):
.OfType<JProperty>()
We are only interested in properties having a value that is either a primitive (e.g. string, int) or an array containing only primitives. So next we filter the properties to only those meeting that criteria:
.Where(jp => jp.Value is JValue || (jp.Value is JArray && jp.Value.Children().All(jt => jt is JValue)))
Now that we have the properties we want, we need to extract the path and value(s) from each one per the requirements. We will select these into a temporary anonymous object to make it easier to do further processing on them later.
.Select(jp => new {
To do the paths with slashes we can make use of the handy Path property on the JProperty. This will return a dot-separated path, such as This.File.Line[0].Nested, which is pretty close to what you are looking for. However, we need to get rid of the array subscripts (e.g. [0]) and replace the dots with slashes:
Path = Regex.Replace(jp.Path, @"\[\d+\]", "").Replace(".", " / "),
For the value part, what we do depends on whether the value is a primitive or an array of primitives. If it's just a simple primitive, we just get its value as a string; if it's an array, we join all of its children (as strings) together into a comma-delimited list:
Value = jp.Value is JValue ? (string)jp.Value : string.Join(", ", jp.Value.Children().Select(jt => (string)jt)) })
Since we eliminated the array subscripts there is the possibility of duplicate keys now. We want to be able to put everything into a dictionary, so we need to group together the values having the same paths:
.GroupBy(a => a.Path)
Once we have the groups we can convert everything to a dictionary by putting the values into a list for each key (path):
.ToDictionary(g => g.Key, g => g.Select(a => a.Value).ToList());
Working demo here: https://dotnetfiddle.net/2VeAhY
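For readers who prefer explicit recursion (closer to the AddValue attempt in the question), the same idea can be sketched by passing the accumulated path down as a parameter. This is only an illustration, not the answerer's code: JsonFlattener, Flatten, Visit and Add are hypothetical names, and it assumes the document contains only objects, arrays and primitive values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json.Linq;

public static class JsonFlattener
{
    // Hypothetical helper (not part of Json.NET): walks the token tree and
    // collects every leaf value under its slash-separated key path.
    public static Dictionary<string, List<string>> Flatten(JToken root)
    {
        var result = new Dictionary<string, List<string>>();
        Visit(root, "", result);
        return result;
    }

    private static void Visit(JToken token, string path, Dictionary<string, List<string>> result)
    {
        switch (token)
        {
            case JObject obj:
                foreach (JProperty prop in obj.Properties())
                {
                    // Append the key to the path built so far.
                    string childPath = path.Length == 0 ? prop.Name : path + " / " + prop.Name;
                    Visit(prop.Value, childPath, result);
                }
                break;

            case JArray arr when arr.All(item => item is JValue):
                // Array of primitives: join into one comma-separated value.
                Add(result, path, string.Join(", ", arr.Select(item => (string)item)));
                break;

            case JArray arr:
                // Array of objects: elements share the parent's path (no index),
                // so repeated keys end up in the same list.
                foreach (JToken item in arr)
                    Visit(item, path, result);
                break;

            default:
                // Assumed to be a primitive (JValue); JSON null becomes a null string.
                Add(result, path, (string)token);
                break;
        }
    }

    private static void Add(Dictionary<string, List<string>> result, string path, string value)
    {
        // Reuse the existing list when the key was already seen.
        if (!result.TryGetValue(path, out List<string> values))
        {
            values = new List<string>();
            result[path] = values;
        }
        values.Add(value);
    }
}
```

Calling JsonFlattener.Flatten(JObject.Parse(json)) on the second example from the question would put both "B" values under the single key Level1 / Level2 / B, which avoids the duplicate-key exception without removing and re-adding entries.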
Solution 2:[2]
JToken has a useful Path property. With its help this short code:
var text = File.ReadAllText("test.json");
var json = JObject.Parse(text);
var result = json.Descendants()
.Where(t => !t.HasValues)
.Select(t => t.Path.Replace(".", " / ") + " : " + t.ToString());
foreach (var item in result)
Console.WriteLine(item);
gives the following result:
This / Is / A[0] : json
This / Is / A[1] : file
This / Is / Another / JSON : File
This / File / Is : 4
This / File / Line[0] / Nested : JSON
This / File / Line[1] / Nested2: : JSON2
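The output above still carries array subscripts and prints one line per array element. One possible way to get closer to the format in the question is to strip the subscripts from each path and group the leaf values that share a path. The sketch below is an assumption about how that could be done, not part of the original answer; PathGrouping and FlattenLines are hypothetical names, and the JSON is inlined rather than read from test.json.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Newtonsoft.Json.Linq;

public static class PathGrouping
{
    // Hypothetical helper: one "path : value, value" line per distinct path.
    public static List<string> FlattenLines(JObject json)
    {
        return json.Descendants()
            .Where(t => !t.HasValues)                       // leaf tokens only
            .ToLookup(
                // Drop subscripts such as [0], then turn dots into slashes.
                t => Regex.Replace(t.Path, @"\[\d+\]", "").Replace(".", " / "),
                t => t.ToString())
            .Select(g => g.Key + " : " + string.Join(", ", g))
            .ToList();
    }

    public static void Main()
    {
        var json = JObject.Parse(
            @"{ ""This"": { ""Is"": { ""A"": [ ""json"", ""file"" ] }, ""File"": { ""Is"": 4 } } }");

        foreach (var line in FlattenLines(json))
            Console.WriteLine(line);
        // Prints:
        // This / Is / A : json, file
        // This / File / Is : 4
    }
}
```

Note that this grouping also merges values coming from different objects in an array of objects, so it does not distinguish "one array of primitives" from "the same key repeated across several objects".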
To be continued...
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Alexander Petrov |
