Use jq to parse key path for each leaf

(I'm not sure which technical terms to use, but I can update the question if someone can supply the terminology for what I'm trying to do. That might help someone find this answer in the future.)

Given the input JSON, how would I use jq to produce the expected output?

Input:

{
    "items": {
        "item1": {
            "part1": {
                "a": {
                    "key1": "value",
                    "key2": "value"
                },
                "b": {
                    "key1": "value",
                    "key2": "value"
                }
            },
            "part2": {
                "c": {
                    "key1": "value",
                    "key2": "value"
                },
                "d": {
                    "key1": "value",
                    "key2": "value"
                }
            }
        },
        "item2": {
            "part3": {
                "e": {
                    "key1": "value",
                    "key2": "value"
                },
                "f": {
                    "key1": "value",
                    "key2": "value"
                }
            },
            "part4": {
                "g": {
                    "key1": "value",
                    "key2": "value"
                },
                "h": {
                    "key1": "value",
                    "key2": "value"
                }
            }
        }
    }
}

Expected output:

{
    "item1": [
        "part1.a",
        "part1.b",
        "part2.c",
        "part2.d"
    ],
    "item2": [
        "part3.e",
        "part3.f",
        "part4.g",
        "part4.h"
    ]
}


Solution 1:[1]

Try this:

.items | map_values([path(.[][]) | join(".")])

Each output path will contain as many path components as the number of []s in the .[][] part; in other words, if you change .[][] to .[][][], for example, you'll see part1.a.key1, part1.a.key2, etc.
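
For readers who find the depth rule easier to follow outside jq, here is a minimal Python sketch of what the filter computes. It is not part of the original answer: the names paths_at_depth and summarize are hypothetical, and the sketch assumes every level down to the requested depth is a JSON object (jq's .[] also iterates arrays and raises an error on scalars, which this sketch does not reproduce).

```python
import json

def paths_at_depth(node, depth, prefix=()):
    """Yield key paths exactly `depth` levels below `node` (objects only)."""
    if depth == 0:
        yield prefix
        return
    if isinstance(node, dict):
        for key, child in node.items():
            yield from paths_at_depth(child, depth - 1, prefix + (key,))

def summarize(doc, depth=2):
    """Map each entry under "items" to its dotted paths of the given depth."""
    return {
        name: [".".join(path) for path in paths_at_depth(sub, depth)]
        for name, sub in doc["items"].items()
    }

# Trimmed-down version of the question's input (an assumption for brevity).
sample = json.loads("""
{"items": {"item1": {"part1": {"a": {"key1": "value"}, "b": {"key1": "value"}}}}}
""")

print(summarize(sample))           # depth 2 plays the role of .[][]
print(summarize(sample, depth=3))  # depth 3 plays the role of .[][][]
```

The depth argument corresponds to the number of []s in the jq path expression, so raising it from 2 to 3 turns part1.a into part1.a.key1.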

Solution 2:[2]

This would do it:

# Output: a stream
def keyKey:
  keys_unsorted[] as $k | $k + "." + (.[$k] | keys_unsorted[]);

.items | map_values([keyKey])

Solution 3:[3]

Some aspects are underspecified. For instance, you don't specify how deep the aggregation should go for the array items. Is it always two levels deep, or is it the whole tree except the last level?

Here's one way to go two levels deep, with the keys sorted alphabetically:

jq '.items | .[] |= [keys[] as $k | $k + "." + (.[$k] | keys[])]'

Here's another way, which descends to the second-to-last level:

jq '.items | .[] |= ([path(.. | scalars)[:-1] | join(".")] | unique)'

Output:

{
  "item1": [
    "part1.a",
    "part1.b",
    "part2.c",
    "part2.d"
  ],
  "item2": [
    "part3.e",
    "part3.f",
    "part4.g",
    "part4.h"
  ]
}
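
The second-to-last-level variant behaves differently from a fixed depth of two once the tree is uneven. The following Python sketch mirrors that logic under stated assumptions: leaf_paths and parent_paths are hypothetical names, not jq builtins, and the sample input is invented to show a branch that is nested one level deeper than the others.

```python
def leaf_paths(node, prefix=()):
    """Yield the key path to every scalar value, like path(.. | scalars)."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from leaf_paths(child, prefix + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from leaf_paths(child, prefix + (str(index),))
    else:
        yield prefix

def parent_paths(doc):
    """For each entry under "items", collect the dotted parent path of every leaf."""
    result = {}
    for name, sub in doc["items"].items():
        # Drop the last component (the leaf key), then deduplicate and sort,
        # which is what [:-1] followed by unique does in the jq filter.
        parents = {".".join(path[:-1]) for path in leaf_paths(sub)}
        result[name] = sorted(parents)
    return result

# Invented input: part2.c is nested one level deeper than part1.a.
uneven = {
    "items": {
        "item1": {
            "part1": {"a": {"key1": "value", "key2": "value"}},
            "part2": {"c": {"deep": {"key1": "value"}}},
        }
    }
}

print(parent_paths(uneven))
```

On this input the deeper branch is reported as part2.c.deep rather than part2.c, which is the practical difference between the two jq filters.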

Solution 4:[4]

The unique sequence of jq paths to every key, down to each leaf, is returned by json2jqpath.jq:

json2jqpath.jq dat.json
.
.items
.items|.item1
.items|.item1|.part1
.items|.item1|.part1|.a
.items|.item1|.part1|.a|.key1
.items|.item1|.part1|.a|.key2
.items|.item1|.part1|.b
.items|.item1|.part1|.b|.key1
.items|.item1|.part1|.b|.key2
.items|.item1|.part2
.items|.item1|.part2|.c
.items|.item1|.part2|.c|.key1
.items|.item1|.part2|.c|.key2
.items|.item1|.part2|.d
.items|.item1|.part2|.d|.key1
.items|.item1|.part2|.d|.key2
.items|.item2
.items|.item2|.part3
.items|.item2|.part3|.e
.items|.item2|.part3|.e|.key1
.items|.item2|.part3|.e|.key2
.items|.item2|.part3|.f
.items|.item2|.part3|.f|.key1
.items|.item2|.part3|.f|.key2
.items|.item2|.part4
.items|.item2|.part4|.g
.items|.item2|.part4|.g|.key1
.items|.item2|.part4|.g|.key2
.items|.item2|.part4|.h
.items|.item2|.part4|.h|.key1
.items|.item2|.part4|.h|.key2

It is not the output you asked for, but as another answer noted, your question may be somewhat underspecified. Starting from a preprocessed structure such as this has the advantage of reducing any JSON file to its set of paths, which you can then start fiddling with.
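
As an illustration of that kind of fiddling, here is a hedged Python sketch that turns path lines in the format shown above into the grouping the question asks for. The function name group_paths and its depth parameter are hypothetical, and the sketch assumes keys contain no "|" and do not start with ".".

```python
def group_paths(lines, depth=4):
    """Group path lines of exactly `depth` steps under their second step."""
    grouped = {}
    for line in lines:
        # ".items|.item1|.part1|.a" -> ["items", "item1", "part1", "a"]
        steps = [step.lstrip(".") for step in line.split("|")]
        if len(steps) == depth and steps[0] == "items":
            grouped.setdefault(steps[1], []).append(".".join(steps[2:]))
    return grouped

# A few lines copied from the listing above.
lines = [
    ".",
    ".items",
    ".items|.item1",
    ".items|.item1|.part1",
    ".items|.item1|.part1|.a",
    ".items|.item1|.part1|.a|.key1",
    ".items|.item1|.part1|.b",
    ".items|.item2|.part3|.e",
]

print(group_paths(lines))
```

Only lines with exactly four steps are kept, so both the shallower container paths and the deeper key1/key2 leaf paths are filtered out.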

json2jqpath

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 oguz ismail
Solution 2 peak
Solution 3
Solution 4 tomc