'generate field-value frequency count with jq
I can query all the unique values from a JSON field like so:
$ cat all.json | jq '.complianceState' | sort | uniq
"compliant"
"configManager"
"inGracePeriod"
"noncompliant"
"unknown"
And I can pedantically query the frequency count of each of these unique field values like so:
$ cat all.json | jq '.complianceState' | grep '^"configManager"$' | wc -l
116
Is there a way within jq to do this all in one shot to produce output like this:
{
"compliant" : 123000,
"noncompliant" : 2000,
"configManager" : 116
}
Solution 1:[1]
This is the solution that I uses, which a custom frequency function, that:
- buckets/bins an array of JSON values/objects by a JQ expression (the bucket
key) - provides the bucket
count(frequency) - provides the
percentageof items for each bucket (rounded to 2 decimal places) - provides the original
itemswhich were binned into the bucket, and - sorts the buckets by their
countin descending order.
def freq(expr):
length as $total_count
| group_by(expr)
| map({
key: (.[0] | expr),
count: length,
percent: (((length / $total_count * 10000 + 0.5) | floor) / 100),
items: .
})
| sort_by(-.count)
;
For instance, having defined the above in my $HOME/.jq, the query:
jq -n '
[
{"complianceState": "a", "other": 0.5},
{"complianceState": "b", "other": 1.2},
{"complianceState": "a", "other": 1.7},
{"complianceState": "c", "other": 5.3},
{"complianceState": "b", "other": 1.5},
{"complianceState": "e", "other": 0.6},
{"complianceState": "c", "other": 3.4},
{"complianceState": "c", "other": 5.9}
] | freq(.complianceState)'
Would produce
[
{
"key": "c",
"count": 3,
"percent": 37.5,
"items": [
{"complianceState": "c", "other": 5.3},
{"complianceState": "c", "other": 3.4},
{"complianceState": "c", "other": 5.9}
]
},
{
"key": "a",
"count": 2,
"percent": 25,
"items": [
{"complianceState": "a", "other": 0.5},
{"complianceState": "a", "other": 1.7}
]
},
{
"key": "b",
"count": 2,
"percent": 25,
"items": [
{"complianceState": "b", "other": 1.2},
{"complianceState": "b", "other": 1.5}
]
},
{
"key": "e",
"count": 1,
"percent": 12.5,
"items": [
{"complianceState": "e", "other": 0.6}
]
}
]
For your case, you would need to slurp up the input into a JSON array with -s. From there, you can transform the output into the desired format. E.g.
jq -s 'freq(.complianceState)
| map({key, value: .count})
| from_entries
' all.json
Note that with the freq function you can group by an arbitrary expression. For example freq((.other / 1.5) | floor) if you wish to get histogram-like binning.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
