'generate field-value frequency count with jq

I can query all the unique values from a JSON field like so:

$ cat all.json | jq '.complianceState' | sort | uniq

"compliant"
"configManager"
"inGracePeriod"
"noncompliant"
"unknown"

And I can pedantically query the frequency count of each of these unique field values like so:

$ cat all.json | jq '.complianceState' | grep '^"configManager"$' | wc -l

116

Is there a way within jq to do this all in one shot to produce output like this:

{
    "compliant" : 123000,
    "noncompliant" : 2000,
    "configManager" : 116
}


Solution 1:[1]

This is the solution that I uses, which a custom frequency function, that:

  • buckets/bins an array of JSON values/objects by a JQ expression (the bucket key)
  • provides the bucket count (frequency)
  • provides the percentage of items for each bucket (rounded to 2 decimal places)
  • provides the original items which were binned into the bucket, and
  • sorts the buckets by their count in descending order.
def freq(expr):
  length as $total_count
    | group_by(expr)
    | map({
        key: (.[0] | expr),
        count: length,
        percent: (((length / $total_count * 10000 + 0.5) | floor) / 100),
        items: .
      })
    | sort_by(-.count)
  ;

For instance, having defined the above in my $HOME/.jq, the query:

jq -n '
[                                                                                                                                                                                               
  {"complianceState": "a", "other": 0.5},
  {"complianceState": "b", "other": 1.2},
  {"complianceState": "a", "other": 1.7},
  {"complianceState": "c", "other": 5.3},
  {"complianceState": "b", "other": 1.5},
  {"complianceState": "e", "other": 0.6},
  {"complianceState": "c", "other": 3.4},
  {"complianceState": "c", "other": 5.9}
] | freq(.complianceState)'

Would produce

[
  {
    "key": "c",
    "count": 3,
    "percent": 37.5,
    "items": [
      {"complianceState": "c", "other": 5.3},
      {"complianceState": "c", "other": 3.4},
      {"complianceState": "c", "other": 5.9}
    ]
  },
  {
    "key": "a",
    "count": 2,
    "percent": 25,
    "items": [
      {"complianceState": "a", "other": 0.5},
      {"complianceState": "a", "other": 1.7}
    ]
  },
  {
    "key": "b",
    "count": 2,
    "percent": 25,
    "items": [
      {"complianceState": "b", "other": 1.2},
      {"complianceState": "b", "other": 1.5}
    ]
  },
  {
    "key": "e",
    "count": 1,
    "percent": 12.5,
    "items": [
      {"complianceState": "e", "other": 0.6}
    ]
  }
]

For your case, you would need to slurp up the input into a JSON array with -s. From there, you can transform the output into the desired format. E.g.

jq -s 'freq(.complianceState)
  | map({key, value: .count})
  | from_entries
' all.json

Note that with the freq function you can group by an arbitrary expression. For example freq((.other / 1.5) | floor) if you wish to get histogram-like binning.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1