Filtering by comparing two streams one-on-one in jq

I have a stream of objects

{
    "key": "a",
    "value": 1
}
{
    "key": "b",
    "value": 1
}
{
    "key": "c",
    "value": 1
}
{
    "key": "d",
    "value": 1
}
{
    "key": "e",
    "value": 1
}

and a stream of booleans

(true,true,false,false,true)

I want to pair the two streams element by element and print an object only if its corresponding boolean is true.

So the desired output is:

{
    "key": "a",
    "value": 1
}
{
    "key": "b",
    "value": 1
}
{
    "key": "e",
    "value": 1
}
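
To try the commands in the answers locally, the sample objects can first be written to a file. This is a minimal sketch: objects.json is the file name the answers assume, while writing one compact object per line is my own choice (jq reads the pretty-printed form just as well).

```shell
# Write the five sample objects, one compact object per line,
# to objects.json (the file name used in the answers).
printf '%s\n' \
  '{"key":"a","value":1}' \
  '{"key":"b","value":1}' \
  '{"key":"c","value":1}' \
  '{"key":"d","value":1}' \
  '{"key":"e","value":1}' > objects.json

# Sanity check: jq should see five separate objects in the stream.
jq -s 'length' objects.json
```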

I tried the following (https://jqplay.org/s/GGTHEfQ9s3):

filter:
. as $input | foreach (true,true,false,false,true) as $dict ($input; select($dict))

input:
{
    "key": "a",
    "value": 1
}
{
    "key": "b",
    "value": 1
}
{
    "key": "c",
    "value": 1
}
{
    "key": "d",
    "value": 1
}
{
    "key": "e",
    "value": 1
}

But I get this output:

{"key":"a","value":1}
{"key":"a","value":1}
null
{"key":"b","value":1}
{"key":"b","value":1}
null
{"key":"c","value":1}
{"key":"c","value":1}
null
{"key":"d","value":1}
{"key":"d","value":1}
null
{"key":"e","value":1}
{"key":"e","value":1}
null

Any help would be appreciated.



Solution 1:[1]

One way would be to slurp the objects into an array with -s, wrap the booleans in a second array, use transpose to pair the items at the same position, and then select on the boolean and output the object:

jq -s '[.,[(true,true,false,false,true)]] | transpose[] | select(.[1])[0]' objects.json
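
To see what transpose contributes, the intermediate pairs can be printed on their own. In this sketch three shortened objects are given inline with -n instead of being slurped from objects.json, only to keep it brief:

```shell
# transpose turns [[objects...], [booleans...]] into [[object, boolean], ...],
# pairing the items that share a position.
jq -nc '[[{"key":"a"},{"key":"b"},{"key":"c"}], [true,false,true]] | transpose[]'
# [{"key":"a"},true]
# [{"key":"b"},false]
# [{"key":"c"},true]

# select(.[1]) keeps the pairs whose boolean is true; [0] then extracts the object.
jq -nc '[[{"key":"a"},{"key":"b"},{"key":"c"}], [true,false,true]] | transpose[] | select(.[1])[0]'
# {"key":"a"}
# {"key":"c"}
```

If the two streams differ in length, transpose pads the shorter array with null: objects without a matching boolean are then dropped by select(.[1]), while a surplus true would emit null.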

Another approach would be to slurp the objects into an array, turn the booleans into the list of indices at which they are true, and use those indices to pick elements from the objects array:

jq -s '.[[(true,true,false,false,true)] | indices(true)[]]' objects.json
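
The two building blocks of this filter can be checked separately. This sketch uses placeholder strings instead of the objects, purely for brevity:

```shell
# indices(true) on an array lists every position that holds true.
jq -nc '[true,true,false,false,true] | indices(true)'
# [0,1,4]

# A generator inside .[...] yields one element per index it produces.
jq -nc '["a","b","c","d","e"] | .[[true,true,false,false,true] | indices(true)[]]'
# "a"
# "b"
# "e"
```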

The same approach, but using nth to reference into the inputs stream, requires more care: because each call to nth keeps consuming inputs where the previous one stopped, nth must be given relative distances (how many inputs to skip from the current position), not absolute positions. This conversion can be done by repeatedly finding the position of the next true value with index inside a while loop:

jq -n 'nth([true,true,false,false,true] | while(. != []; .[index(true) + 1:]) | index(true) | values; inputs)' objects.json
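
The inner part that converts absolute positions into relative distances can be run on its own to see the numbers that end up being handed to nth:

```shell
# while emits the array, then the remainder after each next true, until it is empty;
# index(true) then gives the distance to the next true within each remainder.
jq -nc '[true,true,false,false,true] | while(. != []; .[index(true) + 1:]) | index(true) | values'
# 0
# 0
# 2
```

Read as skip counts: take the next input (a), take the next one (b), then skip two (c, d) and take e. The values filter discards the null that index(true) returns for a remainder with no true left in it, such as a trailing run of false values.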

One could also use reduce to iterate directly over the boolean values, reading one input per boolean and keeping it only if the boolean is true:

jq -n 'reduce (true,true,false,false,true) as $dict ([]; . + [input | select($dict)]) | .[]' objects.json
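
The key property of the reduce version is that input is called once per boolean, whether or not the value is kept. A small sketch with numbers standing in for the objects makes this visible:

```shell
# The false entry still consumes 20; [input | select(false)] is simply [].
printf '10 20 30' | jq -nc 'reduce (true,false,true) as $keep ([]; . + [input | select($keep)])'
# [10,30]
```

Because reduce only produces its result after all booleans have been processed, the selected objects are held in memory until the end, whereas the foreach variant emits them as it goes.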

A solution using foreach, as you intended, would also need the -n option; otherwise jq binds the first object to . and input starts reading from the second one, so the first item is missed:

jq -n 'foreach (true,true,false,false,true) as $dict (null; input | select($dict))' objects.json
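
The effect of -n can be seen in isolation. In this sketch the numbers 1 and 2 stand in for the first two objects:

```shell
# Without -n, jq binds the first value (1) to ., so input returns the second one.
printf '1 2' | jq -c 'input'
# 2

# With -n, . is null and input starts with the first value.
printf '1 2' | jq -nc 'input'
# 1
```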

Solution 2:[2]

Unfortunately, each invocation of jq can currently handle at most one external JSON stream. This is not usually an issue unless both streams are very large, so in this answer I'll focus on a solution that scales. In fact, the amount of memory required is minuscule no matter how large the streams may be.

For simplicity, let's assume that:

  • demon.json is a file consisting of a stream of JSON boolean values, one per line (i.e., not comma-separated);
  • objects.json is your stream of JSON objects;
  • the streams have the same length;
  • we are working in a bash or bash-like environment.

Then we could go with:

paste -d '\t' demon.json <(jq -c . objects.json) | jq -n '
  foreach inputs as $boolean (null; input; select($boolean))'
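
A minimal end-to-end run, assuming a plain POSIX shell: since <(...) is a bash feature, this sketch writes the compacted objects to a temporary file instead. The scratch directory and the objects.jsonl name are hypothetical choices for illustration; demon.json and objects.json follow the answer, and -c is added only to make the output compact.

```shell
# Scratch directory so that no existing files are overwritten.
dir=$(mktemp -d)

# demon.json: one boolean per line, since paste pairs the files line by line.
printf '%s\n' true true false false true > "$dir/demon.json"

# objects.json: the five sample objects, pretty-printed (several lines each).
jq -n '("a","b","c","d","e") | {key: ., value: 1}' > "$dir/objects.json"

# Compact each object onto a single line (stands in for <(jq -c . objects.json)).
jq -c . "$dir/objects.json" > "$dir/objects.jsonl"

# paste yields lines "boolean<TAB>object"; in jq, inputs supplies each boolean
# and input consumes the object that follows it on the same line.
paste -d '\t' "$dir/demon.json" "$dir/objects.jsonl" | jq -nc '
  foreach inputs as $boolean (null; input; select($boolean))'
```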

So apart from the startup costs of paste and jq, we basically only need enough memory to hold one of the objects in objects.json at a time. This solution is also very fast.

Of course, if objects.json were already in JSONL (JSON-lines) format, then the first call to jq above would not be necessary.
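
With JSON-lines input, paste can be applied to the file directly. To see why a single foreach can pair the two files, it helps to look at the stream jq actually receives; this sketch uses two shortened objects in a hypothetical scratch directory:

```shell
dir=$(mktemp -d)
printf '%s\n' true false > "$dir/demon.json"
# Already JSON lines: one compact object per line.
printf '%s\n' '{"key":"a"}' '{"key":"b"}' > "$dir/objects.jsonl"

# Each pasted line holds two JSON values separated by a tab, so jq sees a single
# stream alternating boolean, object, boolean, object.
paste -d '\t' "$dir/demon.json" "$dir/objects.jsonl" | jq -c .
# true
# {"key":"a"}
# false
# {"key":"b"}
```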

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2