How to combine two channels by tuple elements in different positions?

I have to combine the output of two different channels like these:

first_output = Channel.from(['H100_BDNA', 'sed'], ['H100_nova', 'rip'], ['H100_hiseq', 'bam2'])
second_output = Channel.from(['pAdna', 'H100_hiseq', '11'], ['pAsc', 'H100_BDNA', '45'], ['iMes', 'H100_BDNA', '58'], ['pAsc1', 'H100_nova', '23'])

The desired result is:

['pAdna', 'H100_hiseq', '11', 'bam2'], 
['pAsc', 'H100_BDNA', '45', 'sed'], 
['iMes', 'H100_BDNA', '58', 'sed'], 
['pAsc1', 'H100_nova', '23', 'rip']

That is, I want to join the channels on a common key: the first element of each tuple in first_output and the second element of each tuple in second_output. I have tried many operators, but none of them works. How can I do it?



Solution 1:[1]

I think what you want is to decorate-combine-undecorate:

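second_output
    .map { tuple( it[1], *it ) }      // decorate: prefix each tuple with its key (element 1)
    .combine( first_output, by: 0 )   // combine only items whose element 0 (the key) matches
    .map { it[1..-1] }                // undecorate: drop the key prefix
    .view()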

Results:

[pAdna, H100_hiseq, 11, bam2]
[pAsc, H100_BDNA, 45, sed]
[iMes, H100_BDNA, 58, sed]
[pAsc1, H100_nova, 23, rip]

This works by prefixing each item in the second channel with the key that is then used to combine it with the items in the first channel. Note that we use the second form of the combine operator (with the by parameter), which combines only those items that share a common key. Finally, we remove the shared key by selecting all elements except the first. See also: Schwartzian transform.
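To see what happens to each tuple, the three steps can be modeled in plain Groovy on in-memory lists. This is a sketch, not Nextflow code: the lists firstOutput and secondOutput stand in for the channels, and combineByKey is a hypothetical helper that mimics combine(..., by: 0) by emitting [key, rest of left..., rest of right...] for every pair of items with equal keys. Real channels are asynchronous, so the emission order in Nextflow may differ from the order here.

```groovy
// Plain-Groovy model of decorate-combine-undecorate (not the Nextflow channel API).
// The lists below stand in for first_output and second_output.
def firstOutput = [['H100_BDNA', 'sed'], ['H100_nova', 'rip'], ['H100_hiseq', 'bam2']]
def secondOutput = [['pAdna', 'H100_hiseq', '11'], ['pAsc', 'H100_BDNA', '45'],
                    ['iMes', 'H100_BDNA', '58'], ['pAsc1', 'H100_nova', '23']]

// Hypothetical helper mimicking combine(right, by: 0): for every left/right pair
// with the same element 0, emit [key, rest of left..., rest of right...].
def combineByKey(List left, List right) {
    left.collectMany { l ->
        right.findAll { r -> r[0] == l[0] }
             .collect { r -> [l[0], *l[1..-1], *r[1..-1]] }
    }
}

// 1. Decorate: prefix each tuple with its join key (element 1).
def decorated = secondOutput.collect { [it[1], *it] }

// 2. Combine on the key, which is now at index 0 in both lists.
def combined = combineByKey(decorated, firstOutput)

// 3. Undecorate: drop the key prefix.
def result = combined.collect { it[1..-1] }

result.each { println it }
```

Run with `groovy`, it prints the same four tuples as the Results listed above.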

Solution 2:[2]

As far as I know, in Nextflow you can't specify different join-key positions for the two channels (the by option applies to both channels). The way I usually deal with this is to first rearrange the tuples with map and swap, so that the channels can be joined.

For your example (join first_output on key index 0 with second_output on key index 1, then reorder to get the desired output order), the approach would look like this:

second_output
    .map{it.swap(1,0)} // swap item 0 and 1
    .join(first_output) // now join on item 0
    .map{it.swap(1,0)} // swap back
    .set { joined_output }

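Edit: I just realized that you have 3 tuples in the first output and 4 in the second, with H100_BDNA appearing twice in second_output. join matches each item at most once, so only one of those two tuples is paired with ['H100_BDNA', 'sed'] and the other is discarded. This behaviour of Nextflow's join is indeed unintuitive, as stated in this discussion.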
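With the swap-and-join approach, one of the two H100_BDNA tuples in second_output is lost. A plain-Groovy model reproduces this under an assumption: joinOnce is a hypothetical helper that approximates join's default behaviour by matching each item at most once and discarding unmatched items. In Nextflow, which H100_BDNA tuple survives depends on the order in which items arrive. The swaps are applied to copies, because Groovy's List.swap modifies the list in place.

```groovy
// Plain-Groovy model of swap -> join -> swap back (not the Nextflow channel API).
def firstOutput = [['H100_BDNA', 'sed'], ['H100_nova', 'rip'], ['H100_hiseq', 'bam2']]
def secondOutput = [['pAdna', 'H100_hiseq', '11'], ['pAsc', 'H100_BDNA', '45'],
                    ['iMes', 'H100_BDNA', '58'], ['pAsc1', 'H100_nova', '23']]

// Hypothetical helper approximating join's default behaviour: each right-hand
// item is matched at most once, and unmatched left-hand items are discarded.
def joinOnce(List left, List right) {
    def unused = new ArrayList(right)   // copy so matched items can be removed
    def out = []
    left.each { l ->
        def match = unused.find { r -> r[0] == l[0] }
        if (match != null) {
            unused.remove(match)
            out << [l[0], *l[1..-1], *match[1..-1]]
        }
    }
    out
}

// List.swap mutates in place, so swap a copy of each tuple.
def swapped = secondOutput.collect { new ArrayList(it).swap(1, 0) }
def result = joinOnce(swapped, firstOutput).collect { it.swap(1, 0) }

result.each { println it }  // only three tuples: the 'iMes' one is dropped
```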

The discussion also provides a workaround function, inner_join:

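// Emits one tuple per matching pair: [key, rest of ch_a item..., rest of ch_b item...]
// ch_b.cross(ch_a) pairs each ch_b item with every ch_a item sharing its first element.
def inner_join(ch_a, ch_b) {
    return ch_b.cross(ch_a).map { [it[0][0], *it[1][1..-1], *it[0][1..-1]] }
}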

Using this function, your solution would be (still swapping positions):

inner_join(second_output.map{ it.swap(1,0) }, first_output)
    .map{ it.swap(1,0) }
    .set { joined_output }
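A plain-Groovy model shows why inner_join keeps all four tuples. crossModel is a hypothetical stand-in for cross, assuming (as in the call above) that the keys of the calling channel, first_output, are unique: it pairs each of its items with every item of the other list that has the same first element and emits [item_b, item_a]. innerJoinModel then applies the same map as inner_join. This is a sketch on in-memory lists, not the Nextflow channel API.

```groovy
// Plain-Groovy model of inner_join built on cross (not the Nextflow channel API).
def firstOutput = [['H100_BDNA', 'sed'], ['H100_nova', 'rip'], ['H100_hiseq', 'bam2']]
def secondOutput = [['pAdna', 'H100_hiseq', '11'], ['pAsc', 'H100_BDNA', '45'],
                    ['iMes', 'H100_BDNA', '58'], ['pAsc1', 'H100_nova', '23']]

// Hypothetical stand-in for source.cross(target): pairs every source item with
// every target item that has the same first element, emitting [source, target].
def crossModel(List source, List target) {
    source.collectMany { s ->
        target.findAll { t -> t[0] == s[0] }.collect { t -> [s, t] }
    }
}

// Same mapping as inner_join: [key, rest of ch_a item..., rest of ch_b item...].
def innerJoinModel(List chA, List chB) {
    crossModel(chB, chA).collect { [it[0][0], *it[1][1..-1], *it[0][1..-1]] }
}

// Swap copies (List.swap mutates in place), join, then swap the results back.
def swapped = secondOutput.collect { new ArrayList(it).swap(1, 0) }
def result = innerJoinModel(swapped, firstOutput).collect { it.swap(1, 0) }

result.each { println it }  // all four tuples, grouped by first_output's key order
```

Because cross emits every match for a key, both H100_BDNA tuples survive; only the order of the tuples differs from the desired result.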

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Steve
Solution 2